workflow4metabolomics / mtbls-dwnld


Integration inside W4M instance #47

Open pkrog opened 4 years ago

pkrog commented 4 years ago

Hi @fgiacomoni , @lecorguille , following @sneumann 's request (Ticket#2019120910000014 MetaboLights import ?), I've created this issue for integration of mtbls-dwnld inside W4M instances.

lecorguille commented 4 years ago

No problemo. Let us know if you need any help or when you want us to install the tool within our dev instance and then main one.

pkrog commented 4 years ago

Hi @khaug, are you still maintainer of the Metabolights database? I'm trying to test my Metabolights downloader Galaxy tool in order to integrate it inside W4M instances. However connections to private studies like MTBLS353 fail because the API token I use is invalid. Would it be possible to get a new one, please?

khaug commented 4 years ago

Hi Pierrick

MESA did not allow us to access this data outside of the active project so the data in MTBLS353 is no longer accessible. Your personal API key is accessible from your accounts details page in MetaboLights.

How are you trying to get to the studies, ascp?

Kind Regards, Ken Haug


pkrog commented 4 years ago

Ok, would it be possible to have access to another private study for testing purposes? For this test I use wget.

khaug commented 4 years ago

Are you using something like curl/wget and a user_token or are you using Aspera (ascp)?

Kind Regards, Ken Haug


pkrog commented 4 years ago

I remember I asked you to copy the MTBLS1 as a private study for testing. I have its private token for ascp, but not for wget. Could you please give me its private key for wget? That would be ok for my test.

khaug commented 4 years ago

Exactly what I was thinking,

The private access link (even though this study is public): https://www.ebi.ac.uk/metabolights/reviewer4ZWHUHHlKR

The study obfuscation code for MTBLS1 is 4ZWHUHHlKR

Kind Regards, Ken Haug


pkrog commented 4 years ago

In fact the downloading of the private version of MTBLS1 fails with ascp too:

ascp -q --policy=fair -T -l 1g -N '?_*.t*' -E '*.*' mtblight@ah01.ebi.ac.uk:/prod/mtbls1-******** .
ascp: Failed to open TCP connection for SSH, exiting.
ERROR: Downloading of study /prod/mtbls1-********* has failed.

I've masked the token with stars.

sneumann commented 4 years ago

Hi @pkrog do you have the correct host ? I got a connection with: ascp -QT -P 33001 -L- -l 300M . mtblight@hx-fasp-1.ebi.ac.uk:/prod/mtbls1-***** (but didn't download the entire study on board a train with 600 ppl sharing the internet in rural Germany :-)

pkrog commented 4 years ago

Hi @khaug ,

I still have the same error with:

ascp -q --policy=fair -T -l 1g -N '?_*.t*' -E '*.*' mtblight@ah01.ebi.ac.uk:/prod/mtbls1-4ZWHUHHlKR .

The address hx-fasp-1.ebi.ac.uk as suggested by Steffen gives the same error too. My old key was gs4qYabh for MTBLS1. Am I doing something else wrong?

khaug commented 4 years ago

Hi Pierrick

Sorry, I was on annual leave at the end of last week.

The server you are using was retired about 1.5 years ago, please use "hx-fasp-1.ebi.ac.uk" instead. Also add the port parameter "-P 33001". I was under the impression that all PhenoMeNal references had been changed, but that is obviously not the case.

Kind Regards, Ken Haug


pkrog commented 4 years ago

Ok, so I've tried:

ascp -q --policy=fair -T -l 1g -N '?_*.t*' -E '*.*' -P 33001 mtblight@hx-fasp-1.ebi.ac.uk:/prod/mtbls1-4ZWHUHHlKR .

And now it asks me for a password.

khaug commented 4 years ago

Yes.

We have always had to use a password for Aspera, but I gather from your email this is not what you expect? The public data Aspera server has one password and the private data Aspera server another.

Pierrick. The password is publicly available, but please ping me on Slack if you can, that may be a lot faster.

Kind Regards, Ken Haug
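
The password prompt described above can be avoided in a script, since ascp reads the password from the `ASPERA_SCP_PASS` environment variable when it is set. The following is only a minimal sketch, not the tool's actual code: the function name `fetch_private_study` and the variable `MTBLS_ASPERA_PASS` are hypothetical, and the host, port and flags are simply the ones quoted in this thread.

```shell
#!/bin/sh
# Sketch only: fetch_private_study and MTBLS_ASPERA_PASS are hypothetical names.
# ascp reads the password from ASPERA_SCP_PASS instead of prompting for it.

fetch_private_study() {
    study_path="$1"      # remote path, e.g. /prod/mtbls1-<obfuscation code>
    dest_dir="${2:-.}"   # local destination, defaults to the current directory

    # Refuse to start if no password was provided, rather than letting ascp prompt.
    if [ -z "${MTBLS_ASPERA_PASS:-}" ]; then
        echo "ERROR: MTBLS_ASPERA_PASS is not set." >&2
        return 1
    fi

    # The password is exported only for this single ascp invocation.
    ASPERA_SCP_PASS="$MTBLS_ASPERA_PASS" \
        ascp -q --policy=fair -T -l 1g -P 33001 \
        "mtblight@hx-fasp-1.ebi.ac.uk:${study_path}" "$dest_dir"
}
```

A call would then look like `MTBLS_ASPERA_PASS=... fetch_private_study /prod/mtbls1-4ZWHUHHlKR .`, with the password taken from wherever the instance stores its secrets.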


pkrog commented 4 years ago

> No problemo. Let us know if you need any help or when you want us to install the tool within our dev instance and then main one.

Hi @lecorguille , I've published the tool to the test toolshed: https://testtoolshed.g2.bx.psu.edu/view/prog/mtblsdwnld/d9aa2a6dc8f0. The tool offers two modes for downloading from Metabolights:

  1. With wget.
  2. With IBM Aspera.

I suppose wget will be available on the server. For Aspera, just follow the link inside the README for installation. Also unzip must be available, but I guess it's not an issue. Thanks for your help, and let me know when it's ready.

lecorguille commented 4 years ago

Work in progress, but I'm having some issues installing tools from the TestToolShed within Galaxy. https://gitter.im/galaxyproject/Lobby?at=5dfca65ad2dadb38934b1958

lecorguille commented 4 years ago

@pkrog Can you push the tool to the Main Toolshed? I will install it on the prod instance. I didn't find the time to look at that issue with mercurial.

pkrog commented 4 years ago

Done. See https://toolshed.g2.bx.psu.edu/view/prog/mtblsdwnld/8dab200e02cb.

lecorguille commented 4 years ago

Installed on https://galaxy.workflow4metabolomics.org/ in the section Data Handling

sneumann commented 4 years ago

Very cool, thanks! I tried with MTBLS10 both ISA-Tab metadata only and Full study including raw data and ISA-Tab metadata and it worked.

Then I tried MTBLS36 full study and got an error. A possibility might be a disk-full issue ?

[/work/project/w4m/galaxy4metabolomics/database/jobs_directory/001/000/1000802/dataset_2028784_files/MTBLS36.zip]
  End-of-central-directory signature not found.  Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive.  In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.
unzip:  cannot find zipfile directory in one of /work/project/w4m/galaxy4metabolomics/database/jobs_directory/001/000/1000802/dataset_2028784_files/MTBLS36.zip or
        /work/project/w4m/galaxy4metabolomics/database/jobs_directory/001/000/1000802/dataset_2028784_files/MTBLS36.zip.zip, and cannot find /work/project/w4m/galaxy4metabolomics/database/jobs_directory/001/000/1000802/dataset_2028784_files/MTBLS36.zip.ZIP, period.
ERROR: Unable to unzip archive /work/project/w4m/galaxy4metabolomics/database/jobs_directory/001/000/1000802/dataset_2028784_files/MTBLS36.zip.
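
The "End-of-central-directory signature not found" message usually means the downloaded file is truncated or is not a zip archive at all, which would fit an interrupted download or a full disk. A minimal sketch of a check that could run before extraction is shown below; the helper name `archive_is_complete` is hypothetical, and it assumes the standard `unzip` command is on the PATH (if it is not, the check simply fails).

```shell
#!/bin/sh
# Sketch only: archive_is_complete is a hypothetical helper, not part of the tool.

archive_is_complete() {
    archive="$1"

    # A missing or empty file cannot be a valid archive.
    [ -s "$archive" ] || return 1

    # "unzip -t" tests the archive without extracting it; "-q" keeps it quiet.
    # A truncated download has no readable central directory and fails here.
    unzip -tq "$archive" >/dev/null 2>&1
}
```

With this in place, the download step could report something like "download incomplete, check disk space or quota" instead of the raw unzip error, for example `archive_is_complete MTBLS36.zip || echo "ERROR: incomplete archive" >&2`.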

Note: Aspera download does not work. This could be entirely disabled on W4M, or install this dependency https://anaconda.org/hcc/aspera-cli. More discussion in https://github.com/unlhcc/hcc-conda-recipes/issues/8

which: no ascp in (/work/project/w4m/galaxy4metabolomics/shed_tools_deps/_conda/envs/__isatools@0.10.3/bin:/work/project/w4m/galaxy4metabolomics/shed_tools_deps/_conda/condabin:/work/project/w4m/galaxy4metabolomics/galaxy-dist/.venv/lib/node_modules/.bin
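
One possible way to handle a missing `ascp`, besides disabling Aspera or installing the dependency, is to detect it up front and fall back to wget. This is only a sketch of that idea under the assumption that a silent fallback is acceptable on the instance; `pick_download_method` is a hypothetical name and the tool itself may behave differently.

```shell
#!/bin/sh
# Sketch only: pick_download_method is a hypothetical helper.
# It prints "aspera" when the Aspera client is found on the PATH, else "wget".

pick_download_method() {
    aspera_bin="${1:-ascp}"   # name of the Aspera client binary to look for

    # "command -v" is the portable way to test whether a command is available.
    if command -v "$aspera_bin" >/dev/null 2>&1; then
        echo "aspera"
    else
        echo "wget"
    fi
}
```

The caller can then branch on the result, for example `method="$(pick_download_method)"`, and log which method was chosen so that a fallback is visible in the job output.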

Yours, Steffen

lecorguille commented 4 years ago

@fgiacomoni Can you check our quota ?

fgiacomoni commented 4 years ago

We have plenty of room :-/ but we are opening the data valves with the MTBLS downloader. Did you check the quota configuration for download traffic?

pkrog commented 4 years ago

For Aspera, installation is explained inside the README file of the tool. @sneumann , do you also need the isa-extractor tool or the isa2w4m tool?

lecorguille commented 4 years ago

I didn't know that some dependencies had to be installed :/ @pkrog, you should add the different requirements in the wrapper, such as zip, Python 3 and Aspera. It's a pity that Aspera isn't available within Bioconda 👎

pkrog commented 4 years ago

I'm not sure about that @lecorguille , aspera executables are binary files, and I think that the installer script does some dynamic link update when installing. Thus it seems safer to install from the latest version of the script installer.

sneumann commented 4 years ago

Hi @pkrog, yes, the next tools would be to have https://github.com/phnmnl/container-isa-extractor to get at the actual *.mzML files for real processing inside W4M, plus https://github.com/phnmnl/container-isa2w4m so we get the variable / sample metadata out of the ISA. Thanks, yours, Steffen

pkrog commented 4 years ago

@lecorguille , I've just checked, the installer script does not do anything on the binary files, so it would be possible to embed them inside the tool. I'll add that to my todo list.

lecorguille commented 4 years ago

@pkrog When I said "add the different requirements in the wrapper", I meant adding them as requirements that will be installed by Conda. If we push harder, Aspera might become available in Bioconda. It's annoying to add a channel in Galaxy just for one tool.

sneumann commented 4 years ago

Hi, I am not pushing for aspera, things basically work with wget. But keeping aspera in mind and bringing it up in Bioconda conversations would be great. Yours, Steffen

pkrog commented 4 years ago

Ok, got it.

lecorguille commented 4 years ago

@pkrog Could you consider pushing this tool to https://github.com/workflow4metabolomics/tools-metabolomics ?