workflow4metabolomics / mtbls-dwnld


More outputs / New tool to access files inside a study #25

Closed gabrielctn closed 6 years ago

gabrielctn commented 6 years ago

Hi, it would be a nice and useful enhancement to have easy access to the raw data files and ISA files as outputs of this tool, or to be able to retrieve them with another tool afterwards.

As part of my workflow project in PhenoMeNal, I need to download several public MTBLS studies and access their raw data and assay (a_) file(s), for example. There are temporary solutions, such as using the Risa R package to read the ISA-Tab output of this downloader, or untarring the raw data file from the other downloader available in PhenoMeNal, but nothing is really available as a workflow tool without adding special code to handle this situation. What do you think?

Yours

pkrog commented 6 years ago

Hi,

You can access raw files directly in the dataset files directory. The metadata files (s_*, a_*, i_*) are easily read if you use the Python ISA library. However, you're right, having tools that extract specific raw files from an ISA dataset would be of great help for users. I'm currently developing a new tool that outputs a collection of all mzML files contained in an ISA dataset. Philippe Roccaserra told me that similar tools exporting nmrML collections and netCDF collections would be useful. Could you please be more specific about your needs and tell me what other outputs you would need?

pkrog commented 6 years ago

See https://github.com/workflow4metabolomics/isa-extractor/issues for the tool that extracts collections from an ISA dataset.

gabrielctn commented 6 years ago

Yes, I saw that you documented this in the Developer information section of the tool, thank you :)

Below I am referring to the Google Doc "2nd hangout on Metabolights downloaders": Expose one collection of mzML [“MS/NMR Raw Data”] for each assay: either output a collection from Metabolights Downloader, or make a specific tool for extracting mzML collection

This would be my need. I have to run the IPO tool (soon available in PhenoMeNal) on several Metabolights Studies, and for that I need to have access to the assays and their associated raw data files (mzML, mzData, netCDF, mzXML).
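To make the need concrete, here is a minimal sketch of the "special code" one currently has to write to list the raw data files referenced by each assay of a downloaded study. It is not part of mtbls-dwnld or any existing tool: the function name raw_files_by_assay is hypothetical, and the column name "Raw Spectral Data File" is an assumption about how MS assay tables label their raw files, so it may need adjusting for a given study.

```python
import csv
from pathlib import Path

# Assumed column header for raw data in MS assay tables; adjust per study.
RAW_COLUMN = "Raw Spectral Data File"


def raw_files_by_assay(dataset_dir, column=RAW_COLUMN):
    """Map each assay file name (a_*.txt) to the raw data files it lists.

    Hypothetical helper: reads every tab-separated assay table found at the
    top level of dataset_dir and collects the distinct, non-empty values of
    the given column, in order of first appearance.
    """
    result = {}
    for assay_path in sorted(Path(dataset_dir).glob("a_*.txt")):
        files = []
        with open(assay_path, newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle, delimiter="\t")
            for row in reader:
                name = (row.get(column) or "").strip()
                if name and name not in files:
                    files.append(name)
        result[assay_path.name] = files
    return result
```

Calling raw_files_by_assay("MTBLS266"), where "MTBLS266" stands for the directory of an extracted study, would return a dictionary keyed by assay file name. An assay without the assumed column simply maps to an empty list.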

pkrog commented 6 years ago

OK, noted. I'll add mzData and mzXML to my list.

pkrog commented 6 years ago

@gabrielctn, isa2mzml has just been added to phnmnl/container-galaxy-k8s-runtime (develop branch). Try it.

gabrielctn commented 6 years ago

Very cool, thank you! I have two questions, however:

Thank you for the fast integration!

pkrog commented 6 years ago

Hi Gabriel,

I'm now downloading study MTBLS266 and will try it on my computer. However, I didn't observe this behaviour when testing. Could you test MTBLS291 and tell me how many files are displayed in the list? There are 75 mzML files in this study.

For mzXML files, I will publish a specific tool. In fact, there will be one tool for each type of collection output. I don't think raising an error when no files are found would be wise. The collection is empty, as it has to be, since no files are found. As a user, you'll be able to check the content of an ISA archive in the next version of Galaxy, and thus see which types of raw files are included.
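The behaviour described here (one collection per file type, and an empty collection rather than an error when nothing matches) can be illustrated with a short sketch. This is only an illustration under stated assumptions, not the code of isa2mzml or of the upcoming tools: collect_by_extension is a hypothetical helper, and matching files by extension is an assumed selection rule.

```python
from pathlib import Path


def collect_by_extension(dataset_dir, extension):
    """Return the sorted files under dataset_dir with the given extension.

    Hypothetical helper: the match is case-insensitive, and an empty list
    (not an exception) is returned when no file has that extension.
    """
    wanted = "." + extension.lower().lstrip(".")
    return sorted(
        path
        for path in Path(dataset_dir).rglob("*")
        if path.is_file() and path.suffix.lower() == wanted
    )
```

With this shape, a study that contains no mzXML files yields an empty list for collect_by_extension(study_dir, "mzXML"), so a downstream step can decide for itself whether an empty collection is a problem.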

pkrog commented 6 years ago

I've uploaded MTBLS266 into Galaxy, and in the history Galaxy says that the collection of mzML files contains 60 files. When I click on the collection, Galaxy displays the full list of files, from Person1 to Person30. So no problem on my side.

gabrielctn commented 6 years ago

I've been trying to test again with both studies. The downloader works and I have both studies, but Kubernetes does not launch the isa2mzml job: I get the yellow box but no job is running behind it. The pod logs show the error AttributeError: 'NullContainer' object has no attribute 'container_id'. I don't know why, since I made a clean install... I'll try again.

pkrog commented 6 years ago

OK. I'll also test again today under minikube, since I will release the other 4 tools.

gabrielctn commented 6 years ago

OK, so I found my problem: I was building my forked Galaxy runtime repo, and it was not synced with the latest original repo... Sorry for that, I'm not yet too familiar with these git features, but I'm learning :) I have the right number of files in both datasets now!

pkrog commented 6 years ago

Great! So I can close this issue. Next time, please open a new issue in the dedicated GitHub repo, for more clarity.

gabrielctn commented 6 years ago

Yes, you're right, I will!