gabrielctn closed this issue 6 years ago.
Hi,
You can access raw files directly in the dataset's files directory. The metadata files (s_, a_, i_) are easily read using the Python ISA library. However, you're right: tools that extract specific raw files from an ISA dataset would be of great help to users. I'm currently developing a new tool that outputs a collection of all the mzML files contained in an ISA dataset. Philippe Roccaserra told me that similar tools exporting nmrML collections and netCDF collections would also be useful. Could you please be more specific about your needs and tell me what other outputs you would need?
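A minimal sketch of reading the metadata directly, assuming the dataset directory is on local disk and each table is tab-separated text with a header row. For illustration it uses Python's standard csv module instead of the ISA library; find_isa_files and read_table are hypothetical helper names.

```python
import csv
from pathlib import Path

# Filename prefixes used by ISA-Tab metadata files.
PREFIXES = {"i_": "investigation", "s_": "study", "a_": "assay"}


def find_isa_files(dataset_dir):
    """Group the ISA-Tab metadata files of a dataset directory by prefix."""
    groups = {"investigation": [], "study": [], "assay": []}
    for path in sorted(Path(dataset_dir).iterdir()):
        # Only regular files whose names start with i_, s_ or a_ are kept.
        if path.is_file():
            key = PREFIXES.get(path.name[:2])
            if key is not None:
                groups[key].append(path)
    return groups


def read_table(path):
    """Read a tab-separated study or assay table into a list of dicts."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))
```

Note that csv.DictReader keeps only the last value when a header is repeated (ISA-Tab tables often repeat columns such as "Term Source REF"), so columns that occur once, like file names, are the safe ones to read this way.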
See https://github.com/workflow4metabolomics/isa-extractor/issues for the tool that extracts collections from an ISA dataset.
Yes, I saw that you documented this in the Developer information section of the tool, thank you :)
So below I am referring to the Google Doc "2nd hangout on Metabolights downloaders":
Expose one collection of mzML [“MS/NMR Raw Data”] for each assay: either output a collection from the Metabolights Downloader, or make a specific tool for extracting an mzML collection.
This is what I need. I have to run the IPO tool (soon available in PhenoMeNal) on several Metabolights studies, and for that I need access to the assays and their associated raw data files (mzML, mzData, netCDF, mzXML).
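The per-assay collections described in the doc could be sketched as follows: read one a_ file, take every column whose header ends in "Data File", and bucket the referenced file names by extension. The column-name rule, the extension-to-format mapping (notably .cdf/.nc for netCDF) and the function name raw_files_by_format are assumptions, not how the tools discussed here are implemented.

```python
import csv
from pathlib import PurePosixPath

# Assumed mapping from lowercase file extension to raw data format.
RAW_FORMATS = {
    ".mzml": "mzML",
    ".mzdata": "mzData",
    ".mzxml": "mzXML",
    ".cdf": "netCDF",
    ".nc": "netCDF",
}


def raw_files_by_format(assay_path):
    """Map each raw format to the sorted file names referenced in one assay table."""
    collections = {fmt: set() for fmt in set(RAW_FORMATS.values())}
    with open(assay_path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader, [])
        # Assumption: file columns are named like "Raw Spectral Data File".
        file_cols = [i for i, name in enumerate(header) if name.endswith("Data File")]
        for row in reader:
            for i in file_cols:
                value = row[i].strip() if i < len(row) else ""
                fmt = RAW_FORMATS.get(PurePosixPath(value).suffix.lower())
                if fmt is not None:
                    collections[fmt].add(value)
    return {fmt: sorted(names) for fmt, names in collections.items()}
```

Calling it once per a_ file yields one collection per assay and per format.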
OK, noted. I'll add mzData and mzXML to my list.
@gabrielctn, isa2mzml has just been added to phnmnl/container-galaxy-k8s-runtime (develop branch). Try it.
Very cool, thank you! I have two questions, however:
The resulting dataset in the Galaxy history does not show all files, but if I download the dataset, all the files are indeed included. So is it just a matter of "limited/random" output presentation in Galaxy?
I tested with MTBLS266 and not all the files are shown. See the screenshot of the Galaxy output below (after clicking on the dataset):
I also tried a study containing mzXML files: the job runs until the green box appears, but the collection is then empty. Shouldn't it instead exit with an error (red box) explaining why it failed?
Thank you for the fast integration!
Hi Gabriel,
I'm now downloading study MTBLS266 and will try it on my computer. However, I didn't observe this behaviour when testing. Could you test MTBLS291 and tell me how many files are displayed in the list? There are 75 mzML files in this study.
For mzXML files, I will publish a specific tool; in fact, there will be one tool for each type of collection output. I don't think raising an error when no files are found would be wise: the collection is empty, as it should be, since no matching files were found. In the next version of Galaxy, you'll be able, as a user, to check the content of an ISA archive and thus see which types of raw files it includes.
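The design choice described above (an empty collection rather than an error when no file matches) can be sketched as below, together with a helper that lists which extensions an archive contains, so a user can tell in advance which collections will be non-empty. Both functions are hypothetical illustrations operating on a plain list of file names.

```python
from pathlib import PurePosixPath


def select_collection(file_names, extension):
    """Return the files with the given extension; an empty list is a valid result."""
    wanted = extension.lower()
    # No exception when nothing matches: the caller simply gets an empty
    # collection, mirroring the behaviour described above.
    return [name for name in file_names if name.lower().endswith(wanted)]


def formats_present(file_names):
    """List the distinct extensions found, to see which collections would be non-empty."""
    suffixes = {PurePosixPath(name).suffix for name in file_names}
    return sorted(s for s in suffixes if s)
```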
I've uploaded MTBLS266 into Galaxy, and in the history Galaxy says that the collection of mzML files contains 60 files. Once I click on the collection, Galaxy displays the full list of files, from Person1 to Person30. So no problem on my side.
I've been trying to test again with both studies. The downloader works and I have both studies, but Kubernetes does not launch the isa2mzml job: I get the yellow box but no job running behind it. In the logs of the pod I got the error AttributeError: 'NullContainer' object has no attribute 'container_id'. I don't know why, since I made a clean install... I'll try again.
OK. I'll also test again today under minikube, since I'm going to release the other 4 tools.
OK, so I found my problem: I was building my forked Galaxy runtime repo, and it was not synced with the latest original repo... Sorry about that, I'm not yet very familiar with these git features, but I'm learning :) I now have the right number of files in both datasets!
Great! So I can close this issue. Next time, please try to open a new issue in the dedicated GitHub repo, for more clarity.
Yes, you're right, I will!
Hi, it would be nice and useful, as an enhancement, to have easy access to raw data files and ISA files as an output of this tool, or to be able to retrieve them with another tool afterwards.
As part of my workflow project in PhenoMeNal, I need to download several public MTBLS studies and access, for example, their raw data and assay (a_) file(s). There are temporary solutions, such as using the Risa R package to read the ISA-Tab output of this downloader, or untarring the raw data file from the other downloader available in PhenoMeNal, but nothing is really available as a workflow tool without adding special code to handle this situation. What do you think?
Yours
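As a stop-gap for the untar step mentioned above, here is a sketch using Python's tarfile module that extracts only the raw data files from a downloaded archive. It assumes the downloader produces a tar archive (optionally gzip-, bz2- or xz-compressed) and that raw files are recognisable by their extension; extract_raw_files and the default suffix list are hypothetical.

```python
import shutil
import tarfile
from pathlib import Path, PurePosixPath

# Hypothetical default: extensions of the raw formats discussed in this thread.
RAW_SUFFIXES = (".mzml", ".mzdata", ".mzxml", ".cdf")


def extract_raw_files(archive_path, out_dir, suffixes=RAW_SUFFIXES):
    """Copy archive members ending with one of `suffixes` into out_dir; return their names."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    extracted = []
    # Mode "r" detects gzip/bz2/xz compression transparently.
    with tarfile.open(archive_path, "r") as tar:
        for member in tar.getmembers():
            if not member.isfile() or not member.name.lower().endswith(suffixes):
                continue
            # Keep only the base name so member paths like "../x.mzML"
            # cannot write outside out_dir.
            target = out / PurePosixPath(member.name).name
            source = tar.extractfile(member)
            with source, open(target, "wb") as dest:
                shutil.copyfileobj(source, dest)  # streams large files in chunks
            extracted.append(target.name)
    return sorted(extracted)
```

Flattening member paths means two files with the same name in different folders would overwrite each other; keeping the relative path (after validating it) would avoid that.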