shuzhao-li-lab / PythonCentricPipelineForMetabolomics

Python pipeline for metabolomics data preprocessing, QC, standardization and annotation

asari cannot locate mzML files from assembled experiment #73

Closed YasinEl closed 1 month ago

YasinEl commented 4 months ago

Describe the bug: After successfully running pcpfm assemble and generating a new pcpfm_experiment folder, pcpfm asari fails to locate the corresponding mzML files.

My metadata.csv file is


Sample Type,Name,Filepath
Unknown,002_Blk_Water_POS,/home/yasin/projects/ISFs/asari/raw_data/002_Blk_Water_POS.mzML
Unknown,003_Blk_Water_POS,/home/yasin/projects/ISFs/asari/raw_data/003_Blk_Water_POS.mzML

And the command I ran to assemble the experiment was:

pcpfm assemble -s ./metadata.csv --name_field='Name' --path_field='Filepath' -o . -j pcpfm_experiment
Attempting:  assemble
Succesfully executed:  assemble

after that I get

 pcpfm asari -i ./pcpfm_experiment/
Attempting:  asari

~~~~~~~ Hello from Asari (1.13.1) ~~~~~~~~~

Working on ~~ /home/yasin/projects/ISFs/asari/pcpfm_experiment/converted_acquisitions/ ~~ 

No valid mzML files are found in the input directory :(
Succesfully executed:  asari

After moving the mzML files to /home/yasin/projects/ISFs/asari/pcpfm_experiment/converted_acquisitions/, the command runs successfully.

I am on Ubuntu 20.04.6 LTS.

jmmitc06 commented 4 months ago

Thanks for submitting this. I'm not 100% sure what the issue is here, as I usually do my testing on OSX and Linux as well and have not encountered this situation.

By default, when the input files are mzML, the files are symbolically linked from their original location to the converted_acquisitions directory. I know that this linking will fail when the source and target directories cross a file system boundary, although I would have thought you would get an error if that happened. Is there any chance that the mzML files and the analysis directory are on different filesystems or mountpoints?
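
A quick way to check whether the links were actually created, and whether they resolve, is to list the directory and classify each entry. This is only a minimal sketch using the Python standard library; `describe_links` is a hypothetical helper name and is not part of pcpfm.

```python
from pathlib import Path


def describe_links(directory):
    """Return (name, status) pairs for every entry in `directory`.

    status is one of "symlink ok", "symlink broken", "regular file", "other".
    Hypothetical helper for debugging; not part of pcpfm.
    """
    report = []
    for entry in sorted(Path(directory).iterdir()):
        if entry.is_symlink():
            # Path.exists() follows the link, so False means a dangling target.
            status = "symlink ok" if entry.exists() else "symlink broken"
        elif entry.is_file():
            status = "regular file"
        else:
            status = "other"
        report.append((entry.name, status))
    return report


if __name__ == "__main__":
    import sys

    # Usage: python check_links.py ./pcpfm_experiment/converted_acquisitions
    for name, status in describe_links(sys.argv[1]):
        print(f"{status:15s} {name}")
```

An empty listing would mean nothing was linked into the directory at all, while "symlink broken" entries would point to a problem with the link targets.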

YasinEl commented 4 months ago

Thank you for the response. No, it was not on different file systems. The files were originally in "/home/yasin/projects/ISFs/asari/raw_data" as specified in the metadata.csv. The experiment I created via assemble is in /home/yasin/projects/ISFs/asari/pcpfm_experiment/

Maybe the problem is that I started from mzML files directly rather than using the conversion implemented in pcpfm?

jmmitc06 commented 4 months ago

I often start my analyses from mzML files, so I don't think it is that. Let me make an Ubuntu VM and see if I can replicate the problem. Any chance you could share the two files from the example with me? I doubt the files are related, but it would be one less variable to rule out.

YasinEl commented 4 months ago

Sure. You can download them from ftp://massive.ucsd.edu/v06/MSV000093526/peak/mzML/

Best, Yasin

jmmitc06 commented 3 months ago

Hi Yasin,

I figured out the issue. The workflow in our lab has MS1-only acquisitions and DDA-type MS2 acquisitions for annotation. With that in mind, when I designed assemble, I had it scan the mzML files for acquisitions with MS2 data and put them in a separate MS2 acquisitions directory for later use during annotation.

Your data appears to have MS1 and MS2 in the same acquisition, which triggers the above check and results in the files being placed in the MS2 data directory, where Asari does not look.
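
To see whether a given file would be treated as an MS2 acquisition, you can check which MS levels it contains. The sketch below is not the check that assemble performs; it is a standalone illustration that assumes the usual mzML convention of recording the level in a cvParam with accession MS:1000511 ("ms level"). The function name `ms_levels_in_mzml` is made up for this example.

```python
import xml.etree.ElementTree as ET

# PSI-MS controlled vocabulary accession for the "ms level" term.
MS_LEVEL_ACCESSION = "MS:1000511"


def ms_levels_in_mzml(path):
    """Return the set of MS levels recorded in an mzML file.

    Illustrative only; this is not pcpfm's own implementation.
    """
    levels = set()
    # iterparse streams the file, so large acquisitions are not loaded at once.
    for _event, elem in ET.iterparse(path, events=("end",)):
        tag = elem.tag.rsplit("}", 1)[-1]  # drop the XML namespace, if any
        if tag == "cvParam" and elem.get("accession") == MS_LEVEL_ACCESSION:
            levels.add(int(elem.get("value")))
        elif tag == "spectrum":
            elem.clear()  # release the finished spectrum's children
    return levels


if __name__ == "__main__":
    import sys

    # Usage: python ms_levels.py 002_Blk_Water_POS.mzML
    found = ms_levels_in_mzml(sys.argv[1])
    print("MS levels:", sorted(found))
    if {1, 2} <= found:
        print("MS1 and MS2 are mixed in this acquisition")
```

A file reporting both levels 1 and 2 is the mixed case described above.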

This is a known limitation / bad design on my part and actually was discussed in today's lab meeting.

I'm not 100% sure what the long-term fix is, but I can add a flag to disable that behavior for the time being and add some more informative output until a long-term fix is made.

YasinEl commented 3 months ago

That's great, thank you.

jmmitc06 commented 1 month ago

I assume that this resolved your issue. I'm going to close this issue. Please feel free to contact me or open/re-open the issue if needed.