smith-chem-wisc / FlashLFQ

Ultra-fast label-free quantification algorithm for mass-spectrometry proteomics
GNU Lesser General Public License v3.0

problem reading PSMs #160

Status: Open. Francescodotta opened this issue 1 month ago.

Francescodotta commented 1 month ago

I'm trying to run FlashLFQ from Python through its command-line build (CMD.dll) on Linux. This is the code:

import os

flashlfq_path = os.path.join(os.getenv('TOOLS_BASE_PATH'), "FlashLFQ/CMD/bin/Release/net8.0/CMD.dll")
command = [
    "dotnet", flashlfq_path,  # run the cross-platform .NET build of the FlashLFQ CLI
    "--idt", input_pin_file,  # identification (PSM) file
    "--rep", os.path.dirname(mzml_file),  # directory containing the mzML file(s)
    "--out", output_dir,  # output directory
    "--chg",
]

I know for a fact that all the paths are correct. I'm running FlashLFQ after the MSFragger and Percolator steps, passing the .pin file from those steps as the identification input and the directory containing the mzML file as the spectra directory. However, FlashLFQ seems unable to match the spectra identified by MSFragger with those in the mzML file. How can I check and fix this error?

This is the error output:

Running FlashLFQ: dotnet /media/datastorage/it_cast/omnis_microservice_db/tools/FlashLFQ/CMD/bin/Release/net8.0/CMD.dll --idt /media/datastorage/it_cast/omnis_microservice_db/test_db/file_mzml/20250228_04_03.pin --rep /media/datastorage/it_cast/omnis_microservice_db/test_db/file_mzml --out /media/datastorage/it_cast/omnis_microservice_db/test_db/flashlfq_results --chg
Opening PSM file /media/datastorage/it_cast/omnis_microservice_db/test_db/file_mzml/20250228_04_03.pin
Problem reading PSMs: The given key '20250228_04_03.388.388' was not present in the dictionary.
FlashLFQ processing complete.
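Note that the log above still ends with "FlashLFQ processing complete." even though the PSMs could not be read, so a pipeline step that only waits for the process to finish may not notice the failure. A minimal sketch of a wrapper follows. It assumes only that the error text contains the string "Problem reading PSMs", as in this log; `run_flashlfq` and `psm_read_failed` are hypothetical helper names, and other FlashLFQ failure messages may exist that this check would miss.

```python
from __future__ import annotations

import subprocess

# Message copied from the FlashLFQ log in this issue; other failure
# messages may exist, so treat this as a minimal check only.
PSM_ERROR_MARKER = "Problem reading PSMs"


def psm_read_failed(output: str) -> bool:
    """Return True if FlashLFQ's console output reports a PSM-reading problem."""
    return PSM_ERROR_MARKER in output


def run_flashlfq(command: list[str]) -> str:
    """Run FlashLFQ and raise if it exits non-zero or the log shows a PSM-reading problem."""
    result = subprocess.run(command, capture_output=True, text=True)
    output = result.stdout + result.stderr
    if result.returncode != 0 or psm_read_failed(output):
        raise RuntimeError(f"FlashLFQ did not read the PSMs:\n{output}")
    return output
```

With the `command` list from the original post, `run_flashlfq(command)` would raise instead of silently continuing to the next step.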
Alexander-Sol commented 1 month ago

If you upload the input file from Percolator, I'll take a look and try to get this fixed.

Francescodotta commented 1 month ago

Do you need both the Percolator output and the mzML file?

The mzML file is large, nearly 380 MB.

Alexander-Sol commented 1 month ago

The .mzML file would be helpful if you're able to upload it to Box or Google Drive, but it's not strictly necessary.

I will need the Percolator file.

Francescodotta commented 1 month ago

Thanks for your help. Here is the Drive folder with the mzML file and the Percolator output:

https://drive.google.com/drive/u/0/folders/1SuIVb7YZC2YfSN7PVa8-ZEz2XpZMcNwn

Alexander-Sol commented 1 month ago

Sorry, I was unable to access the files at that URL (and the hyperlink redirects me back to GitHub).

Could you share the files with me from Drive? My email is solivais@wisc.edu

Francescodotta commented 1 month ago

Oh, sorry, I don't know why that happened.

I've given your email access to the Drive folder where both the mzML and .pin files are stored.

Alexander-Sol commented 2 weeks ago

Hello,

After taking a look at the identification file, I've found the issue. The .pin file is missing several pieces of information that FlashLFQ requires. For each peptide ID, you need the name of the file in which it was identified, the retention time (in minutes) at which it was identified, the unmodified peptide sequence, the modified peptide sequence, the theoretical monoisotopic mass of the peptide, the charge state, and the protein accession.
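Of the required fields, the theoretical monoisotopic mass is the one most easily derived rather than looked up. The sketch below computes it from the unmodified sequence using standard monoisotopic residue masses plus one water for the termini. It assumes FlashLFQ expects the neutral (uncharged) peptide mass; `monoisotopic_mass` and the `mod_deltas` parameter are hypothetical names, and the caller must supply the mass shift of each modification (for example, carbamidomethylated cysteine).

```python
from __future__ import annotations

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MASS = 18.010565  # H2O for the free N- and C-termini


def monoisotopic_mass(base_sequence: str, mod_deltas: tuple[float, ...] = ()) -> float:
    """Neutral theoretical monoisotopic mass of a peptide.

    mod_deltas: mass shifts (Da) of any modifications, one entry per
    modified residue, supplied by the caller. Non-standard residue
    letters raise KeyError.
    """
    residues = sum(RESIDUE_MASS[aa] for aa in base_sequence.upper())
    return residues + WATER_MASS + sum(mod_deltas)
```

For example, `monoisotopic_mass("PEPTIDE")` gives about 799.360 Da.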

If you have a results file that contains that information, you can convert it to our generic input format, which is described here: https://github.com/smith-chem-wisc/FlashLFQ/wiki/Identification-Input-Formats#generic
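A minimal sketch of writing the generic format as a tab-separated file with Python's `csv` module. The column headers are my reading of the linked wiki page and should be checked there before use. `PeptideId`, `to_generic_rows`, and `write_generic_tsv` are hypothetical names, and the seconds-to-minutes conversion assumes the source file reports retention time in seconds (FlashLFQ needs minutes); drop the division if yours already uses minutes.

```python
from __future__ import annotations

import csv
from dataclasses import dataclass

# Headers as I understand the "Generic" section of the FlashLFQ wiki;
# verify against the linked page before relying on them.
GENERIC_COLUMNS = [
    "File Name", "Base Sequence", "Full Sequence", "Peptide Monoisotopic Mass",
    "Scan Retention Time", "Precursor Charge", "Protein Accession",
]


@dataclass
class PeptideId:
    file_name: str
    base_sequence: str
    full_sequence: str
    monoisotopic_mass: float
    retention_time_s: float  # assumption: source reports seconds
    charge: int
    protein_accession: str


def to_generic_rows(ids: list[PeptideId]) -> list[list[str]]:
    """Build header plus one row per ID, converting retention time to minutes."""
    rows = [list(GENERIC_COLUMNS)]
    for pid in ids:
        rows.append([
            pid.file_name, pid.base_sequence, pid.full_sequence,
            f"{pid.monoisotopic_mass:.5f}",
            f"{pid.retention_time_s / 60.0:.4f}",  # seconds -> minutes
            str(pid.charge), pid.protein_accession,
        ])
    return rows


def write_generic_tsv(ids: list[PeptideId], path: str) -> None:
    """Write the rows as a tab-separated file for FlashLFQ's --idt option."""
    with open(path, "w", newline="") as handle:
        csv.writer(handle, delimiter="\t", lineterminator="\n").writerows(to_generic_rows(ids))
```

Keeping row construction separate from file writing makes the conversion easy to inspect before FlashLFQ sees the file.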

Additionally, if you have an identification file that contains all the required information, share it with me and I'll work on adding support for it to FlashLFQ.

Best, Alex

Francescodotta commented 1 week ago

Thanks for your feedback. I'll try to convert the MSFragger output to the generic input format in Python, using the mzML file as a reference, and then share the code so that anyone in my situation can use it.

I'll let you know if I run into any issues.
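If the identification output gives scan numbers but not retention times, the mzML file can supply them. A stdlib-only sketch follows: it streams through the mzML with `xml.etree.ElementTree.iterparse` and maps each spectrum's scan number to its "scan start time" (PSI-MS accession MS:1000016), normalized to minutes. It assumes Thermo-style native IDs containing `scan=N`; other vendors use different ID formats. `scan_retention_times` is a hypothetical helper name.

```python
from __future__ import annotations

import re
import xml.etree.ElementTree as ET
from typing import Optional

SCAN_START_TIME = "MS:1000016"  # PSI-MS accession for "scan start time"
SCAN_NUMBER = re.compile(r"scan=(\d+)")  # assumes Thermo-style native IDs


def _local(tag: str) -> str:
    """Strip the XML namespace, e.g. '{http://psi.hupo.org/ms/mzml}spectrum' -> 'spectrum'."""
    return tag.rsplit("}", 1)[-1]


def _scan_start_minutes(spectrum: ET.Element) -> Optional[float]:
    """Return the spectrum's scan start time in minutes, or None if absent."""
    for param in spectrum.iter():
        if _local(param.tag) == "cvParam" and param.get("accession") == SCAN_START_TIME:
            value = float(param.get("value"))
            return value / 60.0 if param.get("unitName") == "second" else value
    return None


def scan_retention_times(mzml_source) -> dict[int, float]:
    """Map scan number -> retention time (minutes) from an mzML path or binary file object."""
    times: dict[int, float] = {}
    for _, elem in ET.iterparse(mzml_source, events=("end",)):
        if _local(elem.tag) != "spectrum":
            continue
        match = SCAN_NUMBER.search(elem.get("id", ""))
        rt = _scan_start_minutes(elem)
        if match and rt is not None:
            times[int(match.group(1))] = rt
        elem.clear()  # drop spectrum children (peak arrays) to limit memory use
    return times
```

Streaming matters here because the mzML in this thread is nearly 380 MB; loading it whole with `ET.parse` would hold every spectrum in memory at once.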