sneumann / mzR

This is the git repository matching the Bioconductor package mzR: parser for netCDF, mzXML, mzData and mzML files (mass spectrometry data)

"score" function crashing depending on input file #241

Open jmwozniak opened 3 years ago

jmwozniak commented 3 years ago

Hi,

I have two mzid files that I'd like to work with in R. Both are exported from Proteome Discoverer searches. The only difference is that one uses a "Concatenated Target/Decoy Strategy" in the Percolator node while the other uses a "Separate Target/Decoy Strategy". As far as I know, the main difference between these is that "Concatenated" passes only the best-scoring PSM for a particular spectrum to Percolator, while "Separate" can pass more than one PSM depending on score difference thresholds (0.05 Delta Cn used in this case).

Everything I've tried in the mzR package (e.g. the "psms" and "modifications" functions) works as expected for these files EXCEPT the "score" function, which works ONLY for the file from the "Concatenated Target/Decoy Strategy" search and not for the one from the "Separate Target/Decoy Strategy" search. Whenever I run the "score" function on mzid files from a "Separate Target/Decoy Strategy" search, R either crashes or gives the following error:

Error in x@backend$getScore() : bad lexical cast: source type value could not be interpreted as target

I have tried this for multiple "Concatenated" vs. "Separate" searches and the "score" function always fails on "Separate" searches.

Any help you can provide on this matter would be greatly appreciated. I can provide more information or the mzid files in question to help troubleshoot.

Thanks!

sneumann commented 3 years ago

Hm, maybe @lgatto can chime in? Yours, Steffen

lgatto commented 3 years ago

Hi @sneumann - @jmwozniak got in touch initially by email with me. Here's what I had to say:

Indeed, the code in mzR is prone to such errors when the mzid files don't match the format definitions exactly.

The only fix I can think of would be to see whether updating the pwiz code bundled in mzR sorts this out. In other words, Proteome Discoverer (PD) doesn't seem to export a valid mzid according to the current definition in pwiz/mzR - would an update help? This isn't a trivial thing to do, so it would be important to provide convincing evidence that an update would work.

Could you please file an issue on GitHub with as many details as possible?

My only explanation is that the files exported with the "Separate Target/Decoy Strategy" don't match what pwiz expects, but I have no idea whether pwiz gets it wrong or is outdated, or whether Proteome Discoverer exports an invalid mzid file in that case.
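One way to gather those details without going through mzR is to look at the raw mzid file directly. A "bad lexical cast" is what Boost raises when a string cannot be converted to the requested type, so a plausible (but unconfirmed) cause is a score parameter whose value is empty or not a number. The Python sketch below lists every cvParam/userParam directly under a SpectrumIdentificationItem whose value does not parse as a float. It assumes, without having checked the mzR source, that these are the values "score" tries to convert; the function names and the sample XML are made up for illustration, and Python's float() is only a rough stand-in for the C++ conversion.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def is_numeric(text):
    """True if `text` parses as a float (a rough proxy for a numeric cast)."""
    try:
        float(text)
        return True
    except (TypeError, ValueError):  # TypeError covers a missing value (None)
        return False


def find_non_numeric_params(source):
    """Return (item id, param name, value) for each cvParam/userParam that is a
    direct child of a SpectrumIdentificationItem and has a non-numeric value.

    `source` is a file path or a binary file-like object.
    """
    problems = []
    for _, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "SpectrumIdentificationItem":
            continue
        for child in elem:
            if local_name(child.tag) not in ("cvParam", "userParam"):
                continue
            value = child.get("value")
            if not is_numeric(value):
                problems.append((elem.get("id"), child.get("name"), value))
        elem.clear()  # release the processed item; mzid files can be large
    return problems


# Invented, heavily trimmed sample; a real mzid file has many more elements.
SAMPLE = b"""<MzIdentML xmlns="http://psidev.info/psi/pi/mzIdentML/1.1">
  <SpectrumIdentificationItem id="SII_1_1">
    <cvParam name="SEQUEST:xcorr" value="2.31"/>
  </SpectrumIdentificationItem>
  <SpectrumIdentificationItem id="SII_1_2">
    <cvParam name="SEQUEST:xcorr" value=""/>
    <userParam name="hypothetical flag" value="n/a"/>
  </SpectrumIdentificationItem>
</MzIdentML>"""

problems = find_non_numeric_params(io.BytesIO(SAMPLE))
for item_id, name, value in problems:
    print(item_id, name, repr(value))
```

Running the same function on the "Concatenated" and "Separate" exports (passing each file path instead of the sample) and comparing the two lists would show whether the failing file contains values of this kind. An empty list for both files would point away from this explanation.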

@jmwozniak - have you tried loading these files with mzID?

jmwozniak commented 3 years ago

Hi,

Yes, the mzID package seems to work for these files. I should be able to use that to extract the scores I need for my analysis.

Thanks! Jacob