yeastrc / limelight-import-crux-comet-percolator

Convert the output of the Crux pipeline to Limelight XML suitable for import into Limelight.
Apache License 2.0

Error when checking that percolator results have comet results. #1

Open ajmaurais opened 1 year ago

ajmaurais commented 1 year ago

I am doing a comet search with a differential modification on cysteine with a mass of 513.30635.

I am getting an error when I try to convert the crux output to a limelight XML:

Crux Comet/Percolator to limelight XML converter
Author: Michael Riffle <mriffle@uw.edu>
See: https://github.com/yeastrc/limelight-import-crux-comet-percolator
Version: 3.2.0

Finding pepXML files... Found 3 file(s).
Finding percolator output file... Done
Determining versions for pipeline...
        Crux version: 4.1-fa9efc63-2021-11-19
        Comet version: 2021.01 rev. 0
        Percolator version: 3.05.nightly-137-e806a0c5, Build Date Nov 19 2021 19:15:27
Reading comet params... Done.
Reading Percolator XML data into memory... Got 57120 peptides.  Done.

Process pepXML file: comet.RUN3_04.target.pep.xml
        Reading Comet pepXML data into memory... Done.
        Verifying all percolator results have comet results...Encountered error during conversion: Error: Comet results not found for peptide: TC[513.3063]RGLFVLC[57.0215]QYCGLLQIYSADTPSSSYTQSTMDHDLHD
java.lang.Exception: Error: Comet results not found for peptide: TC[513.3063]RGLFVLC[57.0215]QYCGLLQIYSADTPSSSYTQSTMDHDLHD
        at org.yeastrc.limelight.xml.crux_comet_percolator.reader.CometPercolatorValidator.validateData(CometPercolatorValidator.java:40)
        at org.yeastrc.limelight.xml.crux_comet_percolator.main.ConverterRunner.convertCruxCometPercolatorToLimelightXML(ConverterRunner.java:89)
        at org.yeastrc.limelight.xml.crux_comet_percolator.main.MainProgram.run(MainProgram.java:91)
        at picocli.CommandLine.execute(CommandLine.java:1160)
        at picocli.CommandLine.access$800(CommandLine.java:141)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:1367)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:1335)
        at picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:1243)
        at picocli.CommandLine.parseWithHandlers(CommandLine.java:1526)
        at picocli.CommandLine.run(CommandLine.java:1974)
        at picocli.CommandLine.run(CommandLine.java:1904)
        at org.yeastrc.limelight.xml.crux_comet_percolator.main.MainProgram.main(MainProgram.java:110)

I believe the error is caused by a rounding mismatch. The modification masses in the peptide strings in the percolator output file are rounded to 4 decimal places, with a trailing 5 rounded down. When the limelight converter reads the pep.xml files, it has to work out how each modification should be encoded in the peptide string so that it can cross-reference the percolator and pep.xml results, and it rounds a trailing 5 up. So the same peptide is encoded as C[513.3063] in the percolator results but as C[513.3064] in the strings built from the .pep.xml results.

Ultimately I was able to get the conversion to work by changing the RoundingMode in the getReportedPeptideStringForSequenceAndMods function to HALF_DOWN:

https://github.com/yeastrc/limelight-import-crux-comet-percolator/blob/74d7e154090182cf5d762499a29b93bd7d71926c/src/main/java/org/yeastrc/limelight/xml/crux_comet_percolator/utils/ReportedPeptideUtils.java#L41

But I don't know whether it is safe to assume that the percolator results will always be rounded down.
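
To make the mismatch described above concrete, here is a minimal sketch that rounds the reported mass 513.30635 to 4 decimal places under three `java.math.RoundingMode` values. The class name `RoundingModeDemo` and the helper `encode` are hypothetical and are not taken from the converter; the mass is parsed from its decimal text so the tie at the fifth decimal place is exact.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

class RoundingModeDemo {
    // Hypothetical helper: builds a residue[mass] token like the ones in the log above.
    static String encode(String residue, String mass, RoundingMode mode) {
        // Parsing the decimal text keeps 513.30635 as an exact tie at the 5th decimal place.
        BigDecimal rounded = new BigDecimal(mass).setScale(4, mode);
        return residue + "[" + rounded.toPlainString() + "]";
    }

    public static void main(String[] args) {
        RoundingMode[] modes = {
            RoundingMode.HALF_UP, RoundingMode.HALF_DOWN, RoundingMode.HALF_EVEN
        };
        for (RoundingMode mode : modes) {
            System.out.println(mode + " -> " + encode("C", "513.30635", mode));
        }
        // HALF_UP   -> C[513.3064]
        // HALF_DOWN -> C[513.3063]
        // HALF_EVEN -> C[513.3064]
    }
}
```

Only HALF_DOWN reproduces the C[513.3063] seen in the percolator output for this particular mass, which is consistent with the workaround above but does not by itself show that the upstream tools always round ties down.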

mriffle commented 1 year ago

Hi @ajmaurais, thanks for the bug report. I asked Jimmy about this, and it looks like Comet uses a rounding mode of HALF_EVEN, which rounds to the nearest value and breaks ties toward the even neighbor. I made the change and created a new release, 3.3.1, which also includes some other optimizations. (https://github.com/yeastrc/limelight-import-crux-comet-percolator/releases)

If you have a moment, can you test that on your data?

ajmaurais commented 1 year ago

I am still getting an error:

Crux Comet/Percolator to limelight XML converter
Author: Michael Riffle <mriffle@uw.edu>
See: https://github.com/yeastrc/limelight-import-crux-comet-percolator
Version: 3.3.1

Finding pepXML files... Found 1 file(s).
Finding percolator output file... Done
Parsing percolator log file... Done
Determining versions for pipeline...
        Crux version: 4.1-fa9efc63-2021-11-19
        Comet version: 2021.01 rev. 0
        Percolator version: 3.05.nightly-137-e806a0c5, Build Date Nov 19 2021 19:15:27
Reading comet params... Done.
Reading Percolator XML data into memory... Got 6364 peptides.  Done.
Determining # of decimal places in mods in percolator peptide strings...Got: 0

Process pepXML file: comet.target.pep.xml
        Reading Comet pepXML data into memory... Done.
        Verifying all percolator results have comet results...Encountered error during conversion: Error: Comet results not found for peptide: MGC[57.0215]CGC[513.3063]GGCGGRC[513.3063]SGGCGGGCGGGCGG
java.lang.Exception: Error: Comet results not found for peptide: MGC[57.0215]CGC[513.3063]GGCGGRC[513.3063]SGGCGGGCGGGCGG
        at org.yeastrc.limelight.xml.crux_comet_percolator.reader.CometPercolatorValidator.validateData(CometPercolatorValidator.java:40)
        at org.yeastrc.limelight.xml.crux_comet_percolator.main.ConverterRunner.convertCruxCometPercolatorToLimelightXML(ConverterRunner.java:98)
        at org.yeastrc.limelight.xml.crux_comet_percolator.main.MainProgram.run(MainProgram.java:91)
        at picocli.CommandLine.execute(CommandLine.java:1160)
        at picocli.CommandLine.access$800(CommandLine.java:141)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:1367)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:1335)
        at picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:1243)
        at picocli.CommandLine.parseWithHandlers(CommandLine.java:1526)
        at picocli.CommandLine.run(CommandLine.java:1974)
        at picocli.CommandLine.run(CommandLine.java:1904)
        at org.yeastrc.limelight.xml.crux_comet_percolator.main.MainProgram.main(MainProgram.java:110)

This regex:

https://github.com/yeastrc/limelight-import-crux-comet-percolator/blob/42eaecef9c9f0c699e1868fef3b2c7650ccef7eb/src/main/java/org/yeastrc/limelight/xml/crux_comet_percolator/utils/PercolatorParsingUtils.java#L115

is never matching anything, so it thinks there are 0 decimal places in the modification string. Because of the position anchors, the regex won't match peptides with multiple modifications. Even if I remove the position anchors, it still doesn't match anything, and I am not sure why.
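
As an illustration of counting decimal places without position anchors, here is a minimal sketch. The pattern, the class name `ModDecimalPlaces`, and the method `decimalPlaces` are assumptions made for this example and are not the code in PercolatorParsingUtils. It relies on `Matcher.find()`, which scans for the first occurrence anywhere in the string, whereas `Matcher.matches()` succeeds only if the entire string matches the pattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ModDecimalPlaces {
    // Assumed token shape: "[", optional sign, digits, ".", digits, "]".
    private static final Pattern MOD = Pattern.compile("\\[[+-]?\\d+\\.(\\d+)\\]");

    // Returns the number of digits after the decimal point in the first
    // bracketed modification, or 0 if no such modification is present.
    static int decimalPlaces(String peptide) {
        Matcher m = MOD.matcher(peptide);
        // find() looks for a match anywhere; no ^ or $ anchors are needed.
        return m.find() ? m.group(1).length() : 0;
    }

    public static void main(String[] args) {
        String peptide = "MGC[57.0215]CGC[513.3063]GGCGGRC[513.3063]SGGCGGGCGGGCGG";
        System.out.println(decimalPlaces(peptide)); // prints 4
    }
}
```

This sketch inspects only the first modification in the string and assumes every modification in a file is written with the same precision.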

I also tried just changing the default return value of getNumberOfDecimalPlacesInPercolatorMod to 4, but then I get the same error as before due to the results in the pout.xml file being C[513.3063] and the modifications that the converter is generating being C[513.3064].

For the differential modification of 513.30635, all the masses in the pout.xml file are rounded as C[513.3063]. That wouldn't be the expected behavior of the HALF_EVEN rounding mode, correct? If Comet is using HALF_EVEN, shouldn't they be rounded as C[513.3064]?
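
One possible explanation, which nothing in this thread confirms, is binary floating point. If the upstream tool holds the mass as a `double` before formatting it (an assumption), the stored value is not exactly 513.30635, so there is no exact tie for the rounding mode to break. The sketch below compares rounding the decimal text with rounding the exact binary value of the `double`; the class and method names are hypothetical.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

class BinaryDoubleRounding {
    // Rounds the mass as written in decimal text: an exact tie at the 5th place.
    static BigDecimal roundDecimalText(String mass, RoundingMode mode) {
        return new BigDecimal(mass).setScale(4, mode);
    }

    // Rounds the exact binary value stored in the double.
    // new BigDecimal(double) preserves that value without any decimal cleanup.
    static BigDecimal roundExactDouble(double mass, RoundingMode mode) {
        return new BigDecimal(mass).setScale(4, mode);
    }

    public static void main(String[] args) {
        // Prints the full binary expansion of the double nearest to 513.30635.
        System.out.println(new BigDecimal(513.30635));
        System.out.println(roundDecimalText("513.30635", RoundingMode.HALF_EVEN));
        System.out.println(roundExactDouble(513.30635, RoundingMode.HALF_EVEN));
    }
}
```

The nearest IEEE 754 double to 513.30635 lies slightly below it, so the exact-value path rounds to 513.3063 under HALF_EVEN, HALF_UP, and HALF_DOWN alike, while the decimal-text path gives 513.3064 under HALF_EVEN. Whether this is what actually happens inside Comet, Crux, or Percolator would need to be checked against their source.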

mriffle commented 1 year ago

Hmm. OK, I'll generate some test data on my end looking for that mod mass and see if I can duplicate. Thanks again.

ajmaurais commented 1 year ago

Ok, I can also share my data with you if that helps.

mriffle commented 1 year ago

That would be great. Can you put it in a Google Drive? I can't access GSIT systems.