nf-core / mhcquant

Identify and quantify MHC eluted peptides from mass spectrometry raw data
https://nf-co.re/mhcquant
MIT License
33 stars 25 forks source link

MapRTTransformer does not change RT according to trafoXML #276

Closed jonasscheid closed 1 year ago

jonasscheid commented 1 year ago

Description of the bug

In the map_alignment step the MapAlignerIdentification computes a trafoXML for each respective run/replicate based on a set of replicate idXMLs. However, the MapRTTransformer inserts values not corresponding to the actual information in the trafoXML. If I only pick one idXML from Comet and do MapAlignerIdentification -> MapRTTransformer I get the following:

Example for a peptide AAAAAAQSVY that is only found once in a run: Before MapAligner: (from idXML)

<PeptideIdentification score_type="q-value" higher_score_better="false" significance_threshold="0.0" MZ="461.735076031900007" RT="2015.799999999999955" spectrum_reference="controllerType=0 controllerNumber=1 scan=6631" >
            <PeptideHit score="3.315649867374006e-03" sequence="AAAAAAQSVY" charge="2" aa_before="A" aa_after="A" start="7" end="16" protein_refs="PH_1213" >

After MapAligner (trafoXML): <Pair from="2015.8" to="2029.05" note="AAAAAAQSVY"/>

After MapRTTransformer:

<PeptideIdentification score_type="q-value" higher_score_better="false" significance_threshold="0.0" MZ="461.735076031900007" RT="2018.374633876312828" spectrum_reference="controllerType=0 controllerNumber=1 scan=6631" >
            <PeptideHit score="3.315649867374006e-03" sequence="AAAAAAQSVY" charge="2" aa_before="A" aa_after="A" start="7" end="16" protein_refs="PH_1213" >

Example with a peptide ATYPYQVVR found 3 times in a replicate: Before MapAligner: (from idXML)

<PeptideIdentification score_type="q-value" higher_score_better="false" significance_threshold="0.0" MZ="548.794127531900017" RT="2471.400000000000091" spectrum_reference="controllerType=0 controllerNumber=1 scan=9033" >
            <PeptideHit score="7.333682556312206e-03" sequence="ATYPYQVVR" charge="2" aa_before="A" aa_after="A" start="240" end="248" protein_refs="PH_217" >
        <PeptideIdentification score_type="q-value" higher_score_better="false" significance_threshold="0.0" MZ="548.793700531899958" RT="2479.400000000000091" spectrum_reference="controllerType=0 controllerNumber=1 scan=9076" >
            <PeptideHit score="4.211208292840946e-03" sequence="ATYPYQVVR" charge="2" aa_before="A" aa_after="A" start="240" end="248" protein_refs="PH_217" >
       <PeptideIdentification score_type="q-value" higher_score_better="false" significance_threshold="0.0" MZ="548.793395031899991" RT="2487.5" spectrum_reference="controllerType=0 controllerNumber=1 scan=9118" >
            <PeptideHit score="5.747126436781609e-03" sequence="ATYPYQVVR" charge="2" aa_before="A" aa_after="A" start="240" end="248" protein_refs="PH_217" >

After MapAligner (trafoXML): <Pair from="2479.4" to="2487.925" note="ATYPYQVVR"/>

After MapRTTransformer:

        <PeptideIdentification score_type="q-value" higher_score_better="false" significance_threshold="0.0" MZ="548.794127531900017" RT="2474.90907977266761" spectrum_reference="controllerType=0 controllerNumber=1 scan=9033" >
            <PeptideHit score="7.333682556312206e-03" sequence="ATYPYQVVR" charge="2" aa_before="A" aa_after="A" start="240" end="248" protein_refs="PH_217" >
        <PeptideIdentification score_type="q-value" higher_score_better="false" significance_threshold="0.0" MZ="548.793700531899958" RT="2482.925487953464199" spectrum_reference="controllerType=0 controllerNumber=1 scan=9076" >
            <PeptideHit score="4.211208292840946e-03" sequence="ATYPYQVVR" charge="2" aa_before="A" aa_after="A" start="240" end="248" protein_refs="PH_217" >
        <PeptideIdentification score_type="q-value" higher_score_better="false" significance_threshold="0.0" MZ="548.793395031899991" RT="2491.042101236520466" spectrum_reference="controllerType=0 controllerNumber=1 scan=9118" >
            <PeptideHit score="5.747126436781609e-03" sequence="ATYPYQVVR" charge="2" aa_before="A" aa_after="A" start="240" end="248" protein_refs="PH_217" >

There seems to be something wrong with mapping the transformed RTs from the trafoXML to idXML. Needs to be investigated further in the respective OpenMS implementation

Edit: Transforming RT of an mzml has the same issue

Command used and terminal output

nextflow run nf-core/mhcquant -r dev -profile test_full,docker --outdir test_quant

Relevant files

test_quant_issue.zip

System information

No response

jonasscheid commented 1 year ago

Unfortunately, the to=xx value in the trafoXML cannot be found in the transformed file. That means there is not even a false mapping of RT values happening ๐Ÿค”.

@timosachsenberg bug in OpenMS?

jpfeuffer commented 1 year ago

Doesn't the Transformer fit a curve to the found points? If that is the case, it is to be expected that the regression does not necessarily go through all points. Do you have the full trafoXML somewhere? Is there maybe a description of the curve/line?

jpfeuffer commented 1 year ago

Yes, I am pretty sure the points are just anchors. The real transformation happens through intercept and slope in the beginning of the file.

jonasscheid commented 1 year ago

Ah that makes sense! Indeed if I use slope and intercept with the from value I end up with the correct transformation. Ok, then everything is as it should be ๐Ÿ™Œ๐Ÿผ Thanks a lot @jpfeuffer ๐Ÿ™๐Ÿผ

A quick follow-up question: Would you recommend b_spline transformation over linear transformation? Doc of MapAlignerIdentification says

This algorithm has been tested mostly with the "b_spline" model

jpfeuffer commented 1 year ago

Not necessarily. A lot of MS programs these days just use a linear transformation AFAIK and it turns out OK. On the other hand, splines wont be a bottleneck in runtime and might yield more overlaps (hopefully without overfitting).

I don't think there is a lot that can go wrong with the algorithm itself so the warning might be overly cautious. Especially if you evaluate the results carefully.

jonasscheid commented 1 year ago

Alright! That is very good to know. Thanks again ๐Ÿ™๐Ÿผ