vdemichev / DiaNN

DIA-NN - a universal automated software suite for DIA proteomics data analysis.

Different protein group for same peptide different charge state #865

Open Arthfael opened 8 months ago

Arthfael commented 8 months ago

I noticed something strange today after a colleague asked me to add a number-of-peptides column to her DiaNN pg matrix. I loaded the report.tsv in R, aggregated protein groups by modified sequence, and was surprised to find that a few PSMs with the same modified sequence but different charge states had been assigned to different protein groups. I cannot see a good explanation for that: the observed charge state should not affect the assignment to a protein group as long as the modified sequence is the same.

This discrepancy concerned maybe 15 modified sequences in a dataset which contains more than 100k, so it looks very minor, but it also raises the question of whether the assignments that are to a single protein group are correct. Indeed, I have noticed for a while, without investigating in detail, that there are a few discrepancies in the mapping of peptides to protein accessions compared with what I would normally expect. Of course it all depends on the protein grouping algorithm and on what one decides should be in those columns (all matches to any protein IDs, versus only matches to discovered proteins, in the sense that they are leading proteins of a protein group), but I still think there may be some small issues there too, though probably only minor ones.

I am happy to share any data to illustrate the point, although the files are rather large.
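
For readers who want to reproduce the check described at the start of this comment, here is a minimal sketch in Python (the original check was done in R). It groups rows by modified sequence and reports the sequences whose charge states map to more than one protein group. The column names `Modified.Sequence`, `Precursor.Charge` and `Protein.Group` are assumed to match the report.tsv headers, and the accessions and sequences in the example table are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def conflicting_protein_groups(rows):
    """Return modified sequences assigned to more than one protein group.

    The result maps each such sequence to {protein group: sorted charges}.
    Column names are assumptions about the report.tsv layout.
    """
    groups = defaultdict(lambda: defaultdict(set))
    for row in rows:
        seq = row["Modified.Sequence"]
        groups[seq][row["Protein.Group"]].add(int(row["Precursor.Charge"]))
    return {
        seq: {pg: sorted(charges) for pg, charges in by_pg.items()}
        for seq, by_pg in groups.items()
        if len(by_pg) > 1
    }


# Tiny made-up table standing in for a real report.tsv.
EXAMPLE_TSV = (
    "Modified.Sequence\tPrecursor.Charge\tProtein.Group\n"
    "AAAPEPTIDEK\t2\tP11111\n"
    "AAAPEPTIDEK\t3\tP11111;P22222\n"
    "LLLSEQK\t2\tP33333\n"
    "LLLSEQK\t3\tP33333\n"
)

rows = list(csv.DictReader(io.StringIO(EXAMPLE_TSV), delimiter="\t"))
conflicts = conflicting_protein_groups(rows)
print(conflicts)
```

On a real report, `io.StringIO(EXAMPLE_TSV)` would be replaced by an open file handle; only the first example sequence is reported here, because its two charge states carry different protein group labels.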

Arthfael commented 7 months ago

Another thing: it occurred to me yesterday that the discrepancy could be the result of library-level effects. For instance, I guess that searching data simultaneously with 2 different libraries, each including only observed peptides and based on slightly different fastas, with Re-Annotate off, could in a few cases result in different charges of the same peptide being assigned to different protein groups. However, in our case this was a simple search from a fasta with in silico library prediction.

vdemichev commented 7 months ago

What does the log look like? Charge state can affect protein group assignment if heuristic protein inference is disabled - this is by design. The interpretation here is simple: precursor to protein group assignment means that the precursor is used to quantify the respective protein group; there's no a priori reason why different charge states cannot be used to quantify different protein groups.

Arthfael commented 7 months ago

Ah, so if I understand correctly, in DiaNN protein group assignment is understood in terms of which protein group(s) the peptidoform contributes quantitative values to, as opposed to which protein group(s) are inferred to be present based on the peptidoform's detection?

According to the log, the DiaNN call does actually include the "--relaxed-prot-inf" flag. This was an experiment where my colleague wanted to check the number of identifications, with no focus on quantitative comparisons.

"There's no a priori reason why different charge states cannot be used to quantify different protein groups." Unless I am missing some mechanism here, I would respectfully argue that different charge states, if they are all correct identifications (which should be verifiable from the similarity of their chromatographic peaks), constitute different measurements of the same peptidoform, independent of which proteins contributed to it. In theory they correspond to different ionisation states of the copies of the same peptide contributing to the same peak, so ideally one should observe a constant ratio of those states across the elution profile, determined by the specific ionisation behaviour of the peptide. Ideally again, different charge states of the same peptidoform should contain the same quantitative information. In reality this is subject to measurement error, signal-to-noise effects, suppression, etc. But afaik this is the sole mechanism responsible for the observation of different charge states, and it is wholly independent of the PG inference question.

Thus, whilst there is indeed no absolute reason not to use different charge states to quantify different protein groups, I think it is still sub-optimal: ideally, LFQ quant methods should either use the most intense charge state (as the least noisy one) or aggregate charge states over modified sequence using a noise-robust method of choice. Whether to use the resulting values to quantify all, or only some, of the protein groups which may have contributed to the peptide is a different matter (a trade-off between accuracy and number of quantified PGs).

Kind regards,

Armel
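
One of the two options suggested in the comment above, keeping only the most intense charge state per modified sequence, could look roughly like the sketch below. This is an illustration of the suggestion, not a description of how DiaNN quantifies protein groups. The column names `Run`, `Modified.Sequence`, `Precursor.Charge` and `Precursor.Quantity` are assumed to match the report.tsv headers, and the example values are made up.

```python
def most_intense_charge_state(rows, quantity_column="Precursor.Quantity"):
    """Pick, per (run, modified sequence), the charge state with the highest quantity.

    Returns {(run, modified sequence): (charge, quantity)}.
    Column names are assumptions about the report.tsv layout.
    """
    best = {}
    for row in rows:
        key = (row["Run"], row["Modified.Sequence"])
        quantity = float(row[quantity_column])
        # Keep the row only if it is the first seen or more intense than the current best.
        if key not in best or quantity > best[key][1]:
            best[key] = (int(row["Precursor.Charge"]), quantity)
    return best


# Made-up rows standing in for parsed report.tsv lines.
example_rows = [
    {"Run": "run1", "Modified.Sequence": "AAAPEPTIDEK",
     "Precursor.Charge": "2", "Precursor.Quantity": "1500.0"},
    {"Run": "run1", "Modified.Sequence": "AAAPEPTIDEK",
     "Precursor.Charge": "3", "Precursor.Quantity": "400.0"},
    {"Run": "run2", "Modified.Sequence": "AAAPEPTIDEK",
     "Precursor.Charge": "3", "Precursor.Quantity": "900.0"},
]

selected = most_intense_charge_state(example_rows)
print(selected)
```

Selecting per run means the chosen charge state can differ between runs. Choosing a single charge state per modified sequence across the whole experiment, or aggregating charge states with a noise-robust summary, would be alternative designs along the lines discussed above.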

