vdemichev / DiaNN

DIA-NN - a universal automated software suite for DIA proteomics data analysis.

Question about Fragment Intensities #1189

Open SahilCh95 opened 1 month ago

SahilCh95 commented 1 month ago

Hello,

Thank you for developing DIA-NN.

I am new to DIA-NN and to mass spectrometry in general. I am trying to use DIA-NN to extract fragment-level data and have managed to run it on my samples. From the report.tsv file I can extract protein names, peptides, fragment annotations (Frag.Info) and fragment intensities (Frag.Raw.Quant). I am processing this information in R, but I have noticed that sometimes the same fragment, derived from the same peptide (and protein), has two intensities assigned to it. I am not sure why that would happen, whether it is normal, or whether something is wrong with my code. Any help would be appreciated.

-Sahil

vdemichev commented 1 month ago

Hi Sahil,

How does it manifest that the fragment has 'two intensities'?

In general, the recommended way to get fragment intensities for DIA-NN 1.9 is to use the --export-quant option. With it DIA-NN will add fragment intensities (non-normalised), quality scores and annotation to the main report in .parquet format - these are also convenient to use in R, no string parsing required.

Best, Vadim

SahilCh95 commented 1 month ago

Hi Vadim,

I have been using the --report-lib-info flag in the additional options to get fragment intensities and annotations in the report.tsv file.
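For readers following along, here is a minimal sketch of how the two fragment strings from report.tsv could be paired up. It is written in Python rather than R purely for illustration. It assumes both fields are semicolon-separated lists in the same order, possibly with a trailing semicolon; the function name `parse_fragments`, the second fragment label and both intensity values are made up for the example.

```python
def parse_fragments(frag_info: str, frag_quant: str):
    """Pair each fragment annotation with its intensity, by position."""
    # Assumption: both fields are ';'-separated and may end with a trailing ';'.
    labels = [s for s in frag_info.split(";") if s]
    values = [float(s) for s in frag_quant.split(";") if s]
    if len(labels) != len(values):
        raise ValueError(
            f"{len(labels)} annotations but {len(values)} intensities"
        )
    return list(zip(labels, values))


# Illustrative input only: the first label appears in this thread,
# the second label and both intensities are invented.
pairs = parse_fragments("y3-H2O^1/328.1973877;y4^1/459.2", "1520.5;880.0;")
for label, intensity in pairs:
    print(label, intensity)
```

Raising an error on a length mismatch, instead of silently truncating, makes it easier to spot rows where the two strings do not line up.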

Also I realize that I generated spectral libraries and analyzed my .raw files in the same pipeline. Could that cause errors like these to occur?

Regards, Sahil

vdemichev commented 1 month ago

Hi Sahil,

"Also I realize that I generated spectral libraries and analyzed my .raw files in the same pipeline. Could that cause errors like these to occur?"

I am not sure how the 'error' manifests, i.e. what exactly are you seeing?

In general, in silico prediction must not be combined with raw file analysis in the same step: these need to be two separate pipeline steps. DIA-NN prints a warning about this.

Best, Vadim

SahilCh95 commented 1 month ago

Hi Vadim,

Sounds good, I'll re-run my DIA-NN analysis by first generating the spectral libraries and then doing the raw file analysis in a separate run.

As for what this "error" looks like: after finishing my run and loading the report.tsv file in R, I can see that in some of my samples the same fragment (for the same peptide and protein) shows up two or more times. I've pulled an example from one of my samples and attached it to this post (intensities are shown right below each fragment). [Image: Fragment Example]

So I found that some peptides (derived from the exact same protein) appear two or more times (for the exact same sample) in the final report, and they share one or more duplicate fragments. In the example shown, the y3-H2O^1/328.1973877 fragment appears twice (once in each of the RL4-derived AAAAAAALQAK peptide entries).

Maybe I'm using the incorrect annotations for fragments?

Thanks, Sahil

vdemichev commented 1 month ago

Hi Sahil,

What does the unprocessed string with fragment information from report.tsv look like for the respective precursor? Please note that it's essential to aggregate by Precursor.Id and not by Stripped.Sequence, as different precursors with the same sequence might have different fragmentation patterns.
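To illustrate why the grouping key matters, the sketch below counts how often each fragment occurs per run under two different keys. It is Python for illustration only. The rows are hypothetical: the precursor identifiers (sequence plus charge) and the second fragment label are invented, and `duplicated_keys` is a helper made up for this example.

```python
from collections import Counter

# Hypothetical long-format rows: (run, stripped_sequence, precursor_id, fragment)
rows = [
    ("sample1", "AAAAAAALQAK", "AAAAAAALQAK2", "y3-H2O^1/328.1973877"),
    ("sample1", "AAAAAAALQAK", "AAAAAAALQAK3", "y3-H2O^1/328.1973877"),
    ("sample1", "AAAAAAALQAK", "AAAAAAALQAK2", "y4^1/459.2"),
]


def duplicated_keys(rows, key):
    """Return the sorted keys that occur more than once."""
    counts = Counter(key(r) for r in rows)
    return sorted(k for k, n in counts.items() if n > 1)


# Keyed by stripped sequence, the two precursors collide on the shared fragment.
by_sequence = duplicated_keys(rows, lambda r: (r[0], r[1], r[3]))
# Keyed by precursor, every (run, precursor, fragment) combination is unique.
by_precursor = duplicated_keys(rows, lambda r: (r[0], r[2], r[3]))

print("duplicates by stripped sequence:", by_sequence)
print("duplicates by precursor:", by_precursor)
```

If the second list is empty while the first is not, the apparent duplicates come from distinct precursors sharing a sequence rather than from a parsing problem.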

"appear two or more times (for the exact same sample)"

With different modifications and/or charge states?

Best, Vadim

SahilCh95 commented 1 month ago

Hi Vadim,

I feel a little silly. I just went through the report.tsv file again, and see that these "duplicate" fragments are actually derived from differently charged precursor peptides. So they're not really duplicates.

Also, regarding your comment about Stripped.Sequence vs. Precursor.Id: the reason I was aggregating on the basis of stripped sequences is that I'm preparing this data for SAINTq, and that's the format in which it accepts the data.
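One possible way to keep fragment rows unique while still grouping at the stripped-sequence level is to fold the precursor identifier into the fragment label. The Python sketch below shows the idea only; it makes no claim about the exact input format SAINTq expects. The rows, intensities, the `|` separator and the function name `fragments_by_sequence` are all assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical rows: (stripped_sequence, precursor_id, fragment, intensity).
# Precursor identifiers and intensities are invented for the example.
rows = [
    ("AAAAAAALQAK", "AAAAAAALQAK2", "y3-H2O^1/328.1973877", 1520.5),
    ("AAAAAAALQAK", "AAAAAAALQAK3", "y3-H2O^1/328.1973877", 640.0),
]


def fragments_by_sequence(rows):
    """Group fragments per stripped sequence, with precursor-qualified labels."""
    grouped = defaultdict(dict)
    for sequence, precursor_id, fragment, intensity in rows:
        # Prefixing the precursor keeps fragments from different charge
        # states or modified forms distinct within one sequence.
        label = f"{precursor_id}|{fragment}"
        if label in grouped[sequence]:
            raise ValueError(f"duplicate fragment label: {label}")
        grouped[sequence][label] = intensity
    return dict(grouped)


table = fragments_by_sequence(rows)
for label, intensity in table["AAAAAAALQAK"].items():
    print(label, intensity)
```

Whether to keep the precursors separate like this or to combine their intensities depends on the downstream tool, so this choice is worth checking against its documentation.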

Thanks for your help!

Regards, Sahil