Open cutleraging opened 1 month ago
Hi Ronnie,
DIA-NN calculates FDR roughly as the number of decoy hits divided by the number of target hits. Since you have just 70 target precursors searched, the estimated FDR will normally not go below 1/70 (about 1.4%), i.e. it will never reach 1%. So it is not going to work like this.
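To make the arithmetic concrete, here is a minimal sketch of that floor. It assumes the simplified model described above (FDR estimated as decoys divided by targets, with at least one decoy passing) and is not DIA-NN's actual implementation; the function names `fdr_floor` and `can_reach` are hypothetical.

```python
def fdr_floor(n_targets, min_decoys=1):
    """Smallest non-zero FDR estimate under the rough model FDR ~ decoys / targets.

    Assumption: at least `min_decoys` decoy hits pass alongside the targets.
    """
    return min_decoys / n_targets


def can_reach(n_targets, threshold):
    """True if the floor for this many targets is at or below the threshold."""
    return fdr_floor(n_targets) <= threshold


# Compare a 70-precursor library with larger search spaces at a 1% cutoff.
for n in (70, 1000, 10000):
    print(n, f"{fdr_floor(n):.4f}", can_reach(n, 0.01))
```

Under this model, a library needs well over 100 target precursors before a single decoy hit stops blocking the 1% threshold, which is one reason to search the histone peptidoforms together with a background library.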
Suggestion:
Best, Vadim
Hi Vadim,
I tried to implement your advice to improve the searches. My samples are whole cell lysate so there is a good amount of background. Here is what I did...
Created a spectral library of the background proteome by searching the samples in fragpipe
Combined this with my spectral library of histone peptidoforms
Searched this with the following settings
Which resulted in this (removed some rows to fit 25MB file size) report-subset.csv
With this, I was only able to detect 11/69 of the histone peptidoforms. Any ideas why this could be? Ideally, I want DIANN to quantify >90% of the histone peptidoforms, which I have already found manually in skyline.
Some additional questions
Thanks so much, Ronnie
Hi Ronnie,
Changed the retention time values to be on a scale of 0-100 (to match the background proteome spectral library)
If this was done incorrectly, it will severely reduce IDs of modified histone peptides.
Added proteotypic information for histone peptidoforms - assigned as 1 - is this correct to do?
Is there any purpose in using the peptidoform ID as protein ID instead of the real protein ID? It makes sense that this causes the warnings you mentioned that DIA-NN prints.
I suspect that some of the histone peptidoforms are not being found because of how the retention time window is being set. Is it possible to manually set this, say to something like 5 min? Is this the --window parameter?
Indeed, that would make sense. Try starting with --im-window 0.1 and --rt-window set to a 5th of the gradient length, and then see if increasing either further helps. Note that these settings will only be necessary for the first pass of MBR, so it is better to do the first pass separately and then just use the refined lib with normal settings. But most importantly, please try to align the library RTs and IMs of histone peptides with the rest of the peptides using something like loess, based on a search with wide IM and RT windows, for example.
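As an illustration of the alignment step, below is a minimal sketch of a loess-style fit (local linear regression with tricube weights) in plain Python. It is not a DIA-NN feature and not a full LOESS implementation (no robustness iterations). The anchor values are made up for the example: `anchor_lib_rt` stands for library RTs of histone precursors confidently identified in the wide-window search, and `anchor_ref_rt` for where those precursors sit on the background library's 0-100 scale. The same approach could be applied to IM values.

```python
def tricube(u):
    """Tricube kernel: weight 1 at u = 0, falling to 0 at |u| >= 1."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def loess_predict(x, y, x_new, frac=0.5):
    """Predict y at each x_new by local weighted linear regression."""
    n = len(x)
    k = min(n, max(3, round(frac * n)))  # neighbours used per local fit
    preds = []
    for x0 in x_new:
        # Bandwidth: distance to the k-th nearest anchor, slightly inflated
        # so that this neighbour keeps a small positive weight.
        h = max(sorted(abs(xi - x0) for xi in x)[k - 1], 1e-12) * 1.001
        w = [tricube((xi - x0) / h) for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0  # fall back to the local mean
        preds.append(my + slope * (x0 - mx))
    return preds


# Hypothetical anchors: histone library RT (minutes) vs. reference 0-100 scale.
anchor_lib_rt = [3.1, 5.4, 8.0, 11.2, 14.9, 18.3, 22.0, 25.7, 29.1, 33.5]
anchor_ref_rt = [6.0, 12.5, 20.1, 29.8, 41.0, 50.2, 61.3, 70.9, 80.4, 92.0]

# Map the remaining histone library RTs onto the reference scale.
histone_lib_rt = [4.2, 12.0, 20.5, 31.0]
aligned = loess_predict(anchor_lib_rt, anchor_ref_rt, histone_lib_rt, frac=0.5)
for rt, new_rt in zip(histone_lib_rt, aligned):
    print(f"{rt:5.1f} min -> {new_rt:6.2f}")
```

The `frac` parameter controls how local the fit is: smaller values follow the anchors more closely, larger values give a smoother curve. With only a handful of anchors, a larger `frac` is probably the safer choice.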
I also include negative control (empty wells) in the search. Will this hurt performance in any way?
It might hurt quantification and normalisation 'a tiny bit'.
Best, Vadim
Hi Vadim and team,
I have generated a spectral library in skyline which I have exported as a report to be used in DIAnn. The reason for doing this is that I am analyzing histone PTMs and am interested in certain histone peptidoforms. Essentially this is an effort to reduce the search space. Therefore, I have set the proteins to be the peptidoforms, as I am interested in quantifying those. I have attached the report generated from skyline which I use as the input for DIAnn as well as the log which contains the command used to run DIAnn and the results. When identifying histone peptidoforms, it is quite important that the retention times be taken into account as well as the ion mobility (especially for isobaric forms). I have some questions that have come up when doing this analysis that I would appreciate your help on.
skyline spectral library.csv
report.log.txt
report.csv
1) When I search using this custom spectral library, I want to make sure that the retention times are being taken into account when searching for the histone peptidoforms. However, in the results, I can see that the identifications are not at the expected retention time (comparing to retention times in spectral library). I see that in the spectral library that DiaNN built, there are values in the Tr_recalibrated column. How can I ensure that DiaNN takes into account the RT in the spectral library? Is there a setting for an RT window or something like this?
2) Even though I include IonMobility column in the spectral library, in the spectral library that DiaNN built, all the values are 0. Why is this?
3) The LibraryIntensity that I provided in the spectral library was the MS/MS peak intensity of the corresponding product. I see these are all values between 0-1. Is this normal?
4) As I mentioned, I am interested in quantifying at the modified peptide level. In my spectral library, I provide the ModifiedPeptide and then in the ProteinName column, I provide a corresponding name for the modified peptide (e.g. H1.5-K33[un];K45[un];K51[un]). However, in the results, the Protein.Names column only contains a single name. How can I have it so that the Protein.Names column in the results corresponds to the ProteinName column in the spectral library? I want to do this so that I can use the values in the PG.Quantity column.
5) I have been varying the q-value threshold of my searches with the spectral library (e.g. 50%) to see how this affects the results. I find that when relaxing it, I get many more of the IDs that I expect, although at a cost: I notice that some of these new IDs are not at the expected retention time. Is there a way for me to determine the optimal q-value for my situation? Also, what's the difference between precursor-level q-value and run-specific protein q-value?
6) Some of the histone peptidoforms are isobaric, and don't contain a unique fragment (although they do differ if looking at combinations of fragments). Should "No shared spectra" be enabled?
7) My samples are both label free single cells and bulk. Should the bulk samples be included in the searches with MBR enabled to boost the IDs in the single cells?
Thanks a lot! Ronnie