Custom spectral library from skyline

Hi Vadim and team,

I have generated a spectral library in Skyline, which I have exported as a report to be used in DIA-NN. The reason for doing this is that I am analyzing histone PTMs and am interested in certain histone peptidoforms. Essentially this is an effort to reduce the search space. Therefore, I have set the proteins to be the peptidoforms, as I am interested in quantifying those. I have attached the report generated from Skyline, which I use as the input for DIA-NN, as well as the log, which contains the command used to run DIA-NN, and the results. When identifying histone peptidoforms, it is quite important that the retention times be taken into account as well as the ion mobility (especially for isobaric forms). I have some questions that have come up when doing this analysis that I would appreciate your help on.

skyline spectral library.csv

report.log.txt

report.csv

1) When I search using this custom spectral library, I want to make sure that the retention times are being taken into account when searching for the histone peptidoforms. However, in the results, I can see that the identifications are not at the expected retention time (comparing to retention times in spectral library). I see that in the spectral library that DIA-NN built, there are values in the Tr_recalibrated column. How can I ensure that DIA-NN takes into account the RT in the spectral library? Is there a setting for an RT window or something like this?

2) Even though I include an IonMobility column in the spectral library, in the spectral library that DIA-NN built, all the values are 0. Why is this?

3) The LibraryIntensity that I provided in the spectral library was the MS/MS peak intensity of the corresponding product. I see these are all values between 0-1. Is this normal?

4) As I mentioned, I am interested in quantifying at the modified peptide level. In my spectral library, I provide the ModifiedPeptide and then in the ProteinName column, I provide a corresponding name for the modified peptide (e.g. H1.5-K33[un];K45[un];K51[un]). However, in the results, the Protein.Names column only contains a single name. How can I have it so that the Protein.Names column in the results corresponds to the ProteinName column in the spectral library? I want to do this so that I can use the values in the PG.Quantity column.

5) I have been varying the q-value threshold of my searches with the spectral library to see how this affects the results (e.g. 50%). I find that when relaxing it, I get many more of the IDs that I expect, although at a cost. I notice that some of these new IDs are not at the expected retention time. Is there a way for me to determine the optimal q-value for my situation? Also, what's the difference between precursor-level q-value and run-specific protein q-value?

6) Some of the histone peptidoforms are isobaric, and don't contain a unique fragment (although they do differ if looking at combinations of fragments). Should "No shared spectra" be enabled?

7) My samples are both label-free single cells and bulk. Should the bulk samples be included in the searches with MBR enabled to boost the IDs in the single cells?

Thanks a lot! Ronnie

Hi Ronnie,

DIA-NN calculates FDR roughly as the number of decoy hits divided by the number of target hits. Since you have just 70 target precursors searched, the estimated FDR normally will not go below 1/70, i.e. will never reach 1%. So it is not going to work like this.
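
To make the arithmetic concrete, here is a minimal sketch of the rough decoys/targets approximation described above. It is not DIA-NN's actual estimator, and the function name is hypothetical.

```python
def smallest_nonzero_fdr(n_targets: int) -> float:
    """Smallest non-zero estimate when FDR is approximated as decoys / targets.

    Assumption: a single decoy hit among n_targets target hits.
    """
    if n_targets <= 0:
        raise ValueError("n_targets must be positive")
    return 1.0 / n_targets


# With only 70 target precursors, one decoy hit already exceeds a 1% threshold.
floor_70 = smallest_nonzero_fdr(70)
print(f"{floor_70:.4f}")      # about 0.0143, i.e. roughly 1.4%
print(floor_70 > 0.01)        # True
```

A larger library (for example, one that also covers the background proteome) lowers this floor, because a single decoy hit is then divided by many more target hits.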

Suggestion:

Include in the spectral library everything you can that is likely detectable: every sample has a background, at the very least common contaminants. Indeed, it makes sense for only the histones to be modified; everything else can be unmodified. The only important thing here is that the RT and IM scales are the same for histones and other peptides.
Enable peptidoform scoring
Keep q-value threshold <= 0.05

1) "I can see that the identifications are not at the expected retention time (comparing to retention times in spectral library)." DIA-NN will always automatically align the library RTs to the RTs in the specific DIA experiment. The scales do not need to match, i.e. you can use a library based on a 120-min gradient to analyse data acquired with a 3-min gradient.
2) All calibrations just fail because too few peptides are detected, i.e. you need to search the background proteome too.
3) Can be any scale here, does not matter for DIA-NN.
4) Please add a Protein.Ids column to the spectral library and the info there will correspond to the Protein.Ids column in the DIA-NN report.
5) Any value 0.01 - 0.05 is fine for an MBR search, provided the final output is filtered at Q.Value <= 0.01 and, in your case, Peptidoform.Q.Value <= 0.01 and potentially PEP <= 0.01. Protein q-values indicate the proportion of falsely identified proteins (not precursors), i.e. filtering just based on a precursor q-value will result in a higher FDR for proteins than the filter threshold used, hence the need for a specific protein q-value.
6) No shared spectra should always be enabled. If there are no specific fragments, then how do you (or Skyline) distinguish between them for the inclusion in the library?
7) Yes, definitely.
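
The output filtering mentioned above (Q.Value, Peptidoform.Q.Value and optionally PEP, each <= 0.01) could be sketched as follows. The column names are the ones named in this reply; the helper name, the tab delimiter and the demo rows are assumptions made up for illustration.

```python
import csv
import io


def filter_report(rows, threshold=0.01, use_pep=False):
    """Keep rows passing the precursor and peptidoform q-value filters.

    rows: iterable of dicts, e.g. from csv.DictReader over the main report.
    use_pep: additionally require PEP <= threshold (the optional filter).
    """
    kept = []
    for row in rows:
        if float(row["Q.Value"]) > threshold:
            continue
        if float(row["Peptidoform.Q.Value"]) > threshold:
            continue
        if use_pep and float(row["PEP"]) > threshold:
            continue
        kept.append(row)
    return kept


# Made-up demo rows; a real report has many more columns.
demo = io.StringIO(
    "Precursor.Id\tQ.Value\tPeptidoform.Q.Value\tPEP\n"
    "PEPA2\t0.001\t0.002\t0.003\n"
    "PEPB2\t0.001\t0.030\t0.002\n"
    "PEPC3\t0.020\t0.001\t0.001\n"
    "PEPD2\t0.005\t0.004\t0.200\n"
)
rows = list(csv.DictReader(demo, delimiter="\t"))
print([r["Precursor.Id"] for r in filter_report(rows)])
print([r["Precursor.Id"] for r in filter_report(rows, use_pep=True)])
```

To use it on a real report, replace the `io.StringIO` demo with an opened file and adjust the delimiter to match how the report was saved.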

Best, Vadim

Hi Vadim,

I tried to implement your advice to improve the searches. My samples are whole cell lysate so there is a good amount of background. Here is what I did...

Created a spectral library of the background proteome by searching the samples in FragPipe
- Removed entries corresponding to histone peptides (library.csv)
Combined this with my spectral library of histone peptidoforms
- Changed the retention time values to be on a scale of 0-100 (to match the background proteome spectral library).
- Changed the peptidoform names to ProteinID
- Added proteotypic information for histone peptidoforms - assigned as 1 - is this correct to do? (histone + background proteome - spectral library.csv)
Searched this with the following settings
- Now including 2 additional modifications that were found in background proteome
- FDR=0.01
- Is the library header correct here? (report.log.txt)
Which resulted in this (removed some rows to fit 25MB file size): report-subset.csv

With this, I was only able to detect 11/69 of the histone peptidoforms. Any ideas why this could be? Ideally I want to get it so that DIA-NN can quantify >90% of the histone peptidoforms which I have already found manually in Skyline.

Some additional questions

I suspect that some of the histone peptidoforms are not being found because of how the retention time window is being set. Is it possible to manually set this, say to something like 5 min? Is this the --window parameter?
Are there other parameters you would recommend playing with when analyzing single-cell data, such as --tims-min-int, --min-peak, --min-corr?
When Ion Mobility is included in the spectral library, this means that the intensities will just be extracted within a given RT and IM window. In other words, this is filtering out any interference that might be increasing our intensities due to having the same RT but different CCS. Is this correct? What happens if the IM is not included in the spectral library?
I also include negative control (empty wells) in the search. Will this hurt performance in any way?
Help with this warning? WARNING: 11 cases of precursors with the same sequence matched to different sets of proteins, check if this is intended : 1
Help with this warning? WARNING: 13 precursors were wrongly annotated in the library as proteotypic : 35

Thanks so much, Ronnie

Hi Ronnie,

Changed the retention time values to be on a scale of 0-100 (to match the background proteome spectral library)

If this was done incorrectly, it will severely reduce IDs of modified histone peptides.

Added proteotypic information for histone peptidoforms - assigned as 1 - is this correct to do?

Is there any purpose in using the peptidoform ID as protein ID instead of the real protein ID? It makes sense that it causes those warnings you mentioned printed by DIA-NN.

I suspect that some of the histone peptidoforms are not being found because of how the retention time window is being set. Is it possible to manually set this, say to something like 5 min? Is this the --window parameter?

Indeed, that would make sense. Try starting with --im-window 0.1 and --rt-window set to a fifth of the gradient length, and then see if increasing either further helps. Note that these settings will only be necessary for the first pass of MBR, so it is better to do the first pass separately and then just use the refined library with normal settings. But most importantly, please try to align the library RTs and IMs of histone peptides with those of the rest of the peptides using something like loess, based on the search with wide IM and RT windows, for example.
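
As a rough illustration of the loess-style alignment suggested above, here is a minimal, non-robust local linear fit in plain Python. It reflects one possible reading of the advice: background peptides identified in the wide-window search act as anchors (measured RT in the run versus their library RT), and the fitted curve maps the histone peptides' measured RTs onto the library scale. All names and numbers are made up for illustration; a dedicated loess implementation from a statistics library would normally be preferable, and the same idea would apply to IM.

```python
def tricube(u: float) -> float:
    """Tricube kernel: weight 1 at u = 0, falling to 0 at |u| >= 1."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def loess_predict(x0, xs, ys, frac=0.5):
    """Predict y at x0 from a locally weighted straight-line fit.

    xs, ys: anchor pairs; frac: fraction of anchors used in each local fit.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) anchor pairs")
    k = min(len(xs), max(2, round(frac * len(xs))))
    # Bandwidth: slightly beyond the k-th nearest anchor so it keeps a non-zero weight.
    h = sorted(abs(x - x0) for x in xs)[k - 1] * 1.001 + 1e-12
    ws = [tricube((x - x0) / h) for x in xs]
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    if sxx < 1e-12:
        # All weighted anchors share one x value: fall back to the weighted mean.
        return my
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    return my + (sxy / sxx) * (x0 - mx)


# Made-up anchors: background peptides seen in the wide-window search.
measured_rt_min = [5.0, 12.0, 20.0, 31.0, 45.0, 58.0, 70.0]
library_rt = [4.0, 14.0, 26.0, 41.0, 60.0, 78.0, 95.0]

# Made-up measured RTs (minutes) of histone peptidoforms to place on the library scale.
histone_rt_min = [15.5, 33.0, 52.0]
aligned = [loess_predict(rt, measured_rt_min, library_rt) for rt in histone_rt_min]
print([round(v, 2) for v in aligned])
```

The aligned values would then replace the histone RT entries in the combined library, so that histone and background peptides share one RT scale.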

I also include negative control (empty wells) in the search. Will this hurt performance in any way?

It might hurt 'a tiny bit' quantification and normalisation.

Best, Vadim

vdemichev / DiaNN

Custom spectral library from skyline #1214