vdemichev / DiaNN

DIA-NN - a universal automated software suite for DIA proteomics data analysis.

SILAC implementation & quantification issues #28

Closed JWLackmann closed 4 years ago

JWLackmann commented 4 years ago

Dear Vadim,

We are setting up new DIA-based workflows, currently using 1.7.10. One aim is to also use it for SILAC experiments. I went through the manual and, as far as I understand, recognition of SILAC samples should be automated? However, I get no proper results using either the --mod command with UniMod IDs or no command-line arguments at all. In both cases, DIA-NN identifies a number of proteins similar to our alternatives from the samples, but no SILAC ratios are present in any of the generated *.tsv files (and the Label.Ratio column always reads 0). I am not sure at which point the modifications have to be added: on library creation, on sample calculation, at both, or at neither. I tested all three variants and always get the same outcome. We use standard SILAC mass shifts (R10, K8), so nothing fancy there, and I will happily provide any additional info I forgot to mention.

Thanks a lot and keep up your great work, really enjoying DIA-NN, Jan

vdemichev commented 4 years ago

Hi Jan,

Yes, SILAC should work (DIA/SWATH, not DDA); however, the support is currently experimental. This means that it has only been tested (not extensively) on plasma samples spiked with PQ500, so a very specific case.

I would recommend using spectral library-based search for SILAC. The easiest way to generate a SILAC library is probably by manually editing the unlabelled spectral library: DIA-NN can save any spectral library as a simple table (.tsv), and that can be edited in R/Python. All that is required to generate a labelled library is to add the SILAC delta masses to all y-series fragments, depending on whether the peptide has R or K at the C-terminus, and to annotate precursors with the (UniMod:259) and (UniMod:267) SILAC labels. What then remains is to merge the labelled and unlabelled libraries (just make sure there are no duplicate precursors).

You can also use library-free search (with --var-mod), but that would typically result in noticeably worse performance.

Hope this helps! Please don't hesitate to contact me if there are any questions.

Best wishes,

Vadim

JWLackmann commented 4 years ago

Dear Vadim,

I am currently working on the library modification and have a question regarding the mass additions. Do I have to factor the FragmentCharge into my added masses, or does DIA-NN do that internally and use singly charged masses in its library? E.g., do I just add 8.xxx or 8.xxx/z? Also, regarding your caution against duplicate precursors: I thought about adding the label mass to all precursors as well (again, do I have to factor in the PrecursorCharge here?) to prevent doubling (and checking for unique values afterwards), or do you refer to another column?

Best, Jan

vdemichev commented 4 years ago

Hi Jan,

Yes, you do need to take into account the FragmentCharge, as well as add the label mass to the precursor m/z (dividing it by the PrecursorCharge). Sorry, I forgot to mention it previously. DIA-NN does not try to figure out the masses based on sequence/fragment annotation, so correct masses need to be specified.
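As an illustration of the charge handling described above, here is a minimal Python sketch. The function name `shift_mz` and the example m/z values are made up for illustration; the two delta masses are the K8 and R10 shifts quoted later in this thread.

```python
# Mass shifts (Da) quoted later in this thread for heavy lysine (K8) and arginine (R10).
SILAC_DELTA = {"K": 8.014199, "R": 10.008269}


def shift_mz(mz: float, charge: int, delta_mass: float) -> float:
    """Return the m/z of an ion after its neutral mass grows by delta_mass."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    # The label adds delta_mass to the ion mass, so m/z moves by delta_mass / charge.
    return mz + delta_mass / charge


# Hypothetical doubly charged precursor ending in K, with a singly charged y-ion.
heavy_precursor_mz = shift_mz(500.0, 2, SILAC_DELTA["K"])  # shifted by 8.014199 / 2
heavy_y_ion_mz = shift_mz(300.0, 1, SILAC_DELTA["K"])      # shifted by the full 8.014199
```

The same helper applies to both columns: use PrecursorCharge for the precursor m/z and FragmentCharge for each labelled fragment m/z.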

Best wishes,

Vadim

vdemichev commented 4 years ago

By unique I meant that there should not be two identical precursors in the library, i.e. two precursors with the same modified sequence and charge. So the modified sequence should be changed by adding (UniMod:259) or (UniMod:267) (the SILAC labels) wherever necessary.
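A rough Python sketch of the annotation and the uniqueness check described above. It assumes tryptic peptides whose only K or R is the unmodified C-terminal residue; the helper names and the example sequences are hypothetical, not part of DIA-NN.

```python
from collections import Counter

# SILAC tags named in this thread: (UniMod:259) for heavy K, (UniMod:267) for heavy R.
SILAC_TAG = {"K": "(UniMod:259)", "R": "(UniMod:267)"}


def label_c_terminus(modified_sequence):
    """Append the SILAC tag after a C-terminal K or R; return None if there is none."""
    tag = SILAC_TAG.get(modified_sequence[-1:])
    return modified_sequence + tag if tag else None


def duplicate_precursors(precursors):
    """Return the (modified sequence, charge) pairs that occur more than once."""
    counts = Counter(precursors)
    return sorted(pair for pair, n in counts.items() if n > 1)


# Hypothetical unlabelled precursors as (modified sequence, precursor charge).
light = [("PEPTIDEK", 2), ("SAMPLER", 2), ("SAMPLER", 3)]
heavy = [(label_c_terminus(seq), z) for seq, z in light]
merged = light + heavy
assert duplicate_precursors(merged) == []  # labelled entries are distinct from unlabelled ones
```

Peptides with internal K/R (missed cleavages) or an already modified C-terminal residue would need extra handling that this sketch leaves out.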

vdemichev commented 4 years ago

If you wish, please contact me via email and I will send you a sample library with SILAC labels.

JWLackmann commented 4 years ago

Dear Vadim,

I created a SILAC library and DIA-NN processes everything. However, when I look into the output.tsv, I see two issues: The label ratio is still 0 in all cases and DIA-NN gives the exact same PG and Genes quantities for both peptides with and without the labels (which are listed for some peptides in Modified.Sequence and Precursor.id). This happens both when using no additional commands or the --mod command with --mod UniMod:188,6.020129,label (I have a data set with only the medium lysine label I use for testing to keep it simple).

I created the library by introducing the (UniMod:188) label to all K in the columns ModifiedPeptide, transition_name, transition_group_id, FullUniModPeptideName and PeptideGroupLabel, adjusting the masses of both precursors and fragments while keeping in mind the corresponding precursor and fragment charges. Afterwards, I kicked out all modified b-ions and appended everything to the original unlabelled library. However, after checking against your example DB again, I noticed that I kept the same UniprotID for both labelled and unlabelled peptides; is this the issue? I can also provide you with a snippet by mail, maybe it's another issue I am not seeing...?

All the best and thank you so much, Jan

vdemichev commented 4 years ago

Hi Jan,

Did you use "--peak-translation"? It's required for label ratio calculation. --mod UniMod:188,6.020129,label is indeed necessary for this: DIA-NN needs to be told that the modification is an isotopic label.

Protein quantities will depend on how proteins have been annotated in the spectral library. If the UniprotID is the same, then the quantities will also be the same - that's probably the reason. You can add some suffix/prefix to the labelled UniprotIDs. However, I would suggest just requantifying labelled/unlabelled protein quantities directly from either precursor quantities or from Label.Ratio using R.
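For illustration only, a Python sketch of requantifying per channel from precursor quantities (the suggestion above is to do this in R). Plain summation is used here as a simple stand-in, not as DIA-NN's own method. The rows are hypothetical, and the column names Protein.Group, Modified.Sequence and Precursor.Quantity are assumed to match the main report.

```python
from collections import defaultdict


def protein_quantities_by_channel(rows, label_tag="(UniMod:188)"):
    """Sum Precursor.Quantity per protein group, split into light and labelled."""
    totals = defaultdict(lambda: {"light": 0.0, "labelled": 0.0})
    for row in rows:
        # A precursor counts as labelled if its modified sequence carries the tag.
        channel = "labelled" if label_tag in row["Modified.Sequence"] else "light"
        totals[row["Protein.Group"]][channel] += float(row["Precursor.Quantity"])
    return dict(totals)


# Hypothetical report rows, e.g. as read with csv.DictReader(..., delimiter="\t").
rows = [
    {"Protein.Group": "P1", "Modified.Sequence": "PEPTIDEK", "Precursor.Quantity": "100"},
    {"Protein.Group": "P1", "Modified.Sequence": "PEPTIDEK(UniMod:188)", "Precursor.Quantity": "50"},
    {"Protein.Group": "P1", "Modified.Sequence": "SAMPLEK", "Precursor.Quantity": "20"},
]
quantities = protein_quantities_by_channel(rows)
```

Because the channel is read from the modified sequence, this works even when labelled and unlabelled entries share the same UniprotID in the library.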

On b-ions: if you remove these for labelled peptides, they should also be removed for unlabelled peptides. It's essential that the library fragmentation pattern is exactly the same for labelled and unlabelled peptides.
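One way to check this requirement before merging the libraries, as a minimal Python sketch. The fragment dictionaries and their keys are hypothetical stand-ins for the fragment type, series number, charge and relative intensity columns of the library table.

```python
def fragment_pattern(fragments):
    """Reduce a fragment list to a pattern that ignores m/z and row order."""
    return sorted(
        (f["type"], f["number"], f["charge"], round(f["intensity"], 6))
        for f in fragments
    )


def same_pattern(light_fragments, heavy_fragments):
    """True if both precursors list the same fragments with the same intensities."""
    return fragment_pattern(light_fragments) == fragment_pattern(heavy_fragments)


# Hypothetical precursor: b-ions were dropped from the labelled copy only.
light = [
    {"type": "y", "number": 5, "charge": 1, "intensity": 1.0},
    {"type": "b", "number": 3, "charge": 1, "intensity": 0.4},
]
heavy = [{"type": "y", "number": 5, "charge": 1, "intensity": 1.0}]

mismatch = not same_pattern(light, heavy)               # pattern differs
light_without_b = [f for f in light if f["type"] != "b"]
fixed = same_pattern(light_without_b, heavy)            # pattern matches again
```

Fragment m/z values are deliberately left out of the comparison, since they are expected to differ between the labelled and unlabelled copies.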

Hope this helps!

Best wishes,

Vadim

JWLackmann commented 4 years ago

Hi Vadim,

I modified the library as you suggested and I get results. However, when I try using the --peak-translation command, DIA-NN runs the calculation and reports the expected results in the log, but no files are written afterwards. This happens regardless of whether b-ions are in the library or not. Any idea what happens and how to handle this? Maybe it's due to how I pass the additional commands? I just put them together on one line with a space in between ("--mod UniMod:188,6.020129,label --peak-translation") or on two lines, but maybe this is wrong? I am also wondering whether you can recommend how to do the quantification in R afterwards. I normally use mapDIA or MSstats, but maybe there is another tool that requires less data manipulation to test with DIA-NN output?

Best, Jan

vdemichev commented 4 years ago

Hi Jan,

Could you please share the log? Which files do you expect to be written that are not? I suggest using the DIA-NN R package for handling the output: https://github.com/vdemichev/diann-rpackage. For protein quantification, we've tried MaxLFQ and it works very well (it is used by DIA-NN for gene quantification by default and is also implemented in the DIA-NN R package). Before we started using MaxLFQ (only recently), we relied on the PECA package for differential expression analysis (it takes into account all the precursors identified for each protein).

Best wishes,

Vadim

JWLackmann commented 4 years ago

Hi Vadim,

Thanks for the quick reply. I will look into your R package. I attached the log file from a recent search. When I use only a few .raw or .dia files, I get results, but when I try using more, no files are created at all. The attached log stems from such a search (level 3; if you require another level, I can easily rerun). The exact same data set returns results when I do not use the --peak-translation command. I hoped to find something in the logs indicating that one of the files causes the issue, but did not find anything.

Best, Jan

SILAC-Sebai_log_3.txt

vdemichev commented 4 years ago

Hi Jan,

This means DIA-NN crashed when doing --peak-translation. That function is still 'experimental', so things like that can happen. Thank you very much for making me aware of this, and sorry that it did not work... I will try to reproduce the error and fix it.

Best wishes,

Vadim

vdemichev commented 4 years ago

Preliminarily, I cannot reproduce the error with the PQ500 library. Maybe it's the library, maybe something else. In theory, I would recommend against using deep learning spectra prediction in conjunction with SILAC. The reason is that labelled and unlabelled precursors should have exactly the same fragmentation pattern in the library, which might not be the case for some precursors with deep learning. So if you'd like to use DIA-NN's predicted spectra, the way to do that would be to generate an in silico library first and then add SILAC peptides, and not the other way around.

I will do some other experiments, see if I can make it crash.

Best wishes,

Vadim

JWLackmann commented 4 years ago

Dear Vadim,

I redid the analysis without deep learning prediction and everything went smoothly, I think that might have been the issue. Will look into the R package next.

Thank you for your great help and have a nice weekend, Jan

vdemichev commented 4 years ago

Great! I guess I should make DIA-NN print a warning when it detects such a situation...

JWLackmann commented 4 years ago

I think a proper "DIA-NN crashed" message instead of the normal exit message would be a great way to indicate that something went wrong, ideally with an indication of the crash point, which might make detecting the issue easier. Yes, more warnings would be great. And maybe a comprehensive list of the available additional commands, as always looking everything up in the different text parts is a hassle (just in case I might slip in some user feedback ;-) )

animesh commented 3 years ago

wondering if there is any update on this? i have a silac file which i would like to analyze using dia-nn :)

vdemichev commented 3 years ago

Please contact me by email for details on how to use DIA-NN for SILAC.

Best, Vadim

anfoss commented 2 years ago

Following up on this. I have 60 pulsed silac dia pasef files and would like to use only y ions for quant but use both b/y for identification. Would it be possible in dia-nn?

vdemichev commented 2 years ago

Yes, it's automatic actually, but you can also use --restrict-fr and specify which ions to exclude from quant using the ExcludeFromAssay column in the library

anfoss commented 2 years ago

So for MS2 level quant of pulsed silac data should I take the values from Fragment.Quant.Raw? Is it automatically selecting the same fragments across the entire datasets? There is no info about which ion those intensities are referring to (i.e y6, y7 and so on).

vdemichev commented 2 years ago

No, just use Precursor.Quantity or Precursor.Normalised. Or better, please contact me by email and I will send you info on how to use DIA-NN on SILAC data.

weiclav commented 2 years ago

Hi there,

I just want to make sure I correctly understand the current situation in version 1.8.1, as per the documents published with the plexDIA manuscript.

Library preparation step:

While searching the data, the same as above, plus one needs to specify the channels and to set the peak translation flag, e.g. to use extra settings like this: --fixed-mod SILAC,0.0,KR,label --channels SILAC,L,KR,0:0; SILAC,H,KR,8.014199:10.008269 --original-mods --strip-unknown-mods --peak-translation

No further tweaking of the quantification-related settings is needed; DIA-NN will handle the correct fragments for the quantification on its own during the translation step (unless you want to quantify on the MS1 level).

Am I missing something critical or am I good to go with these extra settings for SILAC data?

Thanks a lot for the response!

Best, David