LFQ workflow compatibility?

eriktjansson commented 3 years ago

Hello,

I am working with label-free quantitative (LFQ) proteomics and am interested in adapting your R package workflow for use with results obtained from ISOQuant (an open-source package for Top3 analysis of Waters PLGS search results). A major difference compared to TMT is that I will end up with a single Excel file containing every temperature and treatment (admittedly at the cost of many more MS runs, since these are analyzed individually).

I think a lot of effort has been put into your program and I am very impressed with its functionality! I have gone through the code, and at first I believed I could easily adapt the config file to parse my results. However, I do see a few concerns and have some questions about which variable names are hardcoded into your program and which are optional.

I assume these are provided as output from your in-house pipeline (isobarQuant), but are all of them necessary to fit the Hill equation to the data? I have equivalents for some of these, such as counts per protein for unique peptides and razor peptides (is this what you call qusm and qupm?) and intensity, but I am less certain what means2i and ms1seqs represent. I am attaching an example report sheet from the output obtained from the workflow: QC HeLa 4 month_user designed 20210115-113837_quantification_report.xlsx

Hence, I am not sure how easily label-free results from Waters could be relabeled to be compatible with RTSA, or whether it would be easier to make a forked version in which some of the import functions are rewritten to make the downstream analysis steps compatible with ISOQuant output.

Keep up the good work! Erik

PS. I completely understand if a subfork is beyond your own interests but it would be interesting to hear a comment from you which option you believe is the most feasible for TPP-LFQ.

mathiaskalxdorf commented 3 years ago

Hi Erik,

The code is designed for our in-house pipeline, so several columns are currently expected, but not all of them are actually essential. Examples are ms1seqs and meanS2I, which are just quality values (e.g. mean signal to interference, used to estimate co-fragmentation in the MS2-based reporter-ion quantification). Nevertheless, I expect it would be easier to adapt the code (especially readData.R, but also the code that follows) than to get your data into the same format. The relevant information is the sumionarea per protein, temperature and condition (in your case the MS1 intensity per single MS run for the respective control or treatment and temperature), plus some quality values such as the number of unique matched spectra/peptides per protein and condition (here qusm/qupm) for filtering. Based on the intensities, the relative fold changes per temperature are calculated (relative to the 37 °C reference control sample), and from there the code should work without any other required information. Other columns currently expected by the code (e.g. ms1intensity) are TMT-specific in this case and are only used for some QC plots at the end (here you could maybe simply use the mean observed ms1intensity across all conditions per protein).
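As a rough illustration of the filtering and fold-change step described above, here is a minimal sketch. It is written in plain Python rather than taken from the package's R code, and the row layout, the `fold_changes` function, the `min_qupm` threshold and the example values are all hypothetical. It also assumes that each condition is normalized to its own 37 °C sample, which may differ from how RTSA handles the reference.

```python
from collections import defaultdict

# Hypothetical long-format rows: one MS1 intensity per protein, condition and temperature.
example_rows = [
    {"protein": "P1", "condition": "control", "temperature": 37.0, "intensity": 1000.0, "qupm": 3},
    {"protein": "P1", "condition": "control", "temperature": 50.0, "intensity": 500.0, "qupm": 3},
    {"protein": "P1", "condition": "control", "temperature": 60.0, "intensity": 100.0, "qupm": 2},
    {"protein": "P2", "condition": "control", "temperature": 37.0, "intensity": 800.0, "qupm": 1},
]


def fold_changes(rows, reference_temp=37.0, min_qupm=2):
    """Return {(protein, condition): {temperature: fold change vs. reference_temp}}.

    Assumption: every condition is normalized to its own reference-temperature sample.
    """
    curves = defaultdict(dict)
    for row in rows:
        # Quality filter: skip data points with too few unique quantified peptides.
        if row["qupm"] < min_qupm:
            continue
        curves[(row["protein"], row["condition"])][row["temperature"]] = row["intensity"]

    result = {}
    for key, by_temp in curves.items():
        reference = by_temp.get(reference_temp)
        if not reference:
            # No usable (non-zero) reference intensity: no fold change can be computed.
            continue
        result[key] = {t: v / reference for t, v in sorted(by_temp.items())}
    return result


if __name__ == "__main__":
    for key, curve in fold_changes(example_rows).items():
        print(key, curve)
```

The resulting per-protein, per-condition curves are the kind of input the downstream curve fitting would need; how they are handed over to the existing functions would still have to be worked out in readData.R.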

The only concern I have about using the code for label-free data is this: the code is designed for TMT data, where there are no or almost no missing values across all temperatures per condition. With label-free data one can expect several missing values per protein across the temperature gradient. I'm not sure whether the code in its current version handles this correctly. So I would suggest (at least to begin with) filtering the data for complete quantification across all temperatures. As this might be difficult, especially because of missing identifications in the high-temperature samples, some sort of missing-value imputation might be beneficial. In that case, developing a new imputation approach that takes the sigmoidal curve shape of protein denaturation into account could be beneficial as well.
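The completeness filter suggested above could look roughly like the following sketch. Again this is plain Python rather than the package's R code; the `complete_curves` function and the curve layout (a dictionary of temperature to value per protein and condition) are hypothetical, and treating zero as a missing value is an assumption about how the label-free report encodes missing quantifications.

```python
import math


def complete_curves(curves, temperatures):
    """Keep only curves that have a usable value at every expected temperature.

    curves: {(protein, condition): {temperature: value}}
    Assumption: None, NaN, infinite and zero values all count as missing.
    """
    expected = list(temperatures)
    kept = {}
    for key, by_temp in curves.items():
        values = [by_temp.get(t) for t in expected]
        if all(v is not None and math.isfinite(v) and v > 0 for v in values):
            kept[key] = by_temp
    return kept


if __name__ == "__main__":
    # Hypothetical example: P2 lacks a value at the highest temperature.
    example_curves = {
        ("P1", "control"): {37.0: 1.0, 50.0: 0.5, 60.0: 0.1},
        ("P2", "control"): {37.0: 1.0, 50.0: 0.7},
    }
    print(list(complete_curves(example_curves, [37.0, 50.0, 60.0])))
```

A filter like this is deliberately strict; any imputation scheme would replace the step that discards the incomplete curves.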

I hope this helps at least a bit.

Best regards,

Mathias

eriktjansson commented 3 years ago

Thanks for the valuable comments! I will look into piping our data into the later parts of the program. We are using a DIA approach, so while there may be missing values, the situation is at least somewhat better than with label-free DDA. I'll have to see whether imputation becomes necessary.

If I get it to work, I think it could also increase the general usefulness of your program, so I would like to fork an LFQ version if you agree. Could you please let me know whether your license is MIT, GPL or something else?

Best regards, Erik

mathiaskalxdorf / RTSA

LFQ workflow compatibility? #2