wangshisheng / NAguideR

NAguideR: performing and prioritizing missing value imputations for consistent bottom-up proteomic analyses
MIT License

imputing across conditions vs within conditions #8

Open aleighbrown opened 4 months ago

aleighbrown commented 4 months ago

Thank you so much for creating this wonderful tool. I am brand new to analyzing mass-spec data and have been trying to get my head around how best to analyze my data.

I’m performing analyses on phosphorylation data obtained from TimsTOF mass-spec. I have n = 5 for each condition.

I have 3 conditions, and I expect phosphorylation status to vary quite widely between them.

When I do the imputation, I’m mostly curious about how the coefficient of variation is calculated and how the imputation is done on a condition-wide basis.

If I upload my samples with all 3 conditions together in one sheet as a single data set, many more phospho-peptides are filtered out than if I impute each condition separately (only the samples from condition A, then only condition B, then only condition C).

If I biologically expect the peptides to vary quite a bit between conditions, would it make sense to impute each condition alone? Or does the tool work better if I impute with all samples together?

I also sent an email asking the same question.

wangshisheng commented 4 months ago

Hi aleighbrown,

I think the situation you describe is normal. Many more phospho-peptides are probably filtered out when you impute across the 3 conditions because of the filtering parameters (i.e. NA ratio and CV threshold (raw scale)). As described in the tool, any protein/peptide whose NA ratio or CV is above the threshold is removed. So, for example, if the CV of a protein/peptide is below the threshold in condition A or B but above the threshold in condition C, that protein/peptide is retained when you process condition A or B alone, but it is removed when you process the 3 conditions together.
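To make the filtering effect concrete, here is a minimal Python sketch of an NA-ratio and CV filter applied per condition and then to the pooled samples. It is only an illustration under stated assumptions: the function names, the thresholds (0.5 and 0.3), and the intensities are hypothetical, and NAguideR's internal calculation may differ. It shows the related case raised in the question, where each condition is tight on its own but the pooled CV is inflated by the differences between conditions.

```python
from statistics import mean, stdev

def na_ratio(values):
    """Fraction of missing (None) entries in a list of intensities."""
    return sum(v is None for v in values) / len(values)

def cv(values):
    """Coefficient of variation (sample SD / mean) of the observed values."""
    observed = [v for v in values if v is not None]
    if len(observed) < 2:
        return float("inf")  # CV is undefined with fewer than 2 observations
    return stdev(observed) / mean(observed)

def passes_filter(values, max_na_ratio=0.5, max_cv=0.3):
    """Keep a peptide only if both NA ratio and CV are within the thresholds.

    The default thresholds are hypothetical, chosen for illustration only.
    """
    return na_ratio(values) <= max_na_ratio and cv(values) <= max_cv

# Made-up raw-scale intensities for one phospho-peptide, n = 5 per condition.
peptide = {
    "A": [100.0, 110.0, None, 95.0, 105.0],
    "B": [400.0, 420.0, 390.0, None, 410.0],
    "C": [50.0, 55.0, 45.0, 52.0, 48.0],
}

# Filter each condition on its own.
per_condition = {name: passes_filter(vals) for name, vals in peptide.items()}

# Filter all 15 samples together.
pooled = [v for vals in peptide.values() for v in vals]
pooled_passes = passes_filter(pooled)

print(per_condition)   # every condition passes on its own
print(pooled_passes)   # the pooled peptide fails the CV threshold
```

In this made-up example the peptide survives in every single-condition run but is dropped in the pooled run, because the pooled CV reflects the between-condition differences as well as the replicate noise.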

So please check your data and adjust the parameters (perhaps set the NA ratio or CV threshold a little higher for your data when you impute across the 3 conditions). BTW, in my opinion, you could impute each condition alone or all together. Both are reasonable.

Best, Shisheng, West China Hospital