vdemichev / diann-rpackage

Report processing and protein quantification for MS-based proteomics

Very different results compared to diann::diann_maxlfq results #6

Open Mengbo-Li opened 2 years ago

Mengbo-Li commented 2 years ago

Dear Vadim,

The DIA-NN manual recommends iq as an alternative to diann-r for obtaining MaxLFQ intensities. However, when I run the example given for diann_maxlfq(), it gives very different results from iq::maxLFQ():

    library(dplyr)  # provides filter()

    df <- data.frame(File.Name = c("A", "A", "A", "B", "B", "B"),
                     Protein.Names = rep("ALB", 6),
                     Precursor.Id = rep(c("PEPTIDE", "EPTIDEP", "PTIDEPE"), 2),
                     Precursor.Normalised = c(20, 10, 5, 25, 12, NA)) |>
      filter(!is.na(Precursor.Normalised))

    diann::diann_maxlfq(df) |> log2()

    # Same values as a log2 precursor-by-sample matrix;
    # the appended NA restores the missing PTIDEPE value in sample B
    X <- matrix(c(df$Precursor.Normalised, NA), nrow = 3) |> log2()
    colnames(X) <- LETTERS[1:2]
    rownames(X) <- df$Precursor.Id[1:3]

    iq::maxLFQ(X)$estimate

with outputs

    > iq::maxLFQ(X)$estimate
    3.492680 3.785161

    > diann::diann_maxlfq(df) |> log2()
    4.336651 4.629134

I understand that the two implementations are very different, but I wonder which implementation I should use as the "truth" or the benchmark in this case.

Many thanks, Mengbo

vdemichev commented 2 years ago

With 'iq', please use the fast_maxlfq function; see the syntax and data preparation requirements in the respective manual. The results are not expected to be identical.

Mengbo-Li commented 2 years ago

Yes, I do not expect identical results, but as you can see here, the average log2-intensity for this example protein is very different between the two methods. I have also tried a much larger dataset, and the discrepancy in average log2-intensities between the two methods is quite large there too. We observed a more compressed range of average intensities with iq and a wider range with diann-r.
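As a rough check on the example outputs quoted above, the base R sketch below compares the two results using only the numbers posted in this thread. The variable names (iq_est, diann_est, sample_diff, offset) are illustrative and not part of either package, and the sketch makes no attempt to explain where the offset comes from.

```r
# Values copied from the outputs quoted earlier in this thread (log2 scale).
iq_est    <- c(A = 3.492680, B = 3.785161)  # iq::maxLFQ(X)$estimate
diann_est <- c(A = 4.336651, B = 4.629134)  # diann::diann_maxlfq(df) |> log2()

# Between-sample difference (B minus A) reported by each method.
sample_diff <- c(
  iq    = unname(iq_est["B"] - iq_est["A"]),
  diann = unname(diann_est["B"] - diann_est["A"])
)

# Per-sample offset between the two methods (diann-r minus iq).
offset <- diann_est - iq_est

print(round(sample_diff, 4))
print(round(offset, 4))
```

With these values, both methods report a B minus A difference of about 0.2925 log2 units, while the diann-r estimates sit about 0.844 log2 units above the iq estimates in both samples.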

Mengbo-Li commented 2 years ago

And with the larger dataset, I used iq::process_long_format(), so it was fast_maxlfq that was called.