Open Mengbo-Li opened 2 years ago
With 'iq', please use the fast_MaxLFQ function; see the syntax and data preparation requirements in the respective manual. The results are not expected to be identical.
Yes, I do not expect identical results, but as you can see here, the average log2-intensity for this example protein is very different between the two methods. In fact, I have tried this on a much larger dataset, and the discrepancy in average log2-intensities between the two methods is quite large. We observed a more compressed range of average values with iq and a wider range of average intensities with diann-r.
With the larger dataset, I used iq::process_long_format(), so it was fast_MaxLFQ that was called.
Dear Vadim,
As the DIA-NN manual suggests, we are recommended to use iq to obtain MaxLFQ intensities as an alternative to diann-r. However, if I run the same example as given in diann_maxlfq(), it gives very different results from iq::maxLFQ():
library(dplyr)  # provides filter()
df <- data.frame(File.Name = c("A","A","A","B","B","B"), Protein.Names=rep("ALB",6), Precursor.Id=rep(c("PEPTIDE","EPTIDEP","PTIDEPE"),2), Precursor.Normalised=c(20,10,5,25,12,NA)) |> filter(!is.na(Precursor.Normalised))
diann::diann_maxlfq(df) |> log2()
X <- matrix(c(df$Precursor.Normalised, NA), nrow = 3) |> log2()
colnames(X) <- LETTERS[1:2]
rownames(X) <- df$Precursor.Id[1:3]
iq::maxLFQ(X)$estimate
with outputs
> iq::maxLFQ(X)$estimate
3.492680 3.785161
> diann::diann_maxlfq(df) |> log2()
4.336651 4.629134
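The two outputs above can be compared with a little base-R arithmetic. The sketch below uses only the numbers printed in this post and needs neither iq nor diann installed. It checks whether the two methods differ by a constant offset on the log2 scale, and compares the mean of each estimate against two candidate anchors for the absolute level. The anchors (`mean_all`, `mean_top`) are guesses made for illustration, not documented behaviour of either package.

```r
# Estimates copied from the outputs above (log2 scale, samples A and B).
iq_est    <- c(A = 3.492680, B = 3.785161)
diann_est <- c(A = 4.336651, B = 4.629134)

# Per-sample gap between the two methods.
offset <- diann_est - iq_est
print(round(offset, 4))

# Between-sample difference (B - A) within each method.
print(round(c(iq = unname(diff(iq_est)), diann = unname(diff(diann_est))), 4))

# Observed log2 precursor intensities from the toy example.
X <- log2(matrix(c(20, 10, 5, 25, 12, NA), nrow = 3,
                 dimnames = list(c("PEPTIDE", "EPTIDEP", "PTIDEPE"),
                                 c("A", "B"))))

# Two candidate anchors for the absolute level (assumptions for illustration):
mean_all <- mean(X, na.rm = TRUE)                 # mean of all observed values
mean_top <- mean(apply(X, 2, max, na.rm = TRUE))  # mean of each sample's top precursor

print(round(c(iq = mean(iq_est), mean_all = mean_all,
              diann = mean(diann_est), mean_top = mean_top), 4))
```

In this toy example the gap between the methods is essentially the same for both samples, and the B minus A difference agrees to about five decimal places, so the relative quantification matches and only the absolute level differs. The iq mean coincides with `mean_all` and the diann mean with `mean_top` to about four decimal places. This is an observation about these six numbers only; whether it explains the discrepancy on the larger dataset would need to be checked against each package's documentation or source.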
I understand that the two implementations are very different, but I wonder which implementation I should use as the "truth" or the benchmark in this case.
Many thanks, Mengbo