vdemichev / diann-rpackage

Report processing and protein quantification for MS-based proteomics

maxLFQ: diann-r versus diann-cpp #5

Open bhagwataditya opened 2 years ago

bhagwataditya commented 2 years ago

Dear Vadim,

Thank you for developing DIA-NN. We think it is awesome and use it a lot :) Recently, we also started looking at the diann-r package and tried to compute MaxLFQ values with it (version 1.0.1). We noticed that the values computed this way differ from the `PG.MaxLFQ` values returned by diann-cpp (1.8). Are we overlooking something? Could you help us?

Here is a reproducible example:

# Read
    library(magrittr)
    library(data.table)
    url <- 'https://bitbucket.org/graumannlabtools/autonomics/downloads/szymanski22.report.tsv'
    file <- file.path(tempdir(), basename(url))
    download.file(url, destfile = file, mode = 'wb')
    dt <- fread(file)
    # Keep only the first two proteins so the example stays small
    dt %<>% extract(unique(Protein.Names)[1:2], on = 'Protein.Names')
    # Shorten each file name to '_' plus its third-from-last character
    dt$File.Name %<>% factor()
    levels(dt$File.Name) %<>% substr(nchar(.)-2, nchar(.)-2)
    levels(dt$File.Name) %<>% paste0('_', .)
    dt$File.Name %<>% as.character()

# cpp MaxLFQ
    cmat <- unique(dt[, .(Protein.Names, File.Name, PG.MaxLFQ)])
    cmat %<>% dcast.data.table(Protein.Names ~ File.Name, value.var = 'PG.MaxLFQ')

# r MaxLFQ
    x <- diann::diann_maxlfq(dt)
    # Put columns and rows in the same order as cmat
    x %<>% extract(, names(cmat)[-1])
    x %<>% extract(cmat$Protein.Names, )

# The two do not seem to match
    cmat
    x
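
Eyeballing the two printouts only shows that the values differ, not how. A minimal sketch in base R of one way to quantify the mismatch on the log2 scale follows. `compare_maxlfq` is a hypothetical helper, not part of the diann package, and the toy matrices use made-up numbers in place of `cmat` and `x`.

```r
# Hypothetical helper (not part of the diann package): summarise how two
# protein-by-sample intensity matrices differ on the log2 scale.
compare_maxlfq <- function(a, b) {
  stopifnot(identical(dim(a), dim(b)))
  # Treat zero or negative intensities as missing so log2() stays finite
  a[!is.na(a) & a <= 0] <- NA
  b[!is.na(b) & b <= 0] <- NA
  d  <- log2(a) - log2(b)      # per-cell log2 ratio
  ok <- is.finite(d)
  list(
    n_compared   = sum(ok),
    max_abs_diff = if (any(ok)) max(abs(d[ok])) else NA_real_,
    # Spread of the log2 ratio within each protein (row): a value near 0
    # means the two methods differ only by a constant factor for that protein
    row_spread   = apply(d, 1, function(r) {
      r <- r[is.finite(r)]
      if (length(r) > 1) diff(range(r)) else NA_real_
    })
  )
}

# Toy stand-ins for the two matrices (made-up numbers, not real data)
a <- matrix(c(100, 200, 400,
               50,  NA,  80),
            nrow = 2, byrow = TRUE,
            dimnames = list(c('P1', 'P2'), c('_1', '_2', '_3')))
b <- a
b['P1', ] <- b['P1', ] * 2     # P1 differs by a constant factor of 2
b['P2', '_3'] <- 160           # P2 differs in one sample only

res <- compare_maxlfq(a, b)
res
```

With the real data, the call would presumably be `compare_maxlfq(as.matrix(cmat, rownames = 'Protein.Names'), x)`, assuming `x` has already been reordered to match `cmat` as above. A `row_spread` near zero would point to a pure per-protein scaling difference, which leaves between-sample ratios unchanged. Larger values would mean the relative quantities themselves differ.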


vdemichev commented 3 months ago

Sorry for the super late reply. Yes, the values differ because DIA-NN itself uses a slightly more sophisticated algorithm than the R package, i.e. this is by design.

Best, Vadim