tvpham / iq

An R package to estimate relative protein abundances from ion quantification in DIA-MS-based proteomics
BSD 3-Clause "New" or "Revised" License

A comparison between iq `fast_MaxLFQ` and diann `diann_maxlfq` #18

Open · yukun01 opened this issue 1 month ago

yukun01 commented 1 month ago

Hi Thang, I ran both the iq package and the diann R package on the same DIA-NN output file (report.tsv):

iq_dat <- iq::fast_read("./data/new/dia/diann/report.tsv",
                        sample_id  = "File.Name",
                        primary_id = "Protein.Group", 
                        secondary_id = "Precursor.Id",
                        intensity_col = "Precursor.Normalised",   # default in diann_maxlfq
                        annotation_col = c("Protein.Names", "Genes"), 
                        filter_string_equal = NULL, 
                        # keep precursors with Q.Value and PG.Q.Value below 0.01
                        filter_double_less = c(Q.Value = "0.01", PG.Q.Value = "0.01"))
# fast_preprocess() applies median normalization by default (median_normalization = TRUE)
iq_norm_data <- iq::fast_preprocess(iq_dat$quant_table)
result_fastest <- iq::fast_MaxLFQ(iq_norm_data, 
                                  row_names = iq_dat$protein[, 1], 
                                  col_names = iq_dat$sample)
# fast_MaxLFQ() returns log2 estimates; convert back to the linear scale
iq.maxlfq <- 2^as.matrix(result_fastest$estimate)
> head(iq.maxlfq)
       F:\\mxb\\hela\\hela-20240513\\MS20240511-HELA-DIA-160ng.raw F:\\mxb\\hela\\hela-20240513\\MS20240511-HELA-DIA-180ng.raw
P55011                                                    173801.5                                                    174187.8
Q96JP5                                                    113199.8                                                    108254.0
Q9Y4H2                                                    116759.9                                                    118446.3
P36578                                                    949029.0                                                    960495.3
Q6SPF0                                                    200438.9                                                    224455.9
O76031                                                    268230.5                                                    265618.5
       F:\\mxb\\hela\\hela-20240513\\MS20240511-HELA-DIA-200ng.raw
P55011                                                    165438.1
Q96JP5                                                    117464.3
Q9Y4H2                                                    116507.7
P36578                                                    953222.8
Q6SPF0                                                    228031.7
O76031                                                    257683.7
For comparison, protein.groups is the diann_maxlfq result for the same report:
> head(protein.groups)
       F:\\mxb\\hela\\hela-20240513\\MS20240511-HELA-DIA-160ng.raw F:\\mxb\\hela\\hela-20240513\\MS20240511-HELA-DIA-180ng.raw
P55011                                                    815263.4                                                    819384.6
Q96JP5                                                    370781.9                                                    355589.0
Q9Y4H2                                                    609672.7                                                    620270.6
P36578                                                  24022242.5                                                  24381824.3
Q6SPF0                                                    593207.6                                                    666172.4
O76031                                                   1836270.1                                                   1823555.8
       F:\\mxb\\hela\\hela-20240513\\MS20240511-HELA-DIA-200ng.raw
P55011                                                    779136.8
Q96JP5                                                    386285.8
Q9Y4H2                                                    610823.7
P36578                                                  24225236.0
Q6SPF0                                                    677569.9
O76031                                                   1771114.3

Furthermore, the correlation between the two results was only about 0.5:

> cor(na.omit(iq.maxlfq), na.omit(protein.groups))
                                                            F:\\mxb\\hela\\hela-20240513\\MS20240511-HELA-DIA-160ng.raw
F:\\mxb\\hela\\hela-20240513\\MS20240511-HELA-DIA-160ng.raw                                                   0.4958414
F:\\mxb\\hela\\hela-20240513\\MS20240511-HELA-DIA-180ng.raw                                                   0.5162234
F:\\mxb\\hela\\hela-20240513\\MS20240511-HELA-DIA-200ng.raw                                                   0.5028992
                                                            F:\\mxb\\hela\\hela-20240513\\MS20240511-HELA-DIA-180ng.raw
F:\\mxb\\hela\\hela-20240513\\MS20240511-HELA-DIA-160ng.raw                                                   0.4893200
F:\\mxb\\hela\\hela-20240513\\MS20240511-HELA-DIA-180ng.raw                                                   0.5137953
F:\\mxb\\hela\\hela-20240513\\MS20240511-HELA-DIA-200ng.raw                                                   0.4974237
                                                            F:\\mxb\\hela\\hela-20240513\\MS20240511-HELA-DIA-200ng.raw
F:\\mxb\\hela\\hela-20240513\\MS20240511-HELA-DIA-160ng.raw                                                   0.4919188
F:\\mxb\\hela\\hela-20240513\\MS20240511-HELA-DIA-180ng.raw                                                   0.5133074
F:\\mxb\\hela\\hela-20240513\\MS20240511-HELA-DIA-200ng.raw                                                   0.5016175
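One thing that may be worth ruling out first: `cor()` on two matrices pairs rows by position, and `na.omit()` applied to each matrix separately can drop different rows, so row i of one matrix is not guaranteed to be the same protein as row i of the other (the two packages may also order proteins differently). Below is a minimal sketch of a row-aligned comparison, assuming both matrices have protein IDs as row names and share sample names as column names. `compare_quant` is a hypothetical helper, not part of iq or diann; it also works on the log2 scale, where a handful of very abundant proteins dominate the correlation less.

```r
# Hypothetical helper: per-sample correlation between two protein x sample
# matrices after matching rows by protein ID and columns by sample name.
compare_quant <- function(a, b) {
  proteins <- intersect(rownames(a), rownames(b))  # proteins quantified by both tools
  samples  <- intersect(colnames(a), colnames(b))  # samples present in both
  a <- log2(a[proteins, samples, drop = FALSE])
  b <- log2(b[proteins, samples, drop = FALSE])
  # drop a protein from BOTH matrices if either has a missing/non-finite value
  ok <- rowSums(!is.finite(a)) == 0 & rowSums(!is.finite(b)) == 0
  a <- a[ok, , drop = FALSE]
  b <- b[ok, , drop = FALSE]
  # Pearson correlation of matching columns (same sample in both matrices)
  sapply(samples, function(s) stats::cor(a[, s], b[, s]))
}
```

Usage would be `compare_quant(iq.maxlfq, protein.groups)`; the result is one correlation per run rather than the full 3 x 3 cross-sample matrix.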

Interestingly, the coefficient of variation (CV) values of the two results were almost the same.

[Two screenshots comparing the CV values of the two results; images not available]
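A per-protein CV across the three runs can be computed roughly as below; this is a sketch assuming linear-scale input and that proteins with any missing value are excluded, and `protein_cv` is a hypothetical name. One property worth keeping in mind: CV = sd / mean does not change if a protein's whole row is multiplied by a positive constant, so a per-protein scale difference between the two tools would not show up in the CVs at all.

```r
# Hypothetical helper: CV (%) of each protein across runs, on the linear scale.
protein_cv <- function(m) {
  m <- m[stats::complete.cases(m), , drop = FALSE]  # keep proteins quantified in every run
  apply(m, 1, function(x) stats::sd(x) / mean(x) * 100)
}
```

For example, `summary(protein_cv(iq.maxlfq))` and `summary(protein_cv(protein.groups))` give the two CV distributions side by side.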

So I'm very interested in how iq's fast_MaxLFQ differs from diann's diann_maxlfq.
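For reference, the general MaxLFQ idea as I understand it: for each protein, take the median log2 ratio between every pair of samples over the precursors they share, then solve a least-squares problem for one log2 abundance per sample. The pairwise ratios only determine differences between samples, so an implementation must fix the overall level separately. The sketch below anchors it so that the mean estimate equals the mean observed log2 intensity; that choice is an assumption for illustration, not necessarily what either package does. `maxlfq_sketch` is hypothetical and skips details such as handling samples that share no precursors.

```r
# Hypothetical, simplified MaxLFQ-style estimate for ONE protein.
# X: precursors x samples matrix of log2 intensities (NA = missing).
# Assumes the samples form one connected group via shared precursors.
maxlfq_sketch <- function(X) {
  n <- ncol(X)
  A <- matrix(0, n, n)  # normal equations of sum_(i<j) (x_j - x_i - r_ij)^2
  rhs <- numeric(n)
  if (n > 1) {
    for (i in seq_len(n - 1)) {
      for (j in (i + 1):n) {
        r_ij <- stats::median(X[, j] - X[, i], na.rm = TRUE)  # robust log2 ratio j vs i
        if (is.na(r_ij)) next                                 # no shared precursor
        A[i, i] <- A[i, i] + 1; A[j, j] <- A[j, j] + 1
        A[i, j] <- A[i, j] - 1; A[j, i] <- A[j, i] - 1
        rhs[i] <- rhs[i] - r_ij; rhs[j] <- rhs[j] + r_ij
      }
    }
  }
  # Ratios fix only differences; add the constraint sum(x) = n * mean(X)
  # (assumed anchoring) via a Lagrange multiplier to make the system solvable.
  A_aug <- rbind(cbind(A, 1), c(rep(1, n), 0))
  rhs_aug <- c(rhs, n * mean(X, na.rm = TRUE))
  solve(A_aug, rhs_aug)[seq_len(n)]
}
```

If two implementations anchor this level differently, their absolute values can differ by a per-protein factor while the within-protein profiles across runs stay similar.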

Many thanks, Yuxiang Tang