Issues when implementing `JoinAssays`

Pedroaragon9 commented 2 months ago

Hello,

I have an issue when using the joinAssays() function on my qf object. I can successfully generate the object:

names(sampleData)[1] <- "runCol"
> qf <-readQFeatures(assayData = inputData,
+              colData = sampleData,
+              runCol = "R.FileName",
+              quantCol = "FG.MS1Quantity",
+              removeEmptyCols = TRUE)
Checking arguments.
Loading data as a 'SummarizedExperiment' object.
Splitting data in runs.
Formatting sample annotations (colData).
Formatting data as a 'QFeatures' object.

After inspecting it, everything looks correct:

> qf
An instance of class QFeatures containing 49 assays:
 [1] 20240705_HB_PAF_Evo_FAIMS_1CV_20SPD_wDIA_120k_120k_20ng_1864_1: SummarizedExperiment with 56656 rows and 1 columns 
 [2] 20240705_HB_PAF_Evo_FAIMS_1CV_20SPD_wDIA_120k_120k_20ng_1864_2: SummarizedExperiment with 55382 rows and 1 columns 
 [3] 20240705_HB_PAF_Evo_FAIMS_1CV_20SPD_wDIA_120k_120k_20ng_1864_3: SummarizedExperiment with 55854 rows and 1 columns 
 ...
 [47] 20240705_HB_PAF_Evo_FAIMS_1CV_20SPD_wDIA_120k_120k_20ng_1992_3: SummarizedExperiment with 56610 rows and 1 columns 
 [48] 20240705_HB_PAF_Evo_FAIMS_1CV_20SPD_wDIA_120k_120k_20ng_1992_4: SummarizedExperiment with 57128 rows and 1 columns

It is only when I run the joinAssays() function as follows that I encounter an issue:

qf <- joinAssays(qf,
                names(qf),
                name = "Precursors")

A new assay is successfully generated, as expected:

> qf
An instance of class QFeatures containing 49 assays:
 [1] 20240705_HB_PAF_Evo_FAIMS_1CV_20SPD_wDIA_120k_120k_20ng_1864_1: SummarizedExperiment with 56656 rows and 1 columns 
 [2] 20240705_HB_PAF_Evo_FAIMS_1CV_20SPD_wDIA_120k_120k_20ng_1864_2: SummarizedExperiment with 55382 rows and 1 columns 
 [3] 20240705_HB_PAF_Evo_FAIMS_1CV_20SPD_wDIA_120k_120k_20ng_1864_3: SummarizedExperiment with 55854 rows and 1 columns 
 ...
 [47] 20240705_HB_PAF_Evo_FAIMS_1CV_20SPD_wDIA_120k_120k_20ng_1992_3: SummarizedExperiment with 56610 rows and 1 columns 
 [48] 20240705_HB_PAF_Evo_FAIMS_1CV_20SPD_wDIA_120k_120k_20ng_1992_4: SummarizedExperiment with 57128 rows and 1 columns 
 [49] Precursors: SummarizedExperiment with 2677241 rows and 48 columns

However, upon closer inspection, it seems that all quantitative information in the joined assay, except that from the first assay, is lost:

> head(assay(qf[["Precursors"]]))
 20240705_HB_PAF_Evo_FAIMS_1CV_20SPD_wDIA_120k_120k_20ng_1864_1 20240705_HB_PAF_Evo_FAIMS_1CV_20SPD_wDIA_120k_120k_20ng_1864_2
1                                                       63943.00                                                             NA
2                                                       24949.91                                                             NA
3                                                       20851.92                                                             NA
4                                                       10158.90                                                             NA
5                                                       63598.29                                                             NA
6                                                       63280.89                                                             NA
  20240705_HB_PAF_Evo_FAIMS_1CV_20SPD_wDIA_120k_120k_20ng_1864_3 20240705_HB_PAF_Evo_FAIMS_1CV_20SPD_wDIA_120k_120k_20ng_1864_4
1                                                             NA                                                             NA
2                                                             NA                                                             NA
3                                                             NA                                                             NA
4                                                             NA                                                             NA
5                                                             NA                                                             NA
6                                                             NA                                                             NA
  20240705_HB_PAF_Evo_FAIMS_1CV_20SPD_wDIA_120k_120k_20ng_1866_1 20240705_HB_PAF_Evo_FAIMS_1CV_20SPD_wDIA_120k_120k_20ng_1866_2
1                                                             NA                                                             NA
2                                                             NA                                                             NA
3                                                             NA                                                             NA
4                                                             NA                                                             NA
5                                                             NA                                                             NA
6                                                             NA                                                             NA
  20240705_HB_PAF_Evo_FAIMS_1CV_20SPD_wDIA_120k_120k_20ng_1866_3 20240705_HB_PAF_Evo_FAIMS_1CV_20SPD_wDIA_120k_120k_20ng_1866_4
1                                                             NA                                                             NA
2                                                             NA                                                             NA
3                                                             NA                                                             NA
4                                                             NA                                                             NA
5                                                             NA                                                             NA
6                                                             NA                                                             NA
...
...

Interestingly, I had managed to join the assays before, and even to perform downstream DE analysis on the dataset, using exactly the same workflow.

Other colleagues have been dealing with the same issue and we are quite puzzled as to why this is happening. I checked the values before the join and they seem fine. Interestingly, I can run other functions first, such as aggregateFeaturesOverAssays(), and then join the assays, which works. However, I noticed that information is still lost, this time in several columns of rowData(), and this also causes a failure during protein aggregation, e.g. when using MsCoreUtils::robustSummary: Error in .lm.fit(X, expression) : NA/NaN/Inf in 'y'

I have tried this workflow on several datasets and all of them show the same issue. I am not sure why. Looking forward to hearing your thoughts.

Thanks in advance

P.S. I'll be happy to share some datasets to help reproduce the issue.

> sessionInfo()
R version 4.4.1 (2024-06-14)
Platform: aarch64-apple-darwin20
Running under: macOS Sonoma 14.1.1

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] scp_1.15.2                  lubridate_1.9.3             forcats_1.0.0               stringr_1.5.1               dplyr_1.1.4                
 [6] purrr_1.0.2                 readr_2.1.5                 tidyr_1.3.1                 tibble_3.2.1                ggplot2_3.5.1              
[11] tidyverse_2.0.0             diann_1.0.1                 QFeatures_1.15.3            MultiAssayExperiment_1.31.5 SummarizedExperiment_1.35.3
[16] Biobase_2.65.1              GenomicRanges_1.57.1        GenomeInfoDb_1.41.2         IRanges_2.39.2              S4Vectors_0.43.2           
[21] BiocGenerics_0.51.3         MatrixGenerics_1.17.0       matrixStats_1.4.1           limma_3.61.12              

loaded via a namespace (and not attached):
 [1] tidyselect_1.2.1            fastmap_1.2.0               SingleCellExperiment_1.27.2 lazyeval_0.2.2              nipals_0.8                 
 [6] digest_0.6.37               timechange_0.3.0            lifecycle_1.0.4             cluster_2.1.6               ProtGenerics_1.37.1        
[11] statmod_1.5.0               magrittr_2.0.3              compiler_4.4.1              rlang_1.1.4                 tools_4.4.1                
[16] igraph_2.0.3                utf8_1.2.4                  data.table_1.16.0           knitr_1.48                  S4Arrays_1.5.10            
[21] htmlwidgets_1.6.4           DelayedArray_0.31.14        plyr_1.8.9                  RColorBrewer_1.1-3          abind_1.4-8                
[26] withr_3.0.1                 grid_4.4.1                  fansi_1.0.6                 colorspace_2.1-1            scales_1.3.0               
[31] MASS_7.3-61                 cli_3.6.3                   crayon_1.5.3                generics_0.1.3              metapod_1.13.0             
[36] rstudioapi_0.16.0           httr_1.4.7                  reshape2_1.4.4              tzdb_0.4.0                  zlibbioc_1.51.1            
[41] AnnotationFilter_1.29.0     BiocManager_1.30.25         XVector_0.45.0              vctrs_0.6.5                 Matrix_1.7-0               
[46] slam_0.1-53                 jsonlite_1.8.9              IHW_1.33.0                  hms_1.1.3                   ggrepel_0.9.6              
[51] clue_0.3-65                 glue_1.8.0                  stringi_1.8.4               gtable_0.3.5                UCSC.utils_1.1.0           
[56] munsell_0.5.1               lpsymphony_1.33.1           pillar_1.9.0                htmltools_0.5.8.1           GenomeInfoDbData_1.2.13    
[61] R6_2.5.1                    RcppEigen_0.3.4.0.2         lattice_0.22-6              fdrtool_1.2.18              Rcpp_1.0.13                
[66] SparseArray_1.5.44          xfun_0.48                   MsCoreUtils_1.17.2          pkgconfig_2.0.3

lgatto commented 2 months ago

Thank you for using scp and QFeatures.

To be able to join assays, the function needs to be able to match rows from different sets/assays. But this can't be done at the precursor level: how would you know which precursor in run 1 matches which precursor in run 2? After reading the data in, we first aggregate precursors into peptides using aggregateFeaturesOverAssays(), so that each precursor-level assay is aggregated into a peptide-level equivalent. The peptide-level assays can then be joined by matching peptide sequences that have been found across multiple runs.
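
To illustrate the matching problem, here is a minimal base R sketch. It does not use QFeatures. The toy tables `run1` and `run2`, the peptide names and the choice of the median as the aggregation function are all made up for illustration. It only mimics the idea: joining on run-specific row names leaves every row with a value in just one run, whereas aggregating to peptides first gives rows that can be matched across runs.

```r
# Two toy runs (hypothetical data). Row names are run-specific precursor
# indices, so no row name is shared between the runs.
run1 <- data.frame(peptide = c("PEPA", "PEPA", "PEPB"),
                   quant = c(10, 20, 30),
                   row.names = c("1", "2", "3"))
run2 <- data.frame(peptide = c("PEPA", "PEPB", "PEPC"),
                   quant = c(15, 35, 50),
                   row.names = c("4", "5", "6"))

# Join on row names: nothing matches, so each row is NA in one of the runs.
by_row <- merge(run1["quant"], run2["quant"], by = "row.names", all = TRUE,
                suffixes = c("_run1", "_run2"))

# Aggregate precursors into peptides (median here, as an assumption),
# then join on the peptide sequence, which is shared across runs.
pep1 <- aggregate(quant ~ peptide, data = run1, FUN = median)
pep2 <- aggregate(quant ~ peptide, data = run2, FUN = median)
by_pep <- merge(pep1, pep2, by = "peptide", all = TRUE,
                suffixes = c("_run1", "_run2"))

print(by_row)
print(by_pep)
```

In a QFeatures workflow, the corresponding steps would be to call aggregateFeaturesOverAssays() with `fcol` set to the rowData column holding the peptide sequence (its name depends on the input table), and then to pass only the resulting peptide-level assay names to joinAssays().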

Some of the SCP.replication vignettes might also help.

By the way, assuming you are analysing SCP data, the readSCP() and readSCPfromDIANN() functions could be useful.

Hope this helps, and don't hesitate to ask questions and/or request adjustments.

Pedroaragon9 commented 1 month ago

Hi Laurent,

After looking a bit more into what the function does, it makes a lot of sense now. Apologies for the confusion; in fact, it's running correctly now.

The readSCPfromDIANN() was also very helpful. Thanks for the help.

Cheers!

rformassspectrometry / QFeatures

Issues when implementing `JoinAssays` #218