Error in dplyr::mutate() during "Combine all the information in prioritization tables"

nvribeiro commented 8 months ago

Hi multinichenet team!

I am facing the following error when trying to run multi_nichenet_analysis() on my dataset.

The output I get: [1] "Calculate differential expression for all cell types" [1] "DE analysis is done:" [1] "included cell types are:" [1] "IEL" "Early.enterocytes" "Intermediate.enterocytes"
[4] "TA.cells" "Enteroendocrine.cells" "BEST4.cells"
[7] "Mature.enterocytes" "Tuft.cells" "Goblet.NEUROG3..progenitor" [10] "Stem.cells" "Paneth.cells"
[1] "Make diagnostic abundance plots + Calculate expression information" [1] "Calculate NicheNet ligand activities and ligand-target links" [1] "Combine all the information in prioritization tables" Error in dplyr::mutate(): ℹ In argument: scaled_LR_prod = nichenetr::scaling_zscore(ligand_receptor_prod). Caused by error in if (sd(x, na.rm = TRUE) > 0) ...: ! missing value where TRUE/FALSE needed

`Backtrace: ▆

├─multinichenetr::multi_nichenet_analysis(...)
│ └─multinichenetr::multi_nichenet_analysis_combined(...)
│ ├─base::suppressMessages(...)
│ │ └─base::withCallingHandlers(...)
│ └─multinichenetr::generate_prioritization_tables(...)
│ └─... %>% dplyr::ungroup()
├─dplyr::ungroup(.)
├─dplyr::mutate(...)
├─dplyr:::mutate.data.frame(...)
1. │ └─dplyr:::mutate_cols(.data, dplyr_quosures(...), by)
2. │ ├─base::withCallingHandlers(...)
3. │ └─dplyr:::mutate_col(dots[[i]], data, mask, new_columns)
4. │ └─mask$eval_all_mutate(quo)
5. │ └─dplyr (local) eval()
6. └─nichenetr::scaling_zscore(ligand_receptor_prod)`

Any ideas on what could be the cause of this issue and how to solve it? Thank you!

browaeysrobin commented 8 months ago

Hi @nvribeiro

Can you check/give me the following information:

1) what input arguments did you use when running multi_nichenet_analysis (specifically focusing on non-default choices) ? 2) which version of multinichenetr are you using? 3) do you obtain the same error when performing the step-by-step analysis? (https://github.com/saeyslab/multinichenetr/blob/main/vignettes/basic_analysis_steps_MISC.md)

Moreover, can you show me the output of the following lines of code (you can always hide gene names or other types of sensitive information) 4) after running 3) and encountering an error there: abundance_expression_info$sender_receiver_info$avg_df %>% filter(is.nan(ligand_receptor_prod) | is.na(ligand_receptor_prod) | is.infinite(ligand_receptor_prod) ) (this should be empty, but possibly is not in your case) 5) after running 3): abundance_info$abund_plot_sample

nvribeiro commented 8 months ago

Hi @browaeysrobin

I now tried to run the analysis using the step-by-step approach instead of the wrapper function, Everything works fine, but I get the same error when running the generate_prioritization_tables() part. So, to answer your questions:

I am not sure if I have any non-default options that I pass to generate_prioritization_tables(). I'm using the abundance DE results from the previous steps, the prioritization weights suggested in the tutorial. Furthermore, I have the following contrast and parameters:

# Contrast (comparison) between groups: Comparing CeD vs Ctrl
contrasts_oi = c("'CeD-Ctrl','Ctrl-CeD'")

# Necessary for downstream visualisation
contrast_tbl = tibble(
  contrast = c("CeD-Ctrl", "Ctrl-CeD"), 
  group = c("CeD","Ctrl"))

# Parameters for NicheNet ligand analysis
logFC_threshold = 0.50 # minimum FC of DE genes
p_val_threshold = 0.05 # min pval
fraction_cutoff = 0.05 # min fraction of expression
p_val_adj = TRUE # use p-val adjusted as filter
empirical_pval = FALSE
verbose = TRUE
top_n_target = 250 # select top n target genes to consider in the ligand-target inference
cores_system = 6 # n of cores to use in the analysis
n.cores = min(cores_system, union(senders_oi, receivers_oi) %>% length())

multinichenetr_1.0.3
No, the tutorial works fine (using the tutorial dataset and following all steps).

I'm not sure if I should run this on my analysis or on the tutorial. I have an empty result in both cases.

> abundance_expression_info$sender_receiver_info$avg_df %>% filter(is.nan(ligand_receptor_prod)  | is.na(ligand_receptor_prod) | is.infinite(ligand_receptor_prod) )
# A tibble: 0 × 8
# ℹ 8 variables: sample <chr>, sender <fct>, receiver <fct>, ligand <chr>, receptor <chr>, avg_ligand <dbl>, avg_receptor <dbl>,
#   ligand_receptor_prod <dbl>

Also for both I get Error: object 'abundance_info' not found

A bit more of context that I hope it can help. My dataset has 5 controls vs 4 disease samples, divided in two experimental batches. I'm looking a subset of cells at the moment, and the abundances look like this (min_cells=5):

I hope this new info helps.

a-munoz-rojas commented 8 months ago

I am also seeing this error when running the wrapper function. I am specifically running it with sender_receiver_separate = TRUE, other than that the parameters are the same as default (and as the tutorials).

browaeysrobin commented 8 months ago

Hi @a-munoz-rojas

Can you provide the following information:

1) Does the wrapper function give the same error on the vignette data or only on your own data? 2) Can you share your sessionInfo()?

browaeysrobin commented 8 months ago

Hi @nvribeiro Thanks for the information. At first glance, everything looks fine so I have a hard time thinking what may have gone wrong for your data. I would need some more time to dig into this issue. Can you share your sessionInfo information in the meantime?

nvribeiro commented 8 months ago

Hi @browaeysrobin

Thank you for your help. Here is my sessionInfo:

R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.0

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Amsterdam
tzcode source: internal

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] SingleCellExperiment_1.22.0 SummarizedExperiment_1.30.2 Biobase_2.60.0              GenomicRanges_1.52.0        GenomeInfoDb_1.36.4        
 [6] IRanges_2.34.1              S4Vectors_0.38.2            BiocGenerics_0.46.0         MatrixGenerics_1.12.3       matrixStats_1.0.0          
[11] multinichenetr_1.0.3        lubridate_1.9.3             forcats_1.0.0               stringr_1.5.0               dplyr_1.1.3                
[16] purrr_1.0.2                 readr_2.1.4                 tidyr_1.3.0                 tibble_3.2.1                ggplot2_3.4.3              
[21] tidyverse_2.0.0             SeuratObject_4.1.3          Seurat_4.3.0               

loaded via a namespace (and not attached):
  [1] progress_1.2.2            nnet_7.3-19               locfdr_1.1-8              goftest_1.2-3             Biostrings_2.68.1        
  [6] vctrs_0.6.3               spatstat.random_3.1-6     digest_0.6.33             png_0.1-8                 shape_1.4.6              
 [11] proxy_0.4-27              ggrepel_0.9.3             deldir_1.0-9              parallelly_1.36.0         MASS_7.3-60              
 [16] reshape2_1.4.4            httpuv_1.6.11             foreach_1.5.2             withr_2.5.1               xfun_0.40                
 [21] ggpubr_0.6.0              ellipsis_0.3.2            survival_3.5-7            memoise_2.0.1             ggbeeswarm_0.7.2         
 [26] emmeans_1.8.9             zoo_1.8-12                GlobalOptions_0.1.2       gtools_3.9.4              pbapply_1.7-2            
 [31] Formula_1.2-5             prettyunits_1.2.0         KEGGREST_1.40.1           promises_1.2.1            httr_1.4.7               
 [36] rstatix_0.7.2             globals_0.16.2            fitdistrplus_1.1-11       rstudioapi_0.15.0         miniUI_0.1.1.1           
 [41] generics_0.1.3            base64enc_0.1-3           zlibbioc_1.46.0           ScaledMatrix_1.8.1        ggraph_2.1.0             
 [46] polyclip_1.10-6           randomForest_4.7-1.1      GenomeInfoDbData_1.2.10   xtable_1.8-4              doParallel_1.0.17        
 [51] evaluate_0.22             S4Arrays_1.0.6            hms_1.1.3                 irlba_2.3.5.1             colorspace_2.1-0         
 [56] visNetwork_2.1.2          ROCR_1.0-11               reticulate_1.32.0         spatstat.data_3.0-1       magrittr_2.0.3           
 [61] lmtest_0.9-40             later_1.3.1               viridis_0.6.4             lattice_0.21-9            genefilter_1.82.1        
 [66] spatstat.geom_3.2-5       future.apply_1.11.0       XML_3.99-0.14             scattermore_1.2           scuttle_1.10.3           
 [71] shadowtext_0.1.2          cowplot_1.1.1             RcppAnnoy_0.0.21          class_7.3-22              Hmisc_5.1-1              
 [76] pillar_1.9.0              nlme_3.1-163              iterators_1.0.14          caTools_1.18.2            compiler_4.3.1           
 [81] beachmat_2.16.0           stringi_1.7.12            gower_1.0.1               tensor_1.5                minqa_1.2.6              
 [86] plyr_1.8.9                crayon_1.5.2              abind_1.4-5               scater_1.28.0             blme_1.0-5               
 [91] locfit_1.5-9.8            sp_2.1-0                  bit_4.0.5                 graphlayouts_1.0.1        UpSetR_1.4.0             
 [96] codetools_0.2-19          recipes_1.0.8             BiocSingular_1.16.0       e1071_1.7-13              GetoptLong_1.0.5         
[101] plotly_4.10.2             remaCor_0.0.16            mime_0.12                 splines_4.3.1             circlize_0.4.15          
[106] Rcpp_1.0.11               sparseMatrixStats_1.12.2  blob_1.2.4                knitr_1.44                utf8_1.2.3               
[111] clue_0.3-65               lme4_1.1-34               listenv_0.9.0             checkmate_2.2.0           DelayedMatrixStats_1.22.6
[116] Rdpack_2.5                ggsignif_0.6.4            estimability_1.4.1        Matrix_1.6-1.1            statmod_1.5.0            
[121] tzdb_0.4.0                tweenr_2.0.2              pkgconfig_2.0.3           tools_4.3.1               cachem_1.0.8             
[126] RSQLite_2.3.1             RhpcBLASctl_0.23-42       rbibutils_2.2.15          DBI_1.1.3                 viridisLite_0.4.2        
[131] numDeriv_2016.8-1.1       fastmap_1.1.1             rmarkdown_2.25            scales_1.2.1              grid_4.3.1               
[136] ica_1.0-3                 nichenetr_2.0.4           broom_1.0.5               patchwork_1.1.3           carData_3.0-5            
[141] RANN_2.6.1                rpart_4.1.19              farver_2.1.1              aod_1.3.2                 tidygraph_1.2.3          
[146] mgcv_1.9-0                DiagrammeR_1.0.10         foreign_0.8-85            cli_3.6.1                 leiden_0.4.3             
[151] lifecycle_1.0.3           caret_6.0-94              uwot_0.1.16               glmmTMB_1.1.8             mvtnorm_1.2-3            
[156] bluster_1.10.0            lava_1.7.2.1              backports_1.4.1           annotate_1.78.0           BiocParallel_1.34.2      
[161] timechange_0.2.0          gtable_0.3.4              rjson_0.2.21              ggridges_0.5.4            progressr_0.14.0         
[166] parallel_4.3.1            pROC_1.18.4               limma_3.56.2              jsonlite_1.8.7            edgeR_3.42.4             
[171] bitops_1.0-7              bit64_4.0.5               Rtsne_0.16                spatstat.utils_3.0-3      BiocNeighbors_1.18.0     
[176] muscat_1.14.0             metapod_1.7.0             dqrng_0.3.1               pbkrtest_0.5.2            timeDate_4022.108        
[181] lazyeval_0.2.2            shiny_1.7.5               htmltools_0.5.6           sctransform_0.3.5         glue_1.6.2               
[186] factoextra_1.0.7          XVector_0.40.0            RCurl_1.98-1.12           scran_1.28.2              gridExtra_2.3            
[191] EnvStats_2.8.1            boot_1.3-28.1             igraph_1.5.1              variancePartition_1.30.2  TMB_1.9.6                
[196] R6_2.5.1                  sva_3.48.0                DESeq2_1.40.2             gplots_3.1.3              fdrtool_1.2.17           
[201] labeling_0.4.3            cluster_2.1.4             ipred_0.9-14              nloptr_2.0.3              DelayedArray_0.26.7      
[206] tidyselect_1.2.0          vipor_0.4.5               htmlTable_2.4.1           ggforce_0.4.1             car_3.1-2                
[211] AnnotationDbi_1.62.2      future_1.33.0             ModelMetrics_1.2.2.2      rsvd_1.0.5                munsell_0.5.0            
[216] KernSmooth_2.23-22        data.table_1.14.8         htmlwidgets_1.6.2         ComplexHeatmap_2.16.0     RColorBrewer_1.1-3       
[221] rlang_1.1.1               spatstat.sparse_3.0-2     spatstat.explore_3.2-3    lmerTest_3.1-3            ggnewscale_0.4.9         
[226] fansi_1.0.4               hardhat_1.3.0             beeswarm_0.4.0            prodlim_2023.08.28

nvribeiro commented 8 months ago

Hi @browaeysrobin

Update: when I removed the batch information from analysis, everything works fine, I managed to run the whole pipeline. My guess would be the issue is caused because I have a very unequal distribution of samples between the two batches and that it's causing some problems perhaps?

Let me know if you can think of a way to overcome this issue :)

FBaiyu commented 8 months ago

I met the same error with my batch information is NA

browaeysrobin commented 8 months ago

Hi @nvribeiro @FBaiyu @a-munoz-rojas

In the meantime, I encountered a very similar issue on one of the datasets I am working with.

The problems there: 1) there were cell types with only sufficient cells in only one sample in only one group + 2) there were ligand/receptor expression values returned for "celltype-celltype" interactions predicted in samples even though one of the pair members was not present in that sample.

I am implementing automatic solutions to these problems in the v2 release that is currently on the devel-branch of this package. Temporary solutions for you

Solutions: 1) remove these "absent" cell types prior to your analysis if you would have them 2) run the following lines of code (based on the step-by-step script, after running abundance_expression_info = get_abundance_expression_info(sce = sce, sample_id = sample_id, group_id = group_id, celltype_id = celltype_id, min_cells = min_cells, senders_oi = senders_oi, receivers_oi = receivers_oi, lr_network = lr_network, batches = batches)

abundance_expression_info$sender_receiver_info$pb_df = abundance_expression_info$sender_receiver_info$pb_df %>% ungroup() %>% inner_join(abundance_expression_info$sender_receiver_info$pb_df_group %>% ungroup() %>% distinct(ligand, receptor, sender, receiver))
abundance_expression_info$sender_receiver_info$avg_df = abundance_expression_info$sender_receiver_info$avg_df %>% ungroup() %>% inner_join(abundance_expression_info$sender_receiver_info$avg_df_group %>% ungroup() %>% distinct(ligand, receptor, sender, receiver))
abundance_expression_info$sender_receiver_info$frq_df = abundance_expression_info$sender_receiver_info$frq_df %>% ungroup() %>% inner_join(abundance_expression_info$sender_receiver_info$frq_df_group %>% ungroup() %>% distinct(ligand, receptor, sender, receiver))

Does this fix your issue?

nvribeiro commented 7 months ago

Hi @browaeysrobin

I tried running the analysis now only using cells that are present across all conditions and batches (see abundance plot below), and running the lines of code that you suggested. I still get the same error. The only way I can run the whole thing is if I ignore my batch information.

8c2f838c-e34d-4baa-8b8b-a1467051fa79

browaeysrobin commented 6 months ago

Hi @nvribeiro

Not sure this figure represents your last analysis, but in any case: all sample_ids should be unique. Here you seem to have a "Ctrl1" sample in two condition-batch combinations. You can make them easily unique by pasting the batch and/or condition to the current sample id.

nvribeiro commented 6 months ago

Hi @browaeysrobin

Thank you, that solved the problem indeed.

I have another problem now when adding a new numeric covariate, I am trying to add the cellular detection rate (CDR) used in DEG analysis with MAST (https://genomebiology.biomedcentral.com/articles/10.1186/s13059-015-0844-5) as a covariate because I used this in my original DEG analysis and I would like to have the same covariates in both analysis. However, when I add this variable, the get_DE_info stops working. I get the following error for all cell types:

perform_muscat_de_analysis errored for celltype: Stem.cells
Here's the original error message:
BiocParallel errors
  1 remote errors, element index: 1
  0 unevaluated and other errors
  first remote error:
Error in estimateDisp.default(y = y$counts, design = design, group = group, : object 'prior.n' not found
<bplist_error: BiocParallel errors
  1 remote errors, element index: 1
  0 unevaluated and other errors
  first remote error:
Error in estimateDisp.default(y = y$counts, design = design, group = group, : object 'prior.n' not found

I also have (scaled) age as a numeric covariate and for that it works perfectly, so I'm not sure whats the problem here. Any ideas? Let me know if I should open a new issue to discuss this, since it's not entirely related to this one.

Thank you again for you help =)

browaeysrobin commented 6 months ago

Hi @nvribeiro

I am not so familiar with the CDR, but it seems a cell-level covariate to me (correct me when I am wrong). With Pseudobulk-based DE analysis, and thus MultiNicheNet, you can only correct for sample-level covariates (that's probably why working with age did work).

saeyslab / multinichenetr

Error in dplyr::mutate() during "Combine all the information in prioritization tables" #39