immunomind / immunarch

🧬 Immunarch: an R Package for Fast and Painless Exploration of Single-cell and Bulk T-cell/Antibody Immune Repertoires
https://immunarch.com
Apache License 2.0

Loading 10x Genomics Data: Error in step_subset #214

Open mdozmorov opened 2 years ago

mdozmorov commented 2 years ago

Hello. I started with the Loading 10x Genomics Data tutorial, downloaded the CSV files from the 10x website, and ran immdata_10x <- repLoad(file_path). This results in an error, which I can also reproduce with the data I actually want to analyze:

== Step 1/3: loading repertoire files... ==

Processing "/Users/mdozmorov/Documents/Data/VCU_work/Sawalha/2021-06.scRNA_scATAC/test_immunarch/data" ...
  -- [1/5] Parsing "/Users/mdozmorov/Documents/Data/VCU_work/Sawalha/2021-06.scRNA_scATAC/test_immunarch/data/vdj_v1_mm_c57bl6_pbmc_t_all_contig_annotations.csv" -- 10x (filt.contigs)
  [!] Removed 2917 clonotypes with no nucleotide and amino acid CDR3 sequence.
Error in step_subset(parent, vars = vars, groups = groups, arrange = arrange,  : 
  is.null(j) || is_expression(j) is not TRUE
In addition: Warning message:
The following named parsers don't match the column names: barcode,is_cell,contig_id,high_confidence,length,chain,v_gene,d_gene,j_gene,c_gene,full_length,productive,cdr3,cdr3_nt,reads,umis,raw_clonotype_id,raw_consensus_id 

The files I downloaded and put in a separate file_path folder are:

vdj_v1_mm_c57bl6_pbmc_t_all_contig_annotations.csv
vdj_v1_mm_c57bl6_pbmc_t_clonotypes.csv
vdj_v1_mm_c57bl6_pbmc_t_consensus_annotations.csv
vdj_v1_mm_c57bl6_pbmc_t_filtered_contig_annotations.csv
vdj_v1_mm_c57bl6_pbmc_t_metrics_summary.csv

I'm using Immunarch v.0.6.7 on a Mac. What may be wrong?
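
The warning in the log above lists column names that the parser expected but could not match. One way to see what a given file actually contains is to read only its header and compare it against that list. The following is a minimal sketch in base R: `check_10x_columns` is a hypothetical helper (not part of Immunarch), and the expected names are copied from the warning message in this thread, not from Immunarch's source.

```r
# Column names copied from the warning message in this thread.
expected_cols <- c(
  "barcode", "is_cell", "contig_id", "high_confidence", "length", "chain",
  "v_gene", "d_gene", "j_gene", "c_gene", "full_length", "productive",
  "cdr3", "cdr3_nt", "reads", "umis", "raw_clonotype_id", "raw_consensus_id"
)

# Hypothetical helper: read the header (plus one data row) and report mismatches.
check_10x_columns <- function(path, expected = expected_cols) {
  header <- names(read.csv(path, nrows = 1, check.names = FALSE))
  list(
    missing = setdiff(expected, header),  # expected but absent from the file
    extra   = setdiff(header, expected)   # present in the file but not expected
  )
}

# Self-contained demo: a temporary CSV that lacks the last two columns.
demo_cols <- head(expected_cols, -2)
demo_file <- tempfile(fileext = ".csv")
writeLines(
  c(paste(demo_cols, collapse = ","),
    paste(rep("x", length(demo_cols)), collapse = ",")),
  demo_file
)

res <- check_10x_columns(demo_file)
print(res$missing)
print(res$extra)
```

On real data, the call would be `check_10x_columns(file.path(file_path, "vdj_v1_mm_c57bl6_pbmc_t_all_contig_annotations.csv"))`. An empty `missing` vector would suggest the file header itself is not the cause of the error.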

MVolobueva commented 2 years ago

Hi, @mdozmorov! My name is Maria Volobueva, I am a developer of the Immunarch package.

We have managed to reproduce your issue. Now we are working on fixing it.

I will get back to you with any updates.

Thank you so much for drawing our attention to this.

Good luck, Maria Volobueva

MVolobueva commented 2 years ago

Hello, @mdozmorov

I've figured out what the bug was. We have already fixed it in the dev branch of Immunarch.

To install this branch, you can use the following commands:

install.packages(c("devtools", "pkgload"))
devtools::install_github("immunomind/immunarch", ref="dev")
devtools::reload(pkgload::inst("immunarch"))

If you are working in RStudio and the bug bothers you again, go to Tools -> Project Options -> Restore .RData into workspace at startup -> No, and then start a new project.

Do not hesitate to contact us with any questions further along.

Good luck, Maria Volobueva

mdozmorov commented 2 years ago

Thanks, Maria, I followed your instructions verbatim, but the problem still persists. I did reinstall immunarch from the dev branch, updated all packages, ensured global and local workspace restoring is disabled. I'm copy-pasting the code, the error is identical.

# 1.1) Load the package into R:
# devtools::install_github("immunomind/immunarch", ref="dev")
library(immunarch)

# 1.2) Replace with the path to your processed 10x data or to the clonotypes file
file_path = "/Users/mdozmorov/Documents/Data/VCU_work/test_immunarch/data"

# 1.3) Load 10x data with repLoad
immdata_10x <- repLoad(file_path)

Alexander230 commented 2 years ago

Hi, @mdozmorov!

My name is Aleksandr Popov, I am a developer of the Immunarch package.

When I tried to reproduce this bug, I noticed that it appears only when there are remnants of an old version of Immunarch, or when there are function name conflicts in the R environment. Please try running R from the terminal with the R --vanilla command (to start it with an empty environment) and then run these commands:

install.packages(c("devtools", "pkgload"))
devtools::install_github("immunomind/immunarch", ref="dev")
devtools::reload(pkgload::inst("immunarch"))
file_path = "/Users/mdozmorov/Documents/Data/VCU_work/test_immunarch/data"
immdata_10x <- repLoad(file_path)

I hope this will help to load the data correctly.

Best regards, Aleksandr
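
The two suspected causes mentioned above (leftover copies of a package and function name conflicts) can be inspected from within R before reinstalling anything. The sketch below uses base R only. `installed_locations` and `where_defined` are hypothetical helper names introduced for illustration, and the demo uses `stats` and `filter` so that it runs without Immunarch installed.

```r
# Hypothetical helper: list every library directory that holds a copy of `pkg`.
# More than one result means several installations coexist.
installed_locations <- function(pkg, libs = .libPaths()) {
  has_pkg <- file.exists(file.path(libs, pkg, "DESCRIPTION"))
  libs[has_pkg]
}

# Hypothetical helper: report which entries on the search path define `fun_name`.
# More than one result means the name is masked somewhere.
where_defined <- function(fun_name) {
  sp <- search()
  hits <- vapply(
    seq_along(sp),
    function(i) exists(fun_name, envir = as.environment(i), inherits = FALSE),
    logical(1)
  )
  sp[hits]
}

locs <- installed_locations("stats")  # swap in "immunarch" on your machine
print(locs)
print(where_defined("filter"))        # e.g. stats::filter vs. dplyr::filter
```

If `installed_locations("immunarch")` returns more than one path, removing the extra copies and then reinstalling with `force = TRUE` may be worth trying; whether that resolves this particular error is not confirmed in the thread.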

mdozmorov commented 2 years ago

It didn't help. The R --vanilla session still detects the existing installation and reports:

Skipping install of 'immunarch' from a github remote, the SHA1 (37d06bef) has not changed since last install.
  Use `force = TRUE` to force installation

Manually removing it

rm -r /Users/mdozmorov/Library/R/x86_64/4.1/library/immunarch

and reinstalling still results in the same error.

MVolobueva commented 2 years ago

Hello, @mdozmorov!

I suppose the error persists because you are trying to load all files from your folder into Immunarch, but Immunarch can only load files in a supported format. Files whose names end with contig_annotations.csv should load correctly.

Please try to replace the file_path variable in your script:

file_path = "/Users/mdozmorov/Documents/Data/VCU_work/test_immunarch/data/vdj_v1_mm_c57bl6_pbmc_t_all_contig_annotations.csv"

Do not hesitate to contact us with any questions further along.

Good luck, Maria
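
The suggestion above amounts to passing only files whose names end with contig_annotations.csv instead of the whole folder. A minimal sketch of that selection step in base R follows. `find_contig_files` is a hypothetical helper, and the demo recreates the five file names from the original post as empty files in a temporary directory. It only illustrates the file selection; the thread does not confirm that this resolves the error.

```r
# Hypothetical helper: keep only files whose names end with
# "contig_annotations.csv".
find_contig_files <- function(dir_path) {
  list.files(dir_path, pattern = "contig_annotations\\.csv$", full.names = TRUE)
}

# Self-contained demo: empty files named like those in the original post.
demo_dir <- tempfile("immunarch_demo")
dir.create(demo_dir)
demo_names <- c(
  "vdj_v1_mm_c57bl6_pbmc_t_all_contig_annotations.csv",
  "vdj_v1_mm_c57bl6_pbmc_t_clonotypes.csv",
  "vdj_v1_mm_c57bl6_pbmc_t_consensus_annotations.csv",
  "vdj_v1_mm_c57bl6_pbmc_t_filtered_contig_annotations.csv",
  "vdj_v1_mm_c57bl6_pbmc_t_metrics_summary.csv"
)
invisible(file.create(file.path(demo_dir, demo_names)))

contig_files <- find_contig_files(demo_dir)
print(basename(contig_files))

# With Immunarch installed, a single selected path could then be loaded, e.g.:
# immdata_10x <- repLoad(contig_files[1])
```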

shanshenbing commented 2 years ago

Hello, I get the same error when loading my 10x Genomics results. I installed the dev version of the package and restarted RStudio, but I still get the same error.

immdata <- repLoad(.path = './BM01/tcr/run_count/outs/all_contig_annotations.csv')

My error looks like this:

== Step 1/3: loading repertoire files... ==

Processing "" ...
  -- [1/1] Parsing "/BM01/tcr/run_count/outs/all_contig_annotations.csv" -- 10x (filt.contigs)
  [!] Removed 1415 clonotypes with no nucleotide and amino acid CDR3 sequence.
Error in step_subset(parent, vars = vars, groups = groups, arrange = arrange, :
  is.null(j) || is_expression(j) is not TRUE
In addition: Warning message:
The following named parsers don't match the column names: barcode,is_cell,contig_id,high_confidence,length,chain,v_gene,d_gene,j_gene,c_gene,full_length,productive,fwr1,fwr1_nt,cdr1,cdr1_nt,fwr2,fwr2_nt,cdr2,cdr2_nt,fwr3,fwr3_nt,cdr3,cdr3_nt,fwr4,fwr4_nt,reads,umis,raw_clonotype_id,raw_consensus_id,exact_subclonotype_id

Version (as you can see, I am using the latest version):

packageVersion('immunarch')
[1] ‘0.6.8’

Session info:

R version 4.0.5 (2021-03-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /usr/lib64/libopenblas-r0.3.3.so

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] immunarch_0.6.8 patchwork_1.1.1 data.table_1.14.0 dtplyr_1.1.0
[5] dplyr_1.0.8 ggplot2_3.3.3

loaded via a namespace (and not attached):
[1] rappdirs_0.3.3 prabclus_2.3-2
[3] R.methodsS3_1.8.1 tidyr_1.1.3
[5] bit64_4.0.5 knitr_1.33
[7] DelayedArray_0.16.3 R.utils_2.10.1
[9] RCurl_1.98-1.5 doParallel_1.0.16
[11] generics_0.1.0 BiocGenerics_0.36.1
[13] callr_3.7.0 usethis_2.0.1
[15] RSQLite_2.2.7 shadowtext_0.0.8
[17] rlist_0.4.6.2 tzdb_0.2.0
[19] bit_4.0.4 enrichplot_1.15.3
[21] xml2_1.3.2 httpuv_1.6.1
[23] SummarizedExperiment_1.20.0 assertthat_0.2.1
[25] viridis_0.6.2 xfun_0.23
[27] hms_1.0.0 celldex_1.0.0
[29] babelgene_21.4 evaluate_0.14
[31] promises_1.2.0.1 DEoptimR_1.0-8
[33] fansi_0.4.2 dbplyr_2.1.1
[35] readxl_1.3.1 igraph_1.2.11
[37] DBI_1.1.1 geneplotter_1.68.0
[39] htmlwidgets_1.5.3 stringdist_0.9.6.3
[41] stats4_4.0.5 purrr_0.3.4
[43] ellipsis_0.3.2 ggpubr_0.4.0
[45] backports_1.2.1 annotate_1.68.0
[47] sparseMatrixStats_1.2.1 MatrixGenerics_1.2.1
[49] ggalluvial_0.12.3 vctrs_0.3.8
[51] Biobase_2.50.0 remotes_2.3.0
[53] Cairo_1.5-12.2 abind_1.4-5
[55] cachem_1.0.5 withr_2.4.2
[57] ggforce_0.3.3 robustbase_0.93-7
[59] vroom_1.5.6 treeio_1.14.4
[61] prettyunits_1.1.1 mclust_5.4.9
[63] cluster_2.1.2 DOSE_3.16.0
[65] ExperimentHub_1.16.1 ape_5.5
[67] lazyeval_0.2.2 crayon_1.4.1
[69] genefilter_1.72.1 pkgconfig_2.0.3
[71] tweenr_1.0.2 GenomeInfoDb_1.26.7
[73] nlme_3.1-152 pkgload_1.2.4
[75] nnet_7.3-16 devtools_2.4.3
[77] diptest_0.76-0 rlang_1.0.1
[79] lifecycle_1.0.1 downloader_0.4
[81] BiocFileCache_1.14.0 AnnotationHub_2.22.1
[83] cellranger_1.1.0 rprojroot_2.0.2
[85] polyclip_1.10-0 matrixStats_0.61.0
[87] flextable_0.6.5 phangorn_2.7.1
[89] ggseqlogo_0.1 Matrix_1.3-3
[91] aplot_0.0.6 carData_3.0-4
[93] base64enc_0.1-3 GlobalOptions_0.1.2
[95] processx_3.5.2 pheatmap_1.0.12
[97] png_0.1-7 viridisLite_0.4.0
[99] rjson_0.2.20 bitops_1.0-7
[101] R.oo_1.24.0 blob_1.2.1
[103] DelayedMatrixStats_1.12.3 shape_1.4.6
[105] stringr_1.4.0 qvalue_2.22.0
[107] readr_2.1.2 rstatix_0.7.0
[109] gridGraphics_0.5-1 ggsignif_0.6.1
[111] S4Vectors_0.28.1 scales_1.1.1
[113] memoise_2.0.0 magrittr_2.0.1
[115] plyr_1.8.6 zlibbioc_1.36.0
[117] compiler_4.0.5 scatterpie_0.1.6
[119] factoextra_1.0.7 RColorBrewer_1.1-2
[121] clue_0.3-60 DESeq2_1.30.1
[123] cli_3.2.0 XVector_0.30.0
[125] ps_1.6.0 MASS_7.3-54
[127] tidyselect_1.1.1 forcats_0.5.1
[129] stringi_1.7.6 yaml_2.2.1
[131] GOSemSim_2.16.1 locfit_1.5-9.4
[133] ggrepel_0.9.1 grid_4.0.5
[135] fastmatch_1.1-0 tools_4.0.5
[137] rio_0.5.26 parallel_4.0.5
[139] rvg_0.2.5 circlize_0.4.13
[141] rstudioapi_0.13 uuid_0.1-4
[143] foreign_0.8-81 foreach_1.5.1
[145] gridExtra_2.3 devEMF_4.0-2
[147] farver_2.1.0 ggraph_2.0.5
[149] digest_0.6.27 rvcheck_0.1.8
[151] BiocManager_1.30.16 shiny_1.6.0
[153] quadprog_1.5-8 fpc_2.2-9
[155] Rcpp_1.0.6 car_3.0-10
[157] GenomicRanges_1.42.0 broom_0.7.6
[159] BiocVersion_3.12.0 R.devices_2.17.0
[161] later_1.2.0 httr_1.4.2
[163] gdtools_0.2.3 AnnotationDbi_1.52.0
[165] ComplexHeatmap_2.6.2 kernlab_0.9-29
[167] colorspace_2.0-1 job_0.3.0
[169] XML_3.99-0.9 fs_1.5.0
[171] IRanges_2.24.1 splines_4.0.5
[173] yulab.utils_0.0.4 tidytree_0.3.4
[175] graphlayouts_0.7.1 shinythemes_1.2.0
[177] flexmix_2.3-17 ggplotify_0.0.7
[179] plotly_4.9.3 sessioninfo_1.1.1
[181] systemfonts_1.0.2 xtable_1.8-4
[183] jsonlite_1.7.2 ggtree_2.4.2
[185] tidygraph_1.2.0 UpSetR_1.4.0
[187] modeltools_0.2-23 testthat_3.0.2
[189] R6_2.5.0 pillar_1.6.1
[191] htmltools_0.5.2 mime_0.10
[193] glue_1.6.0 fastmap_1.1.0
[195] clusterProfiler_3.18.1 BiocParallel_1.24.1
[197] class_7.3-19 interactiveDisplayBase_1.28.0
[199] codetools_0.2-18 fgsea_1.16.0
[201] pkgbuild_1.2.0 utf8_1.2.1
[203] lattice_0.20-44 tibble_3.1.2
[205] curl_4.3.1 officer_0.3.18
[207] magick_2.7.2 openxlsx_4.2.3
[209] zip_2.1.1 GO.db_3.12.1
[211] survival_3.2-11 rmarkdown_2.8
[213] desc_1.3.0 munsell_0.5.0
[215] DO.db_2.9 GetoptLong_1.0.5
[217] GenomeInfoDbData_1.2.4 iterators_1.0.13
[219] haven_2.4.1 reshape2_1.4.4
[221] gtable_0.3.0 msigdbr_7.4.1
[223] eoffice_0.2.1

I then tested several versions of the package, and only versions before 0.6.5 can load 10x data correctly. That means 0.6.4 can load 10x Genomics data, but 0.6.5 and 0.6.7 cannot.
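
To check an installed version against the versions reported in this thread, base R's version comparison can be used. This is only a sketch: `classify_version` is a hypothetical helper, and the boundaries reflect the reports in this thread alone (0.6.4 works; 0.6.5, 0.6.7 and the 0.6.8 dev build fail), not an official compatibility table.

```r
# Versions as reported in this thread (not an official compatibility table).
reported_ok     <- package_version("0.6.4")
reported_broken <- package_version(c("0.6.5", "0.6.7", "0.6.8"))

# Hypothetical helper: classify a version string against the reports above.
classify_version <- function(v) {
  v <- package_version(v)
  if (v <= reported_ok) {
    "reported working"
  } else if (v >= min(reported_broken) && v <= max(reported_broken)) {
    "reported failing"
  } else {
    "not covered by this thread"
  }
}

print(classify_version("0.6.4"))
print(classify_version("0.6.7"))
print(classify_version("0.7.0"))

# On a machine with Immunarch installed:
# classify_version(packageVersion("immunarch"))
```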

Hope you can give me some suggestions. Thank you!

MVolobueva commented 2 years ago

Hi, @shanshenbing!

Thank you for contacting us. I suppose the error persists because of a package version conflict in RStudio. To test this, run the following on the command line:

R --vanilla

Then install the proper version of immunarch again and repeat your command (also on the command line). If everything works, just update your RStudio projects; otherwise, let us know.

Do not hesitate to contact us with any questions further along.

Good luck, Maria Samokhina