stuart-lab / signac

R toolkit for the analysis of single-cell chromatin data
https://stuartlab.org/signac/
Other
318 stars 86 forks source link

Not all cells requested could be found in the fragment file #1446

Closed rsggsr closed 1 year ago

rsggsr commented 1 year ago

Hi,

Sorry I have to open a new issue even I just had opened a issue (#1441 ) several days ago.

I have downloaded the multiome scRNAseq and scATACseq data recently and try to use the codes below to build a new Seurat object.

data_dir<- ".../expression matrix/" list.files(data_dir) # Should show barcodes.tsv, genes.tsv, and matrix.mtx expression_matrix <- Read10X(data.dir = data_dir)

fragpath <- ".../atac_fragments.tsv.gz"

/*get gene annotations for hg38*/ annotation <- GetGRangesFromEnsDb(ensdb = EnsDb.Hsapiens.v86) seqlevelsStyle(annotation) <- "UCSC"

/*create a Seurat object containing the RNA adata*/ control1 <- CreateSeuratObject(counts = expression_matrix, assay = "RNA")

/*create ATAC assay and add it to the object*/ fragcounts <- CountFragments(fragments = fragpath) atac.cells <- fragcounts[fragcounts$frequency_count > 2000, "CB"]

atac.frags <- CreateFragmentObject(path = fragpath, cells = colnames(control1))

However, the last code above showed this error

Computing hash Checking for 4968 cell barcodes Error in CreateFragmentObject(path = fragpath, cells = colnames(control1)) : Not all cells requested could be found in the fragment file.

It worked only if you use the atac.cells in this code.

atac.frags <- CreateFragmentObject(path = fragpath, cells = atac.cells)

But if you use this worked fragment file to create feature matrix and then use this matrix to create a new chromatin assay attached in the Seurat object, it showed another error that indicates the cell number is not matching.

` / call peaks using MACS2 in Ubuntu WSL/

macs2 callpeak -t /.../atac_fragments.tsv.gz -g 1.87e+09 -f BED --nomodel --extsize 200 --shift -100 -n SeuratProject --outdir /.../results`

df <- read.table( file = ".../results/SeuratProject_peaks.narrowPeak", col.names = c("chr", "start", "end", "name", "score", "strand", "fold_change", "neg_log10pvalue_summit", "neg_log10qvalue_summit", "relative_summit_position") ) peaks <- makeGRangesFromDataFrame(df = df, keep.extra.columns = TRUE)

/*remove peaks on nonstandard chromosomes and in genomic blacklist regions*/ peaks <- keepStandardChromosomes(peaks, pruning.mode = "coarse") peaks <- subsetByOverlaps(x = peaks, ranges = blacklist_hg38, invert = TRUE)

/* quantify counts in each peak*/ macs2_counts <- FeatureMatrix(fragments = atac.frags, features = peaks)

/* create a new assay using the MACS2 peak set and add it to the Seurat object*/ control1[["peaks"]] <- CreateChromatinAssay( counts = macs2_counts, fragments = fragpath, annotation = annotation)

I've already saw there's some discussion #518, #748 #1161, but I still couldn't solve it. Does it look like the cell barcode just don't match to the cell barcode of the fragment object I created using the atac_fragments.tsv.gz. BTW, I also re-generated the index file using tabix command. Greatly appreciated if you could kindly let me know what might be wrong with my approach.

sessionInfo() ` R version 4.2.1 (2022-06-23 ucrt) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale: [1] LC_COLLATE=English_United States.utf8 LC_CTYPE=English_United States.utf8 LC_MONETARY=English_United States.utf8 LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8

attached base packages: [1] stats4 stats graphics grDevices utils datasets methods base

other attached packages: [1] ggplot2_3.4.2 dplyr_1.1.2 SeuratObject_4.1.3 Seurat_4.3.0 Signac_1.9.0
[6] BSgenome.Hsapiens.UCSC.hg38_1.4.4 BSgenome_1.64.0 rtracklayer_1.56.1 Biostrings_2.64.1 XVector_0.36.0
[11] EnsDb.Hsapiens.v86_2.99.0 ensembldb_2.20.2 AnnotationFilter_1.20.0 GenomicFeatures_1.48.4 AnnotationDbi_1.58.0
[16] Biobase_2.56.0 GenomicRanges_1.48.0 GenomeInfoDb_1.37.1 IRanges_2.30.1 S4Vectors_0.34.0
[21] BiocGenerics_0.42.0

loaded via a namespace (and not attached): [1] utf8_1.2.3 spatstat.explore_3.2-1 reticulate_1.28 R.utils_2.12.2 tidyselect_1.2.0 RSQLite_2.3.1
[7] htmlwidgets_1.6.2 docopt_0.7.1 grid_4.2.1 BiocParallel_1.30.4 Rtsne_0.16 munsell_0.5.0
[13] codetools_0.2-18 ica_1.0-3 future_1.32.0 miniUI_0.1.1.1 withr_2.5.0 spatstat.random_3.1-5
[19] colorspace_2.1-0 progressr_0.13.0 filelock_1.0.2 knitr_1.43 rstudioapi_0.14 ROCR_1.0-11
[25] tensor_1.5 listenv_0.9.0 labeling_0.4.2 MatrixGenerics_1.8.1 slam_0.1-50 GenomeInfoDbData_1.2.9
[31] polyclip_1.10-4 farver_2.1.1 bit64_4.0.5 parallelly_1.36.0 vctrs_0.6.2 generics_0.1.3
[37] xfun_0.39 biovizBase_1.44.0 BiocFileCache_2.4.0 R6_2.5.1 hdf5r_1.3.8 bitops_1.0-7
[43] spatstat.utils_3.0-3 cachem_1.0.8 DelayedArray_0.22.0 promises_1.2.0.1 BiocIO_1.6.0 scales_1.2.1
[49] nnet_7.3-17 gtable_0.3.3 globals_0.16.2 goftest_1.2-3 rlang_1.1.1 RcppRoll_0.3.0
[55] splines_4.2.1 lazyeval_0.2.2 dichromat_2.0-0.1 checkmate_2.2.0 spatstat.geom_3.2-1 yaml_2.3.7
[61] reshape2_1.4.4 abind_1.4-5 backports_1.4.1 httpuv_1.6.11 Hmisc_5.1-0 tools_4.2.1
[67] ellipsis_0.3.2 RColorBrewer_1.1-3 ggridges_0.5.4 Rcpp_1.0.10 plyr_1.8.8 base64enc_0.1-3
[73] progress_1.2.2 zlibbioc_1.42.0 purrr_1.0.1 RCurl_1.98-1.12 prettyunits_1.1.1 rpart_4.1.16
[79] deldir_1.0-9 pbapply_1.7-0 cowplot_1.1.1 zoo_1.8-12 SummarizedExperiment_1.26.1 ggrepel_0.9.3
[85] cluster_2.1.4 magrittr_2.0.3 data.table_1.14.8 scattermore_1.1 lmtest_0.9-40 RANN_2.6.1
[91] ProtGenerics_1.28.0 fitdistrplus_1.1-11 matrixStats_0.63.0 evaluate_0.21 hms_1.1.3 patchwork_1.1.2
[97] mime_0.12 xtable_1.8-4 XML_3.99-0.14 sparsesvd_0.2-2 gridExtra_2.3 compiler_4.2.1
[103] biomaRt_2.52.0 tibble_3.2.1 KernSmooth_2.23-20 crayon_1.5.2 R.oo_1.25.0 htmltools_0.5.5
[109] later_1.3.1 Formula_1.2-5 tidyr_1.3.0 DBI_1.1.3 dbplyr_2.3.2 MASS_7.3-60
[115] rappdirs_0.3.3 Matrix_1.5-4.1 cli_3.6.1 R.methodsS3_1.8.2 parallel_4.2.1 igraph_1.4.3
[121] pkgconfig_2.0.3 GenomicAlignments_1.32.1 foreign_0.8-82 sp_1.6-0 plotly_4.10.1 spatstat.sparse_3.0-1
[127] xml2_1.3.4 VariantAnnotation_1.42.1 stringr_1.5.0 digest_0.6.31 sctransform_0.3.5 RcppAnnoy_0.0.20
[133] spatstat.data_3.0-1 rmarkdown_2.21 leiden_0.4.3 fastmatch_1.1-3 htmlTable_2.4.1 uwot_0.1.14
[139] restfulr_0.0.15 curl_5.0.0 shiny_1.7.4 Rsamtools_2.12.0 rjson_0.2.21 lifecycle_1.0.3
[145] nlme_3.1-162 jsonlite_1.8.4 limma_3.52.4 viridisLite_0.4.2 fansi_1.0.4 pillar_1.9.0
[151] lattice_0.20-45 KEGGREST_1.36.3 fastmap_1.1.1 httr_1.4.6 survival_3.5-5 glue_1.6.2
[157] qlcMatrix_0.9.7 png_0.1-8 bit_4.0.5 stringi_1.7.12 blob_1.2.4 memoise_2.0.1
[163] irlba_2.3.5.1 future.apply_1.11.0 `

timoast commented 1 year ago

The link to the data doesn't work so I couldn't check the files. Typically the .mtx file for multiome data contains both the RNA and ATAC matrices combined, you would need to separate them before creating the assays.

You should look at the cell names in the RNA object that you have created and see if those barcodes are present in the fragment file.

rsggsr commented 1 year ago

Hi Tim,

Sorry for the link, it's (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE212450)!

Maybe I am wrong but I think the .mtx file only contains the RNA matrices and also when the cell barcodes of the RNA objects I created is not present in the fragment file.

timoast commented 1 year ago

I think you need to check that you're looking at the correct fragment files for the expression matrix you have. If you think you have the right files and still see that the cell barcodes don't match, maybe you need to read the methods of the original paper to see if there is some steps you are missing, or ask the original authors.