satijalab / seurat

R toolkit for single cell genomics
http://www.satijalab.org/seurat
Other
2.28k stars 909 forks source link

FindTransferAnchors error: 'VariableFeatures not computed for the SCT assay in object1' CITE >>> ATAC reference mapping #5254

Closed Jacobog02 closed 2 years ago

Jacobog02 commented 2 years ago

Computing transfer anchors between scATAC obj with gene activity scores and CITE-seq dataset with SCT normalized expression values. I expect the transferanchor object to be returned but the function errors about the reference object not having VariableFeatures computed. I have double checked the variable features of the input object are present so I believe the issue is within the function itself.

minimal reproducible code example (be warned,takes over 20min to run) This example follows the Signac tutorial for reference mapping.

##  Libraries
library(EnsDb.Hsapiens.v86)
library(Seurat)
library(Signac)
library(SeuratDisk) ## Needed for CITE reference

## Paths
csv_path = "atac_reference/10k_PBMC_ATAC_nextgem_Chromium_Controller_singlecell.csv"
fragments_path = "atac_reference/10k_PBMC_ATAC_nextgem_Chromium_Controller_fragments.tsv.gz"

h5_path = "atac_reference/10k_PBMC_ATAC_nextgem_Chromium_Controller_filtered_peak_bc_matrix.h5"
proj_name = "ATAC_reference"

cite_rna_refernece_path <- "pbmc_rna-multiome_reference/pbmc_multimodal.h5seurat"

out_dir <- "atac_reference/"

genome <- "hg38"

## Read in cell metadata
metadata <- read.csv(
  file = csv_path,
  header = TRUE,
  row.names = 1
)

counts <- Read10X_h5(h5_path) ## use peak file

chrom_assay <- CreateChromatinAssay(
    counts = counts,
    sep = c(":", "-"),
    genome = genome,
    fragments = fragments_path,
  )

pbmc.atac <- CreateSeuratObject(
    counts = chrom_assay,
    assay = "peaks",
    meta.data = metadata,
    project = proj_name
  )

annotations <- GetGRangesFromEnsDb(ensdb = EnsDb.Hsapiens.v86)

seqlevelsStyle(annotations) <- 'UCSC'

genome(annotations) <- "hg38"

# add the gene information to the object
Annotation(pbmc.atac) <- annotations

pbmc.atac <- subset(x = pbmc.atac,subset = peak_region_fragments > 1000)

#Hao_CITE_reference
### use the load command to get precompute data from publication
rna.cite <- LoadH5Seurat(cite_rna_refernece_path)

## Get genes of interest
genes.use <- VariableFeatures(rna.cite)  

  gene.activities <- GeneActivity(pbmc.atac) # Compute
# add the gene activity matrix to the Seurat object as a new assay and normalize it
  pbmc.atac[['ACTIVITY']] <- CreateAssayObject(counts = gene.activities)
  pbmc.atac <- NormalizeData(
    object = pbmc.atac,
    assay = 'ACTIVITY',
    normalization.method = 'LogNormalize',
    scale.factor = median(pbmc.atac$nCount_ACTIVITY)
  )

#Compute PCA to transfer labels

DefaultAssay(pbmc.atac) <- 'ACTIVITY'
  ref.dims <- length(rna.cite[["pca"]])
  pbmc.atac <- FindVariableFeatures(pbmc.atac)
  pbmc.atac <- NormalizeData(pbmc.atac)
  pbmc.atac <- ScaleData(pbmc.atac)
  pbmc.atac <- RunPCA(pbmc.atac, features = genes.use, verbose = FALSE,npcs = ref.dims) ## Needed to transfer RNA labels.

#Compute LSI to create better representation of ATAC for transfer. 

DefaultAssay(pbmc.atac) <- "peaks"
#VariableFeatures(pbmc.atac) <- names(which(Matrix::rowSums(pbmc.atac) > 100))
#pbmc.atac <- RunLSI(pbmc.atac, n = 50, scale.max = NULL)
pbmc.atac <- RunTFIDF(pbmc.atac)
pbmc.atac <- FindTopFeatures(pbmc.atac, min.cutoff = 'q0') ## VERY IMPORTANT FEATURE!!!!!
pbmc.atac <- RunSVD(pbmc.atac)

transfer.anchors <- FindTransferAnchors(reference = rna.cite, 
                                        query = pbmc.atac, 
                                        features = VariableFeatures(object = rna.cite), 
                                        reference.assay = "SCT",
                                        query.assay = "ACTIVITY",
                                        reduction = "cca",
                                        normalization.method = "SCT")

Specifically the error gives

transfer.anchors <- FindTransferAnchors(reference = rna.cite, 
                                        query = pbmc.atac, 
                                        features = VariableFeatures(object = rna.cite), 
                                        reference.assay = "SCT",
                                        query.assay = "ACTIVITY",
                                        reduction = "cca",
                                        normalization.method = "SCT")

Error in RunCCA.Seurat(object1 = reference, object2 = query, features = features, : VariableFeatures not computed for the SCT assay in object1 4. stop(paste0("VariableFeatures not computed for the ", assay1, " assay in object1")) 3. RunCCA.Seurat(object1 = reference, object2 = query, features = features, num.cc = max(dims), renormalize = FALSE, rescale = FALSE, verbose = verbose) 2. RunCCA(object1 = reference, object2 = query, features = features, num.cc = max(dims), renormalize = FALSE, rescale = FALSE, verbose = verbose) 1. FindTransferAnchors(reference = rna.cite, query = pbmc.atac, features = VariableFeatures(object = rna.cite), reference.assay = "SCT", query.assay = "ACTIVITY", reduction = "cca", normalization.method = "SCT")

Doing some quick debugging for the function I believe this is caused by the reference Seurat object (near line 781 of R/integration.R ) does not capture the input reference VariableFeatures field to pass along to downstream functions. This is interesting as other scRNA-seq references w/o SCT normalization allows the RunCCA function to run.

sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 10 (buster)

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.3.5.so

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8        LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8   
 [6] LC_MESSAGES=C.UTF-8    LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C           LC_TELEPHONE=C        
[11] LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] dplyr_1.0.7               ggpubr_0.4.0              ggplot2_3.3.5             SeuratDisk_0.0.0.9019    
 [5] Signac_1.4.0              SeuratObject_4.0.2        Seurat_4.0.5              EnsDb.Hsapiens.v86_2.99.0
 [9] ensembldb_2.18.0          AnnotationFilter_1.18.0   GenomicFeatures_1.46.1    AnnotationDbi_1.56.1     
[13] Biobase_2.54.0            GenomicRanges_1.46.0      GenomeInfoDb_1.30.0       IRanges_2.28.0           
[17] S4Vectors_0.32.0          BiocGenerics_0.40.0      

loaded via a namespace (and not attached):
  [1] rappdirs_0.3.3              SnowballC_0.7.0             rtracklayer_1.54.0          scattermore_0.7            
  [5] tidyr_1.1.4                 bit64_4.0.5                 knitr_1.36                  irlba_2.3.3                
  [9] DelayedArray_0.20.0         data.table_1.14.2           rpart_4.1-15                KEGGREST_1.34.0            
 [13] RCurl_1.98-1.5              generics_0.1.1              cowplot_1.1.1               RSQLite_2.2.8              
 [17] RANN_2.6.1                  future_1.23.0               bit_4.0.4                   spatstat.data_2.1-0        
 [21] xml2_1.3.2                  httpuv_1.6.3                SummarizedExperiment_1.24.0 assertthat_0.2.1           
 [25] xfun_0.27                   hms_1.1.1                   evaluate_0.14               promises_1.2.0.1           
 [29] fansi_0.5.0                 restfulr_0.0.13             progress_1.2.2              dbplyr_2.1.1               
 [33] readxl_1.3.1                igraph_1.2.7                DBI_1.1.1                   htmlwidgets_1.5.4          
 [37] sparsesvd_0.2               spatstat.geom_2.3-0         purrr_0.3.4                 ellipsis_0.3.2             
 [41] RSpectra_0.16-0             backports_1.3.0             biomaRt_2.50.0              deldir_1.0-6               
 [45] MatrixGenerics_1.6.0        vctrs_0.3.8                 ROCR_1.0-11                 abind_1.4-5                
 [49] cachem_1.0.6                withr_2.4.2                 ggforce_0.3.3               BSgenome_1.62.0            
 [53] checkmate_2.0.0             sctransform_0.3.2           GenomicAlignments_1.30.0    prettyunits_1.1.1          
 [57] goftest_1.2-3               cluster_2.1.2               lazyeval_0.2.2              crayon_1.4.2               
 [61] hdf5r_1.3.4                 pkgconfig_2.0.3             slam_0.1-48                 labeling_0.4.2             
 [65] tweenr_1.0.2                nlme_3.1-153                ProtGenerics_1.26.0         nnet_7.3-16                
 [69] rlang_0.4.12                globals_0.14.0              lifecycle_1.0.1             miniUI_0.1.1.1             
 [73] filelock_1.0.2              BiocFileCache_2.2.0         SeuratData_0.2.1            dichromat_2.0-0            
 [77] cellranger_1.1.0            polyclip_1.10-0             matrixStats_0.61.0          lmtest_0.9-38              
 [81] Matrix_1.3-4                ggseqlogo_0.1               carData_3.0-4               zoo_1.8-9                  
 [85] base64enc_0.1-3             ggridges_0.5.3              png_0.1-7                   viridisLite_0.4.0          
 [89] rjson_0.2.20                bitops_1.0-7                KernSmooth_2.23-20          Biostrings_2.62.0          
 [93] blob_1.2.2                  stringr_1.4.0               parallelly_1.28.1           jpeg_0.1-9                 
 [97] rstatix_0.7.0               ggsignif_0.6.3              scales_1.1.1                memoise_2.0.0              
[101] magrittr_2.0.1              plyr_1.8.6                  ica_1.0-2                   zlibbioc_1.40.0            
[105] compiler_4.1.0              tinytex_0.34                BiocIO_1.4.0                RColorBrewer_1.1-2         
[109] fitdistrplus_1.1-6          Rsamtools_2.10.0            cli_3.1.0                   XVector_0.34.0             
[113] listenv_0.8.0               patchwork_1.1.1             pbapply_1.5-0               htmlTable_2.3.0            
[117] Formula_1.2-4               MASS_7.3-54                 mgcv_1.8-38                 tidyselect_1.1.1           
[121] stringi_1.7.5               forcats_0.5.1               yaml_2.2.1                  latticeExtra_0.6-29        
[125] ggrepel_0.9.1               grid_4.1.0                  VariantAnnotation_1.40.0    bmcite.SeuratData_0.3.0    
[129] fastmatch_1.1-3             tools_4.1.0                 future.apply_1.8.1          rio_0.5.27                 
[133] rstudioapi_0.13             foreign_0.8-81              lsa_0.73.2                  gridExtra_2.3              
[137] farver_2.1.0                Rtsne_0.15                  BiocManager_1.30.16         digest_0.6.28              
[141] shiny_1.7.1                 qlcMatrix_0.9.7             Rcpp_1.0.7                  car_3.0-11                 
[145] broom_0.7.10                later_1.3.0                 RcppAnnoy_0.0.19            httr_1.4.2                 
[149] biovizBase_1.42.0           colorspace_2.0-2            XML_3.99-0.8                tensor_1.5                 
[153] reticulate_1.22             splines_4.1.0               uwot_0.1.10                 RcppRoll_0.3.0             
[157] spatstat.utils_2.2-0        plotly_4.10.0               xtable_1.8-4                jsonlite_1.7.2             
[161] R6_2.5.1                    Hmisc_4.6-0                 pillar_1.6.4                htmltools_0.5.2            
[165] mime_0.12                   glue_1.4.2                  fastmap_1.1.0               BiocParallel_1.28.0        
[169] codetools_0.2-18            utf8_1.2.2                  lattice_0.20-45             spatstat.sparse_2.0-0      
[173] tibble_3.1.5                curl_4.3.2                  leiden_0.3.9                zip_2.2.0                  
[177] openxlsx_4.2.4              survival_3.2-13             rmarkdown_2.11              docopt_0.7.1               
[181] munsell_0.5.0               GenomeInfoDbData_1.2.7      haven_2.4.3                 reshape2_1.4.4             
[185] gtable_0.3.0                spatstat.core_2.3-1

Thanks for the help!

timoast commented 2 years ago

Since the scATAC-seq data is not normalized by SCTransform, you can't set normalization.method="SCT" in FindTransferAnchors(). To integrate scRNA-seq and scATAC-seq data, you will need to log-normalize the RNA expression data instead.