Not finding TransferAnchors between integrated RNA object & merged ATAC object

hchintalapudi commented 3 years ago

Hi, I have ScRNA and ScATAC data from same experiment (not multiomic) and I followed the merge vignette to merge the 5 samples (2 groups) based on intersecting peaks as shown in the vignette. Coming to the ScRNA data, I have similar 5 samples and I SCtransformed them, integrated them and then annotated the clusters. I'm looking to integrate both the integrated RNA object and merged ATAC object and co-embed them as shown here: https://satijalab.org/seurat/articles/atacseq_integration_vignette.html#co-embedding-scrna-seq-and-scatac-seq-datasets

However, I get an error:

transfer.anchors <- FindTransferAnchors(
   reference = all.integrated,
   reference.assay = "RNA",
   query.assay = "RNA",
   query = combined.2,
   reduction = 'cca')

Error: No features to use in finding transfer anchors. To troubleshoot, try explicitly providing features to the features parameter and ensure that they are present in both reference and query assays.

Here's my merge code for ATAC data:

### Creating a common peak set by merging:

library(future)
plan("multiprocess", workers = 4)
options(future.globals.maxSize = 50000 * 1024^2) # for 50 Gb RAM

peaks.Ptf1a_1<- read.table(
  file = "/Users/hchintalapudi/Desktop/work/ScATAC/10x_ATAC_9-30-30/Ptf1a_1/peaks.bed",
  col.names = c("chr", "start", "end")
)
peaks.Ptf1a_2<- read.table(
  file = "/Users/hchintalapudi/Desktop/work/ScATAC/10x_ATAC_9-30-30/Ptf1a_2/peaks.bed",
  col.names = c("chr", "start", "end")
)
peaks.Ptf1a_3<- read.table(
  file = "/Users/hchintalapudi/Desktop/work/ScATAC/10x_ATAC_9-30-30/Ptf1a_3/peaks.bed",
  col.names = c("chr", "start", "end")
)
peaks.Sox9b_1<- read.table(
  file = "/Users/hchintalapudi/Desktop/work/ScATAC/10x_ATAC_9-30-30/Sox9b_1/peaks.bed",
  col.names = c("chr", "start", "end")
)
peaks.Sox9b_2<- read.table(
  file = "/Users/hchintalapudi/Desktop/work/ScATAC/10x_ATAC_9-30-30/Sox9b_2/peaks.bed",
  col.names = c("chr", "start", "end")
)
peaks.Sox9b_3<- read.table(
  file = "/Users/hchintalapudi/Desktop/work/ScATAC/10x_ATAC_9-30-30/Sox9b_3/peaks.bed",
  col.names = c("chr", "start", "end")
)

# convert to genomic ranges
gr.Ptf1a_1 <- makeGRangesFromDataFrame(peaks.Ptf1a_1)
gr.Ptf1a_2 <- makeGRangesFromDataFrame(peaks.Ptf1a_2)
gr.Ptf1a_3 <- makeGRangesFromDataFrame(peaks.Ptf1a_3)
gr.Sox9b_1 <- makeGRangesFromDataFrame(peaks.Sox9b_1)
gr.Sox9b_2 <- makeGRangesFromDataFrame(peaks.Sox9b_2)
gr.Sox9b_3 <- makeGRangesFromDataFrame(peaks.Sox9b_3)

# Create a unified set of peaks to quantify in each dataset
combined.peaks <- GenomicRanges::reduce(x = c(gr.Ptf1a_1, gr.Ptf1a_2, gr.Ptf1a_3, gr.Sox9b_1, gr.Sox9b_2))
#combined.peaks.1 <- reduce(x = c(gr.Ptf1a_1, gr.Ptf1a_2, gr.Ptf1a_3))
#combined.peaks.2 <- reduce(x = c(gr.Sox9b_1, gr.Sox9b_2))

peakwidths <- width(combined.peaks)
combined.peaks <- combined.peaks[peakwidths  < 10000 & peakwidths > 20]
combined.peaks

peakwidths.1 <- width(combined.peaks.1)
combined.peaks.1 <- combined.peaks.1[peakwidths.1  < 10000 & peakwidths.1 > 20]
combined.peaks.1

peakwidths.2 <- width(combined.peaks.2)
combined.peaks.2 <- combined.peaks[peakwidths.2  < 10000 & peakwidths.2 > 20]
combined.peaks.2

# load metadata
md.Ptf1a_1<- read.table(
  file = "/Users/hchintalapudi/Desktop/work/ScATAC/10x_ATAC_9-30-30/Ptf1a_1/singlecell.csv",
  stringsAsFactors = FALSE,
  sep = ",",
  header = TRUE,
  row.names = 1
)[-1, ] # remove the first row

md.Ptf1a_2<- read.table(
  file = "/Users/hchintalapudi/Desktop/work/ScATAC/10x_ATAC_9-30-30/Ptf1a_2/singlecell.csv",
  stringsAsFactors = FALSE,
  sep = ",",
  header = TRUE,
  row.names = 1
)[-1, ]

md.Ptf1a_3<- read.table(
  file = "/Users/hchintalapudi/Desktop/work/ScATAC/10x_ATAC_9-30-30/Ptf1a_3/singlecell.csv",
  stringsAsFactors = FALSE,
  sep = ",",
  header = TRUE,
  row.names = 1
)[-1, ]

md.Sox9b_1<- read.table(
  file = "/Users/hchintalapudi/Desktop/work/ScATAC/10x_ATAC_9-30-30/Sox9b_1/singlecell.csv",
  stringsAsFactors = FALSE,
  sep = ",",
  header = TRUE,
  row.names = 1
)[-1, ]

md.Sox9b_2<- read.table(
  file = "/Users/hchintalapudi/Desktop/work/ScATAC/10x_ATAC_9-30-30/Sox9b_2/singlecell.csv",
  stringsAsFactors = FALSE,
  sep = ",",
  header = TRUE,
  row.names = 1
)[-1, ]

md.Sox9b_3<- read.table(
  file = "/Users/hchintalapudi/Desktop/work/ScATAC/10x_ATAC_9-30-30/Sox9b_3/singlecell.csv",
  stringsAsFactors = FALSE,
  sep = ",",
  header = TRUE,
  row.names = 1
)[-1, ]

# perform an initial filtering of low count cells
md.Ptf1a_1 <- md.Ptf1a_1[md.Ptf1a_1$passed_filters > 500, ]
md.Ptf1a_2 <- md.Ptf1a_2[md.Ptf1a_2$passed_filters > 500, ]
md.Ptf1a_3<- md.Ptf1a_3[md.Ptf1a_3$passed_filters > 500, ]
md.Sox9b_1<- md.Sox9b_1[md.Sox9b_1$passed_filters > 500, ]
md.Sox9b_2<- md.Sox9b_2[md.Sox9b_2$passed_filters > 500, ]
md.Sox9b_3<- md.Sox9b_3[md.Sox9b_3$passed_filters > 500, ]

# create fragment objects
frags.Ptf1a_1 <- CreateFragmentObject(
  path = "/Users/hchintalapudi/Desktop/work/ScATAC/10x_ATAC_9-30-30/Ptf1a_1/fragments.tsv.gz",
  cells = rownames(md.Ptf1a_1)
)
frags.Ptf1a_2 <- CreateFragmentObject(
  path = "/Users/hchintalapudi/Desktop/work/ScATAC/10x_ATAC_9-30-30/Ptf1a_2/fragments.tsv.gz",
  cells = rownames(md.Ptf1a_2)
)
frags.Ptf1a_3 <- CreateFragmentObject(
  path = "/Users/hchintalapudi/Desktop/work/ScATAC/10x_ATAC_9-30-30/Ptf1a_3/fragments.tsv.gz",
  cells = rownames(md.Ptf1a_3)
)

frags.Sox9b_1 <- CreateFragmentObject(
  path = "/Users/hchintalapudi/Desktop/work/ScATAC/10x_ATAC_9-30-30/Sox9b_1/fragments.tsv.gz",
  cells = rownames(md.Sox9b_1)
)
frags.Sox9b_2 <- CreateFragmentObject(
  path = "/Users/hchintalapudi/Desktop/work/ScATAC/10x_ATAC_9-30-30/Sox9b_2/fragments.tsv.gz",
  cells = rownames(md.Sox9b_2)
)
frags.Sox9b_3 <- CreateFragmentObject(
  path = "/Users/hchintalapudi/Desktop/work/ScATAC/10x_ATAC_9-30-30/Sox9b_3/fragments.tsv.gz",
  cells = rownames(md.Sox9b_3)
)

Ptf1a_1.counts <- FeatureMatrix(
  fragments = frags.Ptf1a_1,
  features = combined.peaks,
  cells = rownames(md.Ptf1a_1)
)

Ptf1a_2.counts <- FeatureMatrix(
  fragments = frags.Ptf1a_2,
  features = combined.peaks,
  cells = rownames(md.Ptf1a_2)
)
Ptf1a_3.counts <- FeatureMatrix(
  fragments = frags.Ptf1a_3,
  features = combined.peaks,
  cells = rownames(md.Ptf1a_3)
)

Sox9b_1.counts <- FeatureMatrix(
  fragments = frags.Sox9b_1,
  features = combined.peaks,
  cells = rownames(md.Sox9b_1)
)

Sox9b_2.counts <- FeatureMatrix(
  fragments = frags.Sox9b_2,
  features = combined.peaks,
  cells = rownames(md.Sox9b_2)
)

Sox9b_3.counts <- FeatureMatrix(
  fragments = frags.Sox9b_3,
  features = combined.peaks,
  cells = rownames(md.Sox9b_3)
)

Ptf1a_1_assay <- CreateChromatinAssay(Ptf1a_1.counts, fragments = frags.Ptf1a_1)
Ptf1a_1_atac <- CreateSeuratObject(Ptf1a_1_assay, assay = "ATAC", meta.data = md.Ptf1a_1)

Ptf1a_2_assay <- CreateChromatinAssay(Ptf1a_2.counts, fragments = frags.Ptf1a_2)
Ptf1a_2_atac <- CreateSeuratObject(Ptf1a_2_assay, assay = "ATAC", meta.data = md.Ptf1a_2)

Ptf1a_3_assay <- CreateChromatinAssay(Ptf1a_3.counts, fragments = frags.Ptf1a_3)
Ptf1a_3_atac <- CreateSeuratObject(Ptf1a_3_assay, assay = "ATAC", meta.data = md.Ptf1a_3)

Sox9b_1_assay <- CreateChromatinAssay(Sox9b_1.counts, fragments = frags.Sox9b_1)
Sox9b_1_atac <- CreateSeuratObject(Sox9b_1_assay, assay = "ATAC", meta.data = md.Sox9b_1)

Sox9b_2_assay <- CreateChromatinAssay(Sox9b_2.counts, fragments = frags.Sox9b_2)
Sox9b_2_atac <- CreateSeuratObject(Sox9b_2_assay, assay = "ATAC", meta.data = md.Sox9b_2)

Sox9b_3_assay <- CreateChromatinAssay(Sox9b_3.counts, fragments = frags.Sox9b_3)
Sox9b_3_atac <- CreateSeuratObject(Sox9b_3_assay, assay = "ATAC", meta.data = md.Sox9b_3)

Ptf1a_1_atac$dataset<- "Ptf1a_1"
Ptf1a_1_atac$group<- "Ptf1a"

Ptf1a_2_atac$dataset<- "Ptf1a_2"
Ptf1a_2_atac$group<- "Ptf1a"

Ptf1a_3_atac$dataset<- "Ptf1a_3"
Ptf1a_3_atac$group<- "Ptf1a"

Sox9b_1_atac$dataset<- "Sox9b_1"
Sox9b_1_atac$group<- "Sox9b"

Sox9b_2_atac$dataset<- "Sox9b_2"
Sox9b_2_atac$group<- "Sox9b"

Sox9b_3_atac$dataset<- "Sox9b_3"
Sox9b_3_atac$group<- "Sox9b"

# merge all datasets, adding a cell ID to make sure cell names are unique
combined.1 <- merge(
  x = Ptf1a_1_atac,
  y = list(Ptf1a_2_atac, Ptf1a_3_atac, Sox9b_1_atac, Sox9b_2_atac),
  add.cell.ids = c("Ptf1a_1", "Ptf1a_2", "Ptf1a_3", "Sox9b_1","Sox9b_2")
)
combined.1[["ATAC"]]

# compute nucleosome signal score per cell
combined.1 <- NucleosomeSignal(object = combined.1)

# compute TSS enrichment score per cell
combined.1 <- TSSEnrichment(object = combined.1, fast = FALSE, assay = 'ATAC', verbose = T)

# add blacklist ratio and fraction of reads in peaks
combined.1$pct_reads_in_peaks <- combined.1$peak_region_fragments / combined.1$passed_filters * 100
combined.1$blacklist_ratio <- combined.1$blacklist_region_fragments / combined.1$peak_region_fragments

combined.1$high.tss <- ifelse(combined.1$TSS.enrichment > 2, 'High', 'Low')
TSSPlot(combined.1, group.by = 'high.tss') + NoLegend()

##Fragment length periodicity:
combined.1$nucleosome_group <- ifelse(combined.1$nucleosome_signal > 4, 'NS > 4', 'NS < 4')
FragmentHistogram(object = combined.1, group.by = 'nucleosome_group')

VlnPlot(
  object = combined.1,
  features = c('pct_reads_in_peaks', 'peak_region_fragments',
               'TSS.enrichment', 'blacklist_ratio', 'nucleosome_signal'),
  pt.size = 0.1,
  ncol = 5
)
## remove outlier cells based on QC metrics:
combined.2 <- subset(
  x = combined.1,
  subset = peak_region_fragments > 2000 &
    peak_region_fragments < 30000 &
    pct_reads_in_peaks > 15 &
    nucleosome_signal < 3 &
    TSS.enrichment > 2
)
combined.2

combined.2 <- RunTFIDF(combined.2)
combined.2 <- FindTopFeatures(combined.2, min.cutoff = 20)
combined.2 <- RunSVD(combined.2)
combined.2 <- RunUMAP(combined.2, dims = 2:50, reduction = 'lsi')

DimPlot(combined.2, group.by = "group", pt.size = 0.1)

#########add gene annotations#######
ah <- AnnotationHub()
query(ah, "EnsDb")
ahDb <- query(ah, pattern = c("Danio Rerio", "EnsDb", 101))#91 for zv10, 101 for zv11
ahEdb <- ahDb[[1]]
gene.ranges <- genes(ahEdb)
annotations <- GetGRangesFromEnsDb(ensdb = ahEdb)
table(annotations$gene_biotype)
##!!! ALWAYS check fragment file to verify genome build (UCSC/NCBI/..) and then change seqlevelsStyle accordingly !!!
###seqlevelsStyle(annotations) <- 'UCSC'
genome(annotations) <- "danRer11"

Annotation(combined.2)<- annotations

# quantify gene activity
#gene.activities <- GeneActivity(combined.2, features = VariableFeatures(all.integrated))
For the above line, I get this error: (all.integrated is my RNA integrated seurat obj)

Extracting gene coordinates
Error in SingleFeatureMatrix(fragment = fragments[[x]], features = features,  : 
  No matching chromosomes found in fragment file.

gene.activities <- GeneActivity(combined.2)
combined.2[["RNA"]]<- CreateAssayObject(counts = gene.activities)

#combined.2<-NormalizeData(
object = combined.2,
assay = 'RNA',
normalization.method = 'LogNormalize',
scale.factor = median(combined.2$nCount_RNA)
)

combined.2<-  NormalizeData(combined.2)
combined.2<- ScaleData(combined.2, features = rownames(combined.2))

## Transfer anchors to identify shared correlations in gene activity:
 ## change default assay of rna object to "RNA"?

 transfer.anchors <- FindTransferAnchors(
  reference = all.integrated,
  reference.assay = "SCT",
  query.assay = "RNA",
  query = combined.2,
  reduction = 'cca'
)

Some additional & relevant info:

head(annotations)
GRanges object with 6 ranges and 5 metadata columns:
                     seqnames      ranges strand |              tx_id      gene_name            gene_id   gene_biotype     type
                        <Rle>   <IRanges>  <Rle> |        <character>    <character>        <character>    <character> <factor>
  ENSDARE00001225435        9 10267-10406      - | ENSDART00000164763 CABZ01078737.1 ENSDARG00000100461 protein_coding     exon
  ENSDARE00001210369        9 11746-11879      - | ENSDART00000164763 CABZ01078737.1 ENSDARG00000100461 protein_coding     exon
  ENSDARE00001179103        9 12748-12883      - | ENSDART00000164763 CABZ01078737.1 ENSDARG00000100461 protein_coding     exon
  ENSDARE00001199325        9 13247-13330      - | ENSDART00000164763 CABZ01078737.1 ENSDARG00000100461 protein_coding     exon
  ENSDARE00001289732        9 13580-13615      - | ENSDART00000164763 CABZ01078737.1 ENSDARG00000100461 protein_coding     exon
  ENSDARE00001203275        9 13867-14044      - | ENSDART00000164763 CABZ01078737.1 ENSDARG00000100461 protein_coding     exon

head(Annotation(combined.2))
GRanges object with 6 ranges and 5 metadata columns:
                     seqnames      ranges strand |              tx_id      gene_name            gene_id   gene_biotype     type
                        <Rle>   <IRanges>  <Rle> |        <character>    <character>        <character>    <character> <factor>
  ENSDARE00001225435        9 10267-10406      - | ENSDART00000164763 CABZ01078737.1 ENSDARG00000100461 protein_coding     exon
  ENSDARE00001210369        9 11746-11879      - | ENSDART00000164763 CABZ01078737.1 ENSDARG00000100461 protein_coding     exon
  ENSDARE00001179103        9 12748-12883      - | ENSDART00000164763 CABZ01078737.1 ENSDARG00000100461 protein_coding     exon
  ENSDARE00001199325        9 13247-13330      - | ENSDART00000164763 CABZ01078737.1 ENSDARG00000100461 protein_coding     exon
  ENSDARE00001289732        9 13580-13615      - | ENSDART00000164763 CABZ01078737.1 ENSDARG00000100461 protein_coding     exon
  ENSDARE00001203275        9 13867-14044      - | ENSDART00000164763 CABZ01078737.1 ENSDARG00000100461 protein_coding     exon

One of the fragment files:

head fragments.tsv_copy 
1   113 327 ATGTCTTTCGTCCCTA-1  3
1   113 481 TCTCAGCGTGGCCTTG-1  1
1   142 272 GTAGACTGTCTCAAAC-1  2
1   208 240 GGTCATAAGGATGTCG-1  1
1   338 660 CCAATGAAGGTGTTGG-1  1
1   348 484 AAATGCCTCGCGCCAA-1  1
1   583 662 TAAGCCAAGGACTTTC-1  1
1   619 792 AGTCAACTCAGAGTGG-1  2
1   654 819 GTGACATAGGGTCCCT-1  1
1   660 801 AGGCGTCGTATATGGA-1  1

The chromosome names match in both, also tried changing seqlevelsStyle(annotations) to 'NCBI' but I don't see any change. Some tips would be appreciated!

my code for RNA integration:

# Load and prepare data,perform QC and run SCtransform individually on all samples before before integration:
rna_10x<- Read10X(data.dir = "/Users/hchintalapudi/Desktop/work/ScRNA/10x_GEX_10-6-20/Ptf1a_1/outs/filtered_feature_bc_matrix/")

rna<- CreateSeuratObject(counts = rna_10x, project = "Ptf1a_1", min.cells = 3, min.features = 200)
rna$group<- "Ptf1a"
rna

rna[["percent.mt"]] <- PercentageFeatureSet(rna, pattern = "^mt-")

# Visualize QC metrics as a violin plot
#VlnPlot(rna, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3)

# FeatureScatter is typically used to visualize feature-feature relationships, but can be used
# for anything calculated by the object, i.e. columns in object metadata, PC scores etc.

#plot1 <- FeatureScatter(rna, feature1 = "nCount_RNA", feature2 = "percent.mt")
#plot2 <- FeatureScatter(rna, feature1 = "nCount_RNA", feature2 = "nFeature_RNA")
#plot1 + plot2

rna <- subset(rna, subset = nFeature_RNA > 200 & nFeature_RNA < 2500 & percent.mt < 5& nCount_RNA<20000)
#FeatureScatter(rna, feature1 = "nCount_RNA", feature2 = "nFeature_RNA")

# run sctransform
rna <- SCTransform(rna, vars.to.regress = "percent.mt", verbose = FALSE)

####---------------------------------------------------------------------------------------------------
Ptf1a_2_10x<- Read10X(data.dir = "/Users/hchintalapudi/Desktop/work/ScRNA/10x_GEX_10-6-20/Ptf1a_2/outs/filtered_feature_bc_matrix/")
Ptf1a_2.rna<- CreateSeuratObject(counts = Ptf1a_2_10x, project = "Ptf1a_2", min.cells = 3, min.features = 200)
Ptf1a_2.rna$group<- "Ptf1a"
Ptf1a_2.rna
Ptf1a_2.rna[["percent.mt"]] <- PercentageFeatureSet(Ptf1a_2.rna, pattern = "^mt-")

# Show QC metrics for the first 5 cells
head(Ptf1a_2.rna@meta.data, 5)

Ptf1a_2.rna<- subset(Ptf1a_2.rna, subset = nFeature_RNA > 200 & nFeature_RNA < 2500& nCount_RNA<20000 & percent.mt<5)
#Ptf1a_2.rna <- subset(Ptf1a_2.rna, subset = nFeature_RNA > 200 & nFeature_RNA < 4500 & percent.mt < 10)
#FeatureScatter(Ptf1a_2.rna, feature1 = "nCount_RNA", feature2 = "nFeature_RNA")

Ptf1a_2.rna <- SCTransform(Ptf1a_2.rna, vars.to.regress = "percent.mt", verbose = FALSE)

####---------------------------------------------------------------------------------------------------
Ptf1a_3_10x<- Read10X(data.dir = "/Users/hchintalapudi/Desktop/work/ScRNA/10x_GEX_10-6-20/Ptf1a_3/outs/filtered_feature_bc_matrix/")
Ptf1a_3.rna<- CreateSeuratObject(counts = Ptf1a_3_10x, project = "Ptf1a_3", min.cells = 3, min.features = 200)
Ptf1a_3.rna$group<- "Ptf1a"

Ptf1a_3.rna[["percent.mt"]] <- PercentageFeatureSet(Ptf1a_3.rna, pattern = "^mt-")

# Show QC metrics for the first 5 cells
head(Ptf1a_3.rna@meta.data, 5)

Ptf1a_3.rna <- subset(Ptf1a_3.rna, subset = nFeature_RNA > 200 & nFeature_RNA < 2500 & percent.mt < 5 &nCount_RNA <20000)
#FeatureScatter(Ptf1a_3.rna, feature1 = "nCount_RNA", feature2 = "nFeature_RNA")

Ptf1a_3.rna <- SCTransform(Ptf1a_3.rna, vars.to.regress = "percent.mt", verbose = FALSE)

####---------------------------------------------------------------------------------------------------
Sox9b_1_10x<- Read10X(data.dir = "/Users/hchintalapudi/Desktop/work/ScRNA/10x_GEX_10-6-20/Sox9b_1/outs/filtered_feature_bc_matrix/")
Sox9b_1.rna<- CreateSeuratObject(counts = Sox9b_1_10x, project = "Sox9b_1", min.cells = 3, min.features = 200)
Sox9b_1.rna$group<- "Sox9b"

Sox9b_1.rna[["percent.mt"]] <- PercentageFeatureSet(Sox9b_1.rna, pattern = "^mt-")

# Show QC metrics for the first 5 cells
head(Sox9b_1.rna@meta.data, 5)

Sox9b_1.rna <- subset(Sox9b_1.rna, subset = nFeature_RNA > 200 & nFeature_RNA < 2500 & percent.mt < 5 & nCount_RNA<20000)
#FeatureScatter(Sox9b_1.rna, feature1 = "nCount_RNA", feature2 = "nFeature_RNA")

Sox9b_1.rna <- SCTransform(Sox9b_1.rna, vars.to.regress = "percent.mt", verbose = FALSE)

####---------------------------------------------------------------------------------------------------
Sox9b_2_10x<- Read10X(data.dir = "/Users/hchintalapudi/Desktop/work/ScRNA/10x_GEX_10-6-20/Sox9b_2/outs/filtered_feature_bc_matrix/")
Sox9b_2.rna<- CreateSeuratObject(counts = Sox9b_2_10x, project = "Sox9b_2", min.cells = 3, min.features = 200)
Sox9b_2.rna$group<- "Sox9b"

Sox9b_2.rna[["percent.mt"]] <- PercentageFeatureSet(Sox9b_2.rna, pattern = "^mt-")

head(Sox9b_2.rna@meta.data, 5)

Sox9b_2.rna <- subset(Sox9b_2.rna, subset = nFeature_RNA > 200 & nFeature_RNA < 2500 & percent.mt < 5 &nCount_RNA <20000)
#FeatureScatter(Sox9b_2.rna, feature1 = "nCount_RNA", feature2 = "nFeature_RNA")

Sox9b_2.rna <- SCTransform(Sox9b_2.rna, vars.to.regress = "percent.mt", verbose = FALSE)

####---------------------------------------------------------------------------------------------------
# 02/06/2021
### Integration of all samples ignoring groups; without Ptf1a_Cont_gRNA and removing Sox9b_3.rna due to poor yield:
options(future.globals.maxSize = 4000 * 1024^2)

list_10x_Scrna<- list(rna, Ptf1a_2.rna, Ptf1a_3.rna, Sox9b_1.rna, Sox9b_2.rna)
all.features<- SelectIntegrationFeatures(object.list = list_10x_Scrna, nfeatures = 3000)
list_10x_Scrna<- PrepSCTIntegration(object.list = list_10x_Scrna, anchor.features = all.features, 
    verbose = T)
all.anchors<- FindIntegrationAnchors(object.list = list_10x_Scrna, normalization.method = "SCT", 
    anchor.features = all.features, verbose = FALSE)
all.integrated<- IntegrateData(anchorset = all.anchors, normalization.method = "SCT", verbose = T)

all.integrated<- RunPCA(all.integrated, verbose = F)
PCAPlot(all.integrated, group.by = "group", pt.size = 0.7) + ggtitle("")
all.integrated<- RunUMAP(all.integrated, dims = 1:30)
all.integrated<- RunTSNE(all.integrated, dims = 1:30)

all.integrated<- FindNeighbors(all.integrated, dims = 1:30, verbose = F)
all.integrated<- FindClusters(all.integrated, resolution = 0.5)
p1<-DimPlot(all.integrated, reduction = "umap", group.by = "orig.ident", pt.size = 0.5)+ ggtitle("")
p2<-DimPlot(all.integrated, reduction = "umap", group.by = "group", pt.size = 0.5) +ggtitle("")
plot_grid(p1, p2)
DimPlot(all.integrated, reduction = "umap", split.by = "group", pt.size = 0.6, label = T, label.size = 7) + NoLegend()

all.integrated.markers<- FindAllMarkers(all.integrated, min.pct = 0.25, logfc.threshold = 0.25, min.diff.pct = 0.25,only.pos = TRUE)

Many thanks for the amazing tools developed.

timoast commented 3 years ago

Your error message is:

Error: No features to use in finding transfer anchors. To troubleshoot, try explicitly providing features to the features parameter and ensure that they are present in both reference and query assays.

Did you check the feature names in the assays and make sure there are common features present?

Also, please format your code in future so that it's readable: https://docs.github.com/en/github/writing-on-github/creating-and-highlighting-code-blocks

hchintalapudi commented 3 years ago

Hi @timoast , Apologies, I was in a hurry. I reformatted everything properly.

I checked like this:

table(Annotation(combined.2)$gene_name %in% rownames(all.integrated))

  FALSE    TRUE 
 212522 1183354

I don't understand why this is giving me an error:

gene.activities <- GeneActivity(combined.2, features = VariableFeatures(all.integrated))
Extracting gene coordinates
Error in SingleFeatureMatrix(fragment = fragments[[x]], features = features,  : 
  No matching chromosomes found in fragment file.

I checked the 'Annotation()' obj and one of the fragment files and the chr names seem to be uniform.

Not using the 'features' argument makes it work and then I get in to the issue of not finding 'TransferAnchors'

timoast commented 3 years ago

table(Annotation(combined.2)$gene_name %in% rownames(all.integrated))

This does not check for common features between the two different assays. The gene annotation information stored in the chromatin assay is a separate thing to the features in the assay itself.

The GeneActivity() function error message should give you a clue here: "No matching chromosomes found in fragment file." This suggests that the chromosome names in the gene annotation you're using are different to the chromosome names in the fragment file. They need to match for GeneActivity() to work properly.

I'd suggest looking at the number of genes in common between the gene.activities matrix that you generate and the all.integrated object.

hchintalapudi commented 3 years ago

Thanks for your comment.

gene.activities <- GeneActivity(combined.2)
table(rownames(gene.activities) %in% rownames(all.integrated))

FALSE  TRUE 
 5513 18959

I see that majority are common.

*Update on this (05/06/21): I've had success in running RNA+ATAC integration in the past, but not on merged/integrated objects, only on single replicate samples. So I tried to see what was different. I notice that my single sample seurat obj shows variable features:

rna
An object of class Seurat 
29701 features across 1184 samples within 2 assays 
Active assay: SCT (13293 features, 3000 variable features)
 1 other assay present: RNA

But my integrated obj doesn't:

all.integrated
An object of class Seurat 
38576 features across 8687 samples within 3 assays 
Active assay: RNA (19363 features, 0 variable features)
 2 other assays present: SCT, integrated
 3 dimensional reductions calculated: pca, umap, tsne

And that's why I get these error messages:

 gene.activities <- GeneActivity(combined.2, features = VariableFeatures(all.integrated))
Extracting gene coordinates
Error in SingleFeatureMatrix(fragment = fragments[[x]], features = features,  : 
  No matching chromosomes found in fragment file.

transfer.anchors <- FindTransferAnchors(
+   reference = all.integrated,
+   reference.assay = 'RNA',
+   query.assay = 'RNA',
+   query = combined.2,
+   reduction = 'cca',
+   features = VariableFeatures(all.integrated)
+ )

Error: No features to use in finding transfer anchors. To troubleshoot, try explicitly providing features to the features parameter and ensure that they are present in both reference and query assays.

So I tried to supply the features to the 'FindTransferAnchors' function like so:

transfer.anchors <- FindTransferAnchors(
+   reference = all.integrated,
+   reference.assay = 'RNA',
+   query.assay = 'RNA',
+   query = combined.2,
+   reduction = 'cca',
+   features = all.integrated@assays$integrated@var.features
+ )

And I had success with integration:

Warning: 9 features of the features specified were not present in both the reference query assays. 
Continuing with remaining 2991 features.
Running CCA
Merging objects
Finding neighborhoods
Finding anchors
    Found 18543 anchors
Filtering anchors
    Retained 1762 anchors

Hi @timoast, Can you tell me why this is happening? Just curious.

Thanks a lot!

hchintalapudi commented 3 years ago

Another question about ScATAC data: My merged object shows clusters that do not overlap at all. The samples of similar groups are more like 'biological replicates' and we did not significant batch effects, so it was concerning to see this. (FYI CellRanger peak calling was done sample wise and not aggregated) I tried tweaking the dimensions but it didn't affect the UMAPs much.

Do you have any opinion on this? Is Integration the way to go here? tmp.1.pdf tmp.2.pdf tmp.3_dims2:5.pdf

Thank you!

timoast commented 3 years ago

FYI CellRanger peak calling was done sample wise and not aggregated

It's essential that a common set of peaks are used when merging or integrating multiple datasets. See the merge vignette for more information. If you quantify the same peak set across the different datasets prior to merging and still observe a batch effect, you could try using integration (vignette here)

stuart-lab / signac

Not finding TransferAnchors between integrated RNA object & merged ATAC object #595