stuart-lab / signac

R toolkit for the analysis of single-cell chromatin data
https://stuartlab.org/signac/
Other
327 stars 88 forks source link

integration of scATAC-seq and scRNA-seq #111

Closed ahmedabbas81 closed 4 years ago

ahmedabbas81 commented 4 years ago

I am trying to integrate the gene activity matrix of scATAC-seq with scRNA-seq. The gene activity matrix is not obtained from original scATAC-seq data. Rather, it is obtained from imputed scATAC-seq data by a recent method. In integration it tells me:

"More than 10% of provided features filtered out. Please check that the given features are present in the scale.data slot for both the assays provided here and that they have non-zero variance."Warning message in RunCCA.Seurat(object1 = reference, object2 = query, features = features, : "Fewer than 50 features used as input for CCA."Running CCA

And then the kernel died

I remember in a question before you told me that the features used for integration are the most variable 2000 features from scRNA-seq. So, why do the above problems happen?

Is the scATAC-seq data over imputed and so it was not possible to find the same variable features in the activity matrix?

In addition, when running the TFIDF normalization, it gave me the following messages:

Error in checkSlotAssignment(object, name, value): assignment of an object of class "dgeMatrix" is not valid for slot 'data' in an object of class "Assay"; is(value, "AnyMatrix") is not TRUE Traceback:

  1. RunTFIDF(young_male)
  2. RunTFIDF.Seurat(young_male)
  3. RunTFIDF(object = assay.data, assay = assay, method = method, . scale.factor = scale.factor, verbose = verbose, ...)
  4. RunTFIDF.Assay(object = assay.data, assay = assay, method = method, . scale.factor = scale.factor, verbose = verbose, ...)
  5. SetAssayData(object = object, slot = "data", new.data = new.data)
  6. SetAssayData.Assay(object = object, slot = "data", new.data = new.data)
  7. slot<-(object = *tmp*, name = slot, value = new.data)
  8. checkSlotAssignment(object, name, value)
  9. stop(gettextf("assignment of an object of class %s is not valid for slot %s in an object of class %s; is(value, \"%s\") is not TRUE", . dQuote(valueClass), sQuote(name), dQuote(cl), slotClass), . domain = NA)

However, I skipped this part and proceeded to integration of the two datasets.

Sorry for the very long question and I hope I wrote it clearly.

Thank you

timoast commented 4 years ago

Can you post the full code you'e running?

ahmedabbas81 commented 4 years ago
library(Signac)
library(GenomeInfoDb)
library(EnsDb.Mmusculus.v79)
library(ggplot2)
set.seed(1234)
library(Seurat)
library(magrittr)
library(readr)
library(Matrix)
library(tidyr)
library(dplyr)
#this is the original file
peaks <- Read10X_h5("filtered_peak_bc_matrix_imputed/filtered_peak_bc_matrix.h5")

#this is the imputed data
mex_dir_path <- "filtered_peak_bc_matrix_imputed"

mtx_path <- paste(mex_dir_path, "matrix.mtx", sep = '/')
feature_path <- paste(mex_dir_path, "peaks.bed", sep = '/')
barcode_path <- paste(mex_dir_path, "barcodes.tsv", sep = '/')

features <- readr::read_tsv(feature_path, col_names = F) %>% tidyr::unite(feature)
barcodes <- readr::read_tsv(barcode_path, col_names = F) %>% tidyr::unite(barcode)

mtx <- Matrix::readMM(mtx_path) %>%
  magrittr::set_rownames(features$feature) %>%
  magrittr::set_colnames(barcodes$barcode)

#I need the imputed data only
#if I don't make the next trick I cannot calculate gene activity matrix
peaks[peaks>0] <- 0
peaks <- peaks + mtx
# create a gene activity matrix from the peak matrix and GTF, using chromosomes 1:22, X, and Y.
# Peaks that fall within gene bodies, or 2kb upstream of a gene, are considered
activity.matrix <- CreateGeneActivityMatrix(peak.matrix = peaks, annotation.file = "Mus_musculus.GRCm38.98.gtf", 
                                            seq.levels = c(1:19, "X", "Y"), upstream = 2000, verbose = TRUE)

young_male <- CreateSeuratObject(counts = peaks, assay = "ATAC", project = "10x_ATAC",min.cells = 1,meta.data = metadata)
young_male[["RNA"]] <- CreateAssayObject(counts = activity.matrix)
fragment.path <- '/projects/li-lab/scATAC-seq_scRNA-seq_data/SL19011/cellranger_doublets_removed/fragments.tsv.gz'

young_male <- SetFragments(
  object = young_male,
  file = fragment.path
)
young_male <- NucleosomeSignal(object = young_male)
young_male$pct_reads_in_peaks <- young_male$peak_region_fragments / young_male$passed_filters * 100
young_male$blacklist_ratio <- young_male$blacklist_region_fragments / young_male$peak_region_fragments

VlnPlot(
  object = young_male,
  features = c('pct_reads_in_peaks', 'blacklist_ratio', 'nucleosome_signal', 'peak_region_fragments'),
  pt.size = 0.1,
  ncol = 4) + NoLegend()
young_male$nucleosome_group <- ifelse(young_male$nucleosome_signal > 10, 'NS > 10', 'NS < 10')
PeriodPlot(object = young_male, group.by = 'nucleosome_group', region = 'chr1-1-10000000')

gene.ranges <- genes(EnsDb.Mmusculus.v79)
gene.ranges <- gene.ranges[gene.ranges$gene_biotype == 'protein_coding', ]

tss.ranges <- GRanges(
  seqnames = seqnames(gene.ranges),
  ranges = IRanges(start = start(gene.ranges), width = 2),
  strand = strand(gene.ranges)
)

seqlevelsStyle(tss.ranges) <- 'UCSC'
tss.ranges <- keepStandardChromosomes(tss.ranges, pruning.mode = 'coarse')

# to save time use the first 2000 TSSs
young_male <- TSSEnrichment(object = young_male, tss.positions = tss.ranges[1:2000])

young_male$high.tss <- ifelse(young_male$TSS.enrichment > 2, 'High', 'Low')
TSSPlot(young_male, group.by = 'high.tss') + ggtitle("TSS enrichment score") + NoLegend()

young_male <- subset(young_male, subset = peak_region_fragments > 1000 & peak_region_fragments < 100000 & pct_reads_in_peaks > 15 & blacklist_ratio < 0.025 & nucleosome_signal < 10 & TSS.enrichment > 2)
young_male

##the first problem happens here
young_male <- RunTFIDF(young_male)
young_male <- FindTopFeatures(young_male, min.cutoff = 'q0')

young_male <- RunSVD(
  object = young_male,
  assay = 'ATAC',
  reduction.key = 'LSI_',
  reduction.name = 'lsi'
)

young_male <- RunTSNE(
  object = young_male,
  reduction = 'lsi',
  dims = 1:30
)
young_male <- FindNeighbors(
  object = young_male,
  reduction = 'lsi',
  dims = 1:30
)
young_male <- FindClusters(
  object = young_male,
  verbose = FALSE
)

DimPlot(object = young_male, label = TRUE) + NoLegend()

#if I skip all the previous part from RunTFIDF(young_male) and run the following to integrate

young_male <- NormalizeData(
  object = young_male,
  assay = 'RNA',
  normalization.method = 'LogNormalize',
  scale.factor = median(young_male$nCount_RNA)
)

#load("/projects/abbasa/R_images/19012_image_imputed.RData")
load("/projects/abbasa/R_images/markers_19008.RData")
SCdata <- ScaleData(SCdata)
# We next calculate a subset of features that exhibit high cell-to-cell variation in the dataset 
# (i.e, they are highly expressed in some cells, and lowly expressed in others)
SCdata <- FindVariableFeatures(SCdata, selection.method = "vst", nfeatures = 2000)
SCdata <- RunPCA(SCdata)
SCdata <- RunTSNE(SCdata, dims = 1:10)
SCdata <- FindNeighbors(SCdata, dims = 1:10)

# if resolution increases the number of clusters increase
# clusters can be found using the Idents function
SCdata <- FindClusters(SCdata, resolution = 0.5)

DimPlot(object = SCdata, label = TRUE) + NoLegend()
SCdata <- FindVariableFeatures(SCdata, selection.method = "vst", nfeatures = 2000)
#I face the second problem here
transfer.anchors <- FindTransferAnchors(reference = SCdata, query = young_male,features = VariableFeatures(object = SCdata) , reference.assay = "RNA", query.assay = "RNA", reduction = "cca")
SCdata[["celltype"]] <- singler$singler[[1]]$SingleR.single$labels

celltype.predictions <- TransferData(anchorset = transfer.anchors, refdata = SCdata$celltype, 
                                     weight.reduction = young_male[["lsi"]])
young_male <- AddMetaData(young_male, metadata = celltype.predictions)
hist(young_male$prediction.score.max)
abline(v = 0.5, col = "red")                                     

table(young_male$prediction.score.max > 0.5)  
young_male_filtered <- subset(young_male, subset = prediction.score.max > 0.5)
unique_cells <- unique(young_male_filtered@meta.data$predicted.id)
Temp <- SCdata
Temp <- subset(Temp, celltype %in% unique_cells)
DimPlot(young_male_filtered, group.by = "predicted.id", label = TRUE, repel = TRUE) + ggtitle("scATAC-seq cells") + 
  NoLegend() + scale_colour_hue(drop = FALSE)
DimPlot(Temp, group.by = "celltype", label = TRUE, repel = TRUE) + ggtitle("scRNA-seq cells") + 
  NoLegend()
A <- as.matrix(young_male@assays$RNA@data)
dim(A)
r <- rownames(A)
write.table(r, file="/projects/abbasa/Rdata/19011_SCALE_genes.csv", sep="\n", col.names = F, row.names = F)
write.table(A, file="/projects/abbasa/Rdata/19011_activity_matrix_numbers_SCALE.csv", sep=",", col.names = F, row.names = F)
write.csv(celltype.predictions,"/projects/abbasa/Rdata/19011_narrow_predictions_SCALE.csv")
timoast commented 4 years ago

I think these issues are due to this step:

peaks[peaks>0] <- 0
peaks <- peaks + mtx

What is the motivation for doing that?

One solution is not to use the imputed data.

ahmedabbas81 commented 4 years ago

Thanks Tim for your efforts.

If I do not use this step and directly use mtx, I cannot get the gene activity matrix. So, I have to make this small trick.

Is there anything wrong in this step? It is just a way of using mtx instead of peaks?

I was thinking that the problem may be because the imputed data is not sparse as scATAC-seq. Rather, it is quite dense. Can this be the problem?

Thanks again

timoast commented 4 years ago

You should be able to use the mtx matrix directly. If not, can you give some more information about the matrix such as the output of the following:

class(mtx)
head(rownames(mtx))
head(colnames(mtx))
mtx[1:10,1:10]

However, I really don't recommend performing imputation on scATAC-seq data.

ahmedabbas81 commented 4 years ago

Thanks Tim for the answer, First, after this command

> mtx <- Matrix::readMM(mtx_path) %>%
+   magrittr::set_rownames(features$feature) %>%
+   magrittr::set_colnames(barcodes$barcode)

It gives this warning

Warning message:
In scan(file, nmax = nz, quiet = TRUE, what = list(i = integer(),  :
  number of items read is not a multiple of the number of columns
> class(mtx)
[1] "dgTMatrix"
attr(,"package")
[1] "Matrix"
> head(rownames(mtx))
[1] "chr1_3670918_3671808" "chr1_4414277_4414559" "chr1_4416634_4417136"
[4] "chr1_4417796_4418364" "chr1_4468331_4468752" "chr1_4490421_4490948"
> head(colnames(mtx))
[1] "AAACGAAAGGCAAGGG-1" "AAACGAAAGTGATCTC-1" "AAACGAACACTGTCGG-1" "AAACGAACAGATAAGT-1"
[5] "AAACGAACAGCCGAAA-1" "AAACGAAGTAAACGTA-1"
> mtx[1:10,1:10]
10 x 10 sparse Matrix of class "dgTMatrix"
   [[ suppressing 10 column names ‘AAACGAAAGGCAAGGG-1’, ‘AAACGAAAGTGATCTC-1’, ‘AAACGAACACTGTCGG-1’ ... ]]

chr1_3670918_3671808 1 1 . . 1 . 1 1 . .
chr1_4414277_4414559 . . . . . . . . . .
chr1_4416634_4417136 1 1 1 . 1 . 1 1 1 1
chr1_4417796_4418364 1 1 . . 1 . 1 1 1 .
chr1_4468331_4468752 1 1 . . 1 . 1 1 1 1
chr1_4490421_4490948 1 1 . . 1 . 1 1 1 1
chr1_4491496_4493750 1 1 1 1 1 1 1 1 1 1
chr1_4494489_4494498 . . . . . . . . . .
chr1_4495969_4497668 1 1 1 1 1 1 1 1 1 1
chr1_4516544_4516968 . . 1 1 1 1 1 1 . 1

Thank you

timoast commented 4 years ago

You can try changing it to a dgCMatrix by running: mtx <- as(object = mtx, Class = 'dgCMatrix')

ahmedabbas81 commented 4 years ago

Thank you,

I tried as you said,

mex_dir_path <- "filtered_peak_bc_matrix_imputed"

mtx_path <- paste(mex_dir_path, "matrix.mtx", sep = '/')
feature_path <- paste(mex_dir_path, "peaks.bed", sep = '/')
barcode_path <- paste(mex_dir_path, "barcodes.tsv", sep = '/')

features <- readr::read_tsv(feature_path, col_names = F) %>% tidyr::unite(feature)
barcodes <- readr::read_tsv(barcode_path, col_names = F) %>% tidyr::unite(barcode)

mtx <- Matrix::readMM(mtx_path) %>%
  magrittr::set_rownames(features$feature) %>%
  magrittr::set_colnames(barcodes$barcode)

mtx <- as(object = mtx, Class = 'dgCMatrix')

# create a gene activity matrix from the peak matrix and GTF, using chromosomes 1:22, X, and Y.
# Peaks that fall within gene bodies, or 2kb upstream of a gene, are considered
activity.matrix <- CreateGeneActivityMatrix(peak.matrix = mtx, annotation.file = "Mus_musculus.GRCm38.98.gtf", 
                                            seq.levels = c(1:19, "X", "Y"), upstream = 2000, verbose = TRUE)

It gives the following error:

Error in names(x) <- value : 
  'names' attribute [3] must be the same length as the vector [1]

If I use the trick:

peaks[peaks>0] <- 0
peaks <- peaks + mtx

It works. But later it gives me:

Error in checkSlotAssignment(object, name, value): assignment of an object of class "dgeMatrix" is not valid for slot 'data' in an object of class "Assay"; is(value, "AnyMatrix") is not TRUE
Traceback:

1. RunTFIDF(young_male)
2. RunTFIDF.Seurat(young_male)
3. RunTFIDF(object = assay.data, assay = assay, method = method, 
 .     scale.factor = scale.factor, verbose = verbose, ...)
4. RunTFIDF.Assay(object = assay.data, assay = assay, method = method, 
 .     scale.factor = scale.factor, verbose = verbose, ...)
5. SetAssayData(object = object, slot = "data", new.data = new.data)
6. SetAssayData.Assay(object = object, slot = "data", new.data = new.data)
7. `slot<-`(object = `*tmp*`, name = slot, value = new.data)
8. checkSlotAssignment(object, name, value)
9. stop(gettextf("assignment of an object of class %s is not valid for slot %s in an object of class %s; is(value, \"%s\") is not TRUE", 
 .     dQuote(valueClass), sQuote(name), dQuote(cl), slotClass), 
 .     domain = NA)

Thank you

timoast commented 4 years ago

What line do you get the error? mtx <- as(object = mtx, Class = 'dgCMatrix') or CreateGeneActivityMatrix?

ahmedabbas81 commented 4 years ago

I get the error in CreateGeneActivityMatrix

timoast commented 4 years ago

CreateGeneActivityMatrix requires that the peak coordinates are separated by ":" and "-", whereas you have "_" and "_". You will need to fix the row names of the peak matrix before running the function.

I recommend using the FeatureMatrix function in Signac to create a gene activity matrix rather than the CreateGeneActivityMatrix function in Seurat.

timoast commented 4 years ago

Actually it just requires the peaks separated by "-", so you would need to replace the underscores with dash, for example:

rownames(mtx) <- gsub('_', '-', rownames(mtx))
ahmedabbas81 commented 4 years ago

Ok, thank you

Probably that is the reason it works when I use the lines:

peaks[peaks>0] <- 0
peaks <- peaks + mtx

because peaks have the required format.

However, why do I get errors in RunTFIDF. I believe that this does not depend on gene activity matrix. Is it possible that the data is overimputed? It is not sparse anymore like the original scATAC-seq data? In other words, is it possible that RunTFIDF is not working because the quality of data decreased a lot after imputation?

Also I believe that RunTFIDF is also not related to integrating the gene activity matrix with scRNA-seq data. In the step of integration it gives

"More than 10% of provided features filtered out. Please check that the given features are present in the scale.data slot for both the assays provided here and that they have non-zero variance."Warning message in RunCCA.Seurat(object1 = reference, object2 = query, features = features, :
"Fewer than 50 features used as input for CCA."Running CCA

And then the kernel dies

Is it possible that there are no variable features because the gene activity matrix became dense rather than sparse? And aren't the features used for integration are the most variable 2000 genes from scRNA-seq data? Or it also checks if these 2000 genes are variable enough for example in the gene activity matrix?

By the way, I am convinced that the imputation results are not good. I just need to see the performance of integration after imputation.

Thanks a lot

timoast commented 4 years ago

I think the TF-IDF error was due to the matrix being stored in the wrong format, changing to dgCMatrix should fix that. Are you still seeing the error after first running mtx <- as(object = mtx, Class = 'dgCMatrix')?

ahmedabbas81 commented 4 years ago

Yes, I still see it

timoast commented 4 years ago

Can you install the latest version from the develop branch and try again? Should be fixed in https://github.com/timoast/signac/commit/cd58ef09d2945735970c7051bcde1a7f9cf6a25c

devtools::install_github("timoast/signac", ref = "develop")
ahmedabbas81 commented 4 years ago

It gave me this error

Error: Failed to install 'Signac' from GitHub:
  (converted from warning) installation of package 'digest' had non-zero exit status
Traceback:

1. devtools::install_github("timoast/signac", ref = "develop")
2. pkgbuild::with_build_tools({
 .     ellipsis::check_dots_used(action = getOption("devtools.ellipsis_action", 
 .         rlang::warn))
 .     {
 .         remotes <- lapply(repo, github_remote, ref = ref, subdir = subdir, 
 .             auth_token = auth_token, host = host)
 .         install_remotes(remotes, auth_token = auth_token, host = host, 
 .             dependencies = dependencies, upgrade = upgrade, force = force, 
 .             quiet = quiet, build = build, build_opts = build_opts, 
 .             build_manual = build_manual, build_vignettes = build_vignettes, 
 .             repos = repos, type = type, ...)
 .     }
 . }, required = FALSE)
3. install_remotes(remotes, auth_token = auth_token, host = host, 
 .     dependencies = dependencies, upgrade = upgrade, force = force, 
 .     quiet = quiet, build = build, build_opts = build_opts, build_manual = build_manual, 
 .     build_vignettes = build_vignettes, repos = repos, type = type, 
 .     ...)
4. tryCatch(res[[i]] <- install_remote(remotes[[i]], ...), error = function(e) {
 .     stop(remote_install_error(remotes[[i]], e))
 . })
5. tryCatchList(expr, classes, parentenv, handlers)
6. tryCatchOne(expr, names, parentenv, handlers[[1L]])
7. value[[3L]](cond)
timoast commented 4 years ago

It looks like you're getting an error installing dependencies, can you run install.packages("digest")?

ahmedabbas81 commented 4 years ago

When I run install.packages("digest"), I get the following warning:

Warning message in install.packages("digest"):
"installation of package 'digest' had non-zero exit status"Updating HTML index of packages in '.Library'
Making 'packages.html' ... done

And after that, when I run devtools::install_github("timoast/signac", ref = "develop"), I get the error:

Error: Failed to install 'Signac' from GitHub:
  (converted from warning) installation of package 'digest' had non-zero exit status
Traceback:

1. devtools::install_github("timoast/signac", ref = "develop")
2. pkgbuild::with_build_tools({
 .     ellipsis::check_dots_used(action = getOption("devtools.ellipsis_action", 
 .         rlang::warn))
 .     {
 .         remotes <- lapply(repo, github_remote, ref = ref, subdir = subdir, 
 .             auth_token = auth_token, host = host)
 .         install_remotes(remotes, auth_token = auth_token, host = host, 
 .             dependencies = dependencies, upgrade = upgrade, force = force, 
 .             quiet = quiet, build = build, build_opts = build_opts, 
 .             build_manual = build_manual, build_vignettes = build_vignettes, 
 .             repos = repos, type = type, ...)
 .     }
 . }, required = FALSE)
3. install_remotes(remotes, auth_token = auth_token, host = host, 
 .     dependencies = dependencies, upgrade = upgrade, force = force, 
 .     quiet = quiet, build = build, build_opts = build_opts, build_manual = build_manual, 
 .     build_vignettes = build_vignettes, repos = repos, type = type, 
 .     ...)
4. tryCatch(res[[i]] <- install_remote(remotes[[i]], ...), error = function(e) {
 .     stop(remote_install_error(remotes[[i]], e))
 . })
5. tryCatchList(expr, classes, parentenv, handlers)
6. tryCatchOne(expr, names, parentenv, handlers[[1L]])
7. value[[3L]](cond)

Thank you and sorry for disturbance

timoast commented 4 years ago

Unfortunately I won't be able to help much with problems with other packages, but you could try again after restarting R, or try installing the package from github: devtools::install_github("eddelbuettel/digest")

If you still have issues you should try contacting the authors of the digest package: https://github.com/eddelbuettel/digest

ahmedabbas81 commented 4 years ago

Thank you for your effort. You did your very best to help me.

ahmedabbas81 commented 4 years ago

Hello,

It finally passed the RunTFIDF function.

However in the command

young_male <- RunSVD(
  object = young_male,
  assay = 'ATAC',
  reduction.key = 'LSI_',
  reduction.name = 'lsi'
)

It gives the next error

Running SVD
Error in irlba(A = t(object), nv = n, work = irlba.work) : 
  BLAS/LAPACK routine 'DLASCL' gave error code -4

Thank you

timoast commented 4 years ago

I haven't seen that error before, can you provide the output from the following:

obj <- GetAssayData(young_male, assay = 'ATAC', slot = 'data')
obj[1:10,1:10]
class(obj)
sum(is.na(obj))
ahmedabbas81 commented 4 years ago
> obj <- GetAssayData(young_male, assay = 'ATAC', slot = 'data')
> obj[1:10,1:10]
10 x 10 sparse Matrix of class "dgCMatrix"
   [[ suppressing 10 column names 'AAACGAAAGGCAAGGG-1', 'AAACGAAAGTGATCTC-1', 'AAACGAACACTGTCGG-1' ... ]]

chr1:3670918-3671808 1 1 . . 1 . 1 1 . .
chr1:4416634-4417136 1 1 1 . 1 . 1 1 1 1
chr1:4417796-4418364 1 1 . . 1 . 1 1 1 .
chr1:4468331-4468752 1 1 . . 1 . 1 1 1 1
chr1:4490421-4490948 1 1 . . 1 . 1 1 1 1
chr1:4491496-4493750 1 1 1 1 1 1 1 1 1 1
chr1:4495969-4497668 1 1 1 1 1 1 1 1 1 1
chr1:4516544-4516968 . . 1 1 1 1 1 1 . 1
chr1:4571249-4572327 1 1 1 1 1 1 1 1 1 1
chr1:4602305-4602744 . . . 1 . . . 1 1 .
> class(obj)
[1] "dgCMatrix"
attr(,"package")
[1] "Matrix"
> sum(is.na(obj))
[1] 0
ahmedabbas81 commented 4 years ago

I am sorry. In the previous comment, I got the outputs of your commands before RunTFIDF.

After running RunTFIDF,

obj <- GetAssayData(young_male, assay = 'ATAC', slot = 'data')
> obj[1:10,1:10]
10 x 10 sparse Matrix of class "dgCMatrix"
   [[ suppressing 10 column names 'AAACGAAAGGCAAGGG-1', 'AAACGAAAGTGATCTC-1', 'AAACGAACACTGTCGG-1' ... ]]

chr1:3670918-3671808 0.4013911 0.4298537
chr1:4416634-4417136 0.2548904 0.2743484
chr1:4417796-4418364 0.3279812 0.3521067
chr1:4468331-4468752 0.3265712 0.3506100
chr1:4490421-4490948 0.3154784 0.3388310
chr1:4491496-4493750 0.2294472 0.2471934
chr1:4495969-4497668 0.2296984 0.2474616
chr1:4516544-4516968 .         .        
chr1:4571249-4572327 0.2297612 0.2475288
chr1:4602305-4602744 .         .        

chr1:3670918-3671808 .         .        
chr1:4416634-4417136 0.2800781 .        
chr1:4417796-4418364 .         .        
chr1:4468331-4468752 .         .        
chr1:4490421-4490948 .         .        
chr1:4491496-4493750 0.2524247 0.2533660
chr1:4495969-4497668 0.2526980 0.2536401
chr1:4516544-4516968 0.3808583 0.3821944
chr1:4571249-4572327 0.2527664 0.2537088
chr1:4602305-4602744 .         0.5551656

chr1:3670918-3671808 0.3992007 .        
chr1:4416634-4417136 0.2534002 .        
chr1:4417796-4418364 0.3261289 .        
chr1:4468331-4468752 0.3247257 .        
chr1:4490421-4490948 0.3136862 .        
chr1:4491496-4493750 0.2280894 0.2830694
chr1:4495969-4497668 0.2283393 0.2833714
chr1:4516544-4516968 0.3461292 0.4240915
chr1:4571249-4572327 0.2284018 0.2834470
chr1:4602305-4602744 .         .        

chr1:3670918-3671808 0.4062913 0.3877534
chr1:4416634-4417136 0.2582278 0.2456294
chr1:4417796-4418364 0.3321272 0.3164589
chr1:4468331-4468752 0.3307021 0.3150910
chr1:4490421-4490948 0.3194903 0.3043316
chr1:4491496-4493750 0.2324889 0.2210118
chr1:4495969-4497668 0.2327430 0.2212547
chr1:4516544-4516968 0.3524345 0.3359601
chr1:4571249-4572327 0.2328066 0.2213155
chr1:4602305-4602744 .         0.4926528

chr1:3670918-3671808 .         .        
chr1:4416634-4417136 0.2869445 0.2743936
chr1:4417796-4418364 0.3676649 .        
chr1:4468331-4468752 0.3661134 0.3506657
chr1:4490421-4490948 0.3539002 0.3388851
chr1:4491496-4493750 0.2586974 0.2472346
chr1:4495969-4497668 0.2589766 0.2475029
chr1:4516544-4516968 .         0.3734817
chr1:4571249-4572327 0.2590465 0.2475701
chr1:4602305-4602744 0.5652893 .        
> class(obj)
[1] "dgCMatrix"
attr(,"package")
[1] "Matrix"
> sum(is.na(obj))
[1] 438446030
timoast commented 4 years ago

Those NA values must be the problem. I think they can occur in TF-IDF if you have any rows or columns that are all 0.

Can you check the output of:

min(Matrix::rowSums(mtx))
min(Matrix::colSums(mtx))

If there are columns or rows with many zeros, you can remove them from the matrix using the SubsetMatrix function. For example:

mtx.filtered <- SubsetMatrix(mtx, min.rows = 5, min.cols = 5)
ahmedabbas81 commented 4 years ago

Thank you so much. You helped me in a great way. It is fine now.

strawberry789 commented 3 years ago

I am having a similar issue with integration of scATAC-seq data with scRNA-seq data. I imported my atac object and rna object. Then perform FindVariableFeatures, and proceed with FindTransferAnchors, but I get the following message:

Warning message in RunCCA.Seurat(object1 = reference, object2 = query, features = features, : "Running CCA on different assays" Warning message in RunCCA.Seurat(object1 = reference, object2 = query, features = features, : "More than 10% of provided features filtered out. Please check that the given features are present in the scale.data slot for both the assays provided here and that they have non-zero variance." Warning message in RunCCA.Seurat(object1 = reference, object2 = query, features = features, : "Fewer than 50 features used as input for CCA." Running CCA

And then the kernel died.

What is the issue here?

Thank you.

I am trying to integrate the gene activity matrix of scATAC-seq with scRNA-seq. The gene activity matrix is not obtained from original scATAC-seq data. Rather, it is obtained from imputed scATAC-seq data by a recent method. In integration it tells me:

"More than 10% of provided features filtered out. Please check that the given features are present in the scale.data slot for both the assays provided here and that they have non-zero variance."Warning message in RunCCA.Seurat(object1 = reference, object2 = query, features = features, : "Fewer than 50 features used as input for CCA."Running CCA

And then the kernel died

I remember in a question before you told me that the features used for integration are the most variable 2000 features from scRNA-seq. So, why do the above problems happen?

Is the scATAC-seq data over imputed and so it was not possible to find the same variable features in the activity matrix?

In addition, when running the TFIDF normalization, it gave me the following messages:

Error in checkSlotAssignment(object, name, value): assignment of an object of class "dgeMatrix" is not valid for slot 'data' in an object of class "Assay"; is(value, "AnyMatrix") is not TRUE Traceback:

  1. RunTFIDF(young_male)
  2. RunTFIDF.Seurat(young_male)
  3. RunTFIDF(object = assay.data, assay = assay, method = method, . scale.factor = scale.factor, verbose = verbose, ...)
  4. RunTFIDF.Assay(object = assay.data, assay = assay, method = method, . scale.factor = scale.factor, verbose = verbose, ...)
  5. SetAssayData(object = object, slot = "data", new.data = new.data)
  6. SetAssayData.Assay(object = object, slot = "data", new.data = new.data)
  7. slot<-(object = *tmp*, name = slot, value = new.data)
  8. checkSlotAssignment(object, name, value)
  9. stop(gettextf("assignment of an object of class %s is not valid for slot %s in an object of class %s; is(value, "%s") is not TRUE", . dQuote(valueClass), sQuote(name), dQuote(cl), slotClass), . domain = NA)

However, I skipped this part and proceeded to integration of the two datasets.

Sorry for the very long question and I hope I wrote it clearly.

Thank you

ahmedabbas81 commented 3 years ago

@strawberry789 In my case it was a problem I did in reading the data. It was imputed data not original data and I had a problem in reading the data itself as I remember. But in general, the functions should work fine with original data. Thank you

strawberry789 commented 3 years ago

I restarted the kernel and the same issue persists.