stuart-lab / signac

R toolkit for the analysis of single-cell chromatin data
https://stuartlab.org/signac/

Find issue with TSSEnrichment #408

Closed joeytai2010 closed 3 years ago

joeytai2010 commented 3 years ago

I ran into an error when using TSSEnrichment (shown below). The data are from the rat genome, and I don't know whether the species is causing the error. Any suggestions or comments? Thanks.

scATAC_CTL <- TSSEnrichment(object = scATAC_CTL, fast = FALSE)
Extracting TSS positions
Finding + strand cut sites
Finding - strand cut sites
Error in colnames<-(*tmp*, value = seq_len(length.out = region.width) - : attempt to set 'colnames' on an object with less than two dimensions

My code:

library(Signac)
library(Seurat)
library(GenomeInfoDb)
library(BSgenome.Rnorvegicus.UCSC.rn6)
library(EnsDb.Rnorvegicus.v79)
library(ggplot2)
library(patchwork)
library(hdf5r)
set.seed(1234)

counts_CTL <- Read10X_h5(filename = "~/R/Rat/ATAC/CTL_ATAC_F/filtered_peak_bc_matrix.h5")

metadata_CTL <- read.csv(
  file = "~/R/Rat/ATAC/CTL_ATAC_F/singlecell.csv",
  header = TRUE,
  row.names = 1)

chrom_assay_CTL <- CreateChromatinAssay(
  counts = counts_CTL,
  sep = c(":", "-"),
  genome = 'rn6',
  fragments = '~/R/Rat/ATAC/CTL_ATAC_F/fragments.tsv.gz',
  min.cells = 10,
  min.features = 200)

scATAC_CTL <- CreateSeuratObject(
  counts = chrom_assay_CTL,
  assay = "peaks",
  meta.data = metadata_CTL
)

scATAC_CTL[['peaks']]
granges(scATAC_CTL)
annotations <- GetGRangesFromEnsDb(ensdb = EnsDb.Rnorvegicus.v79)

seqlevelsStyle(annotations) <- 'UCSC'
genome(annotations) <- "rn6"

Annotation(scATAC_CTL) <- annotations

scATAC_CTL <- NucleosomeSignal(object = scATAC_CTL)

scATAC_CTL <- TSSEnrichment(object = scATAC_CTL, fast = FALSE)

timoast commented 3 years ago

Can you confirm that the annotations you're using match the genome build that the data were mapped to, and that the chromosome naming style is the same for both ("chr1" vs "1")?
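One way to make the naming-style check concrete is sketched below in base R. It is only an illustration: `compare_chrom_names` is a hypothetical helper, and the two input vectors are made-up values, not taken from the data in this issue. In a real session the vectors would presumably come from the chromosome names of the peak ranges and of the annotations (for example via `seqlevels()`).

```r
# Hypothetical helper (not part of Signac): describe how two sets of
# chromosome names relate to each other.
compare_chrom_names <- function(data_chroms, annot_chroms) {
  # Classify a vector of names by whether they carry the "chr" prefix
  style <- function(x) {
    prefixed <- grepl("^chr", x)
    if (all(prefixed)) "prefixed" else if (!any(prefixed)) "unprefixed" else "mixed"
  }
  list(
    data_style  = style(data_chroms),
    annot_style = style(annot_chroms),
    shared      = intersect(data_chroms, annot_chroms),
    only_data   = setdiff(data_chroms, annot_chroms),
    only_annot  = setdiff(annot_chroms, data_chroms)
  )
}

# Illustrative values only (assumed, not from the issue's data)
res <- compare_chrom_names(
  data_chroms  = c("chr1", "chr2", "chrX"),
  annot_chroms = c("1", "2", "X")
)

# An empty `shared` set means no chromosome name is common to both inputs
print(res$data_style)
print(res$annot_style)
print(length(res$shared))
```

If `shared` comes back empty, or the two styles differ, the annotations and the mapped data are likely not using the same naming convention.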

timoast commented 3 years ago

Closing this as I haven't heard back

Jaureguy760 commented 3 years ago

Hi Tim

I am actually running a similar experiment with the rn6 Rattus norvegicus reference genome.

I followed the Signac brain vignette, and the analysis ran correctly through the clustering step. I built a reference genome with Cell Ranger and used AnnotationHub for the annotations:

EnsDb.Rnorvegicus.v98 = query(AnnotationHub(), pattern = c("Rattus Norvegicus", "EnsDb", 98))[[1]]

gene.ranges <- genes(EnsDb.Rnorvegicus.v98)
gene.ranges <- gene.ranges[gene.ranges$gene_biotype == 'protein_coding',]
gene.ranges <- keepStandardChromosomes(gene.ranges, pruning.mode = 'coarse', species='Rattus_norvegicus')

Below are the config file parameters:

{
    GENOME_FASTA_INPUT: "ftp://ftp.ensembl.org/pub/release-98/fasta/rattus_norvegicus/dna/Rattus_norvegicus.Rnor_6.0.dna.toplevel.fa.gz",
    GENE_ANNOTATION_INPUT: "ftp://ftp.ensembl.org/pub/release-98/gtf/rattus_norvegicus/Rattus_norvegicus.Rnor_6.0.98.gtf.gz",
    MOTIF_INPUT: "http://jaspar.genereg.net/download/CORE/JASPAR2020_CORE_vertebrates_non-redundant_pfms_jaspar.txt",
    ORGANISM: "Rattus norvegicus",
    PRIMARY_CONTIGS: ["1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20", "X", "Y"],
    NON_NUCLEAR_CONTIGS: ["MT"]
}

I also wanted to run the DNA motif analysis and annotate the cell clusters with this vignette, as well as the other downstream vignettes that Signac currently offers, since it is super convenient and looks fantastic! :)

Error (during the singeR tutorial):

Error in getOneSeqFromBSgenomeMultipleSequences ... getOneSeqFromBSgenomeMultipleSequences(x, name, start, NA, width, : sequence 1 not found

This error makes me think there is an annotation mismatch between the custom build I made in Cell Ranger on one side and the BSgenome.Rnorvegicus.UCSC.rn6 package and Ensembl annotations on the other.

Currently, I am rebuilding the custom Cell Ranger reference so that it matches BSgenome.Rnorvegicus.UCSC.rn6, using the files from this website: https://hgdownload.soe.ucsc.edu/goldenPath/rn6/bigZips/

{
    GENOME_FASTA_INPUT: "/gpfs/data01/bennerlab/home/jjauregu/cellranger-atac-1.2.0/rn6.fa.gz",
    GENE_ANNOTATION_INPUT: "/gpfs/data01/bennerlab/home/jjauregu/cellranger-atac-1.2.0/rn6.ncbiRefSeq.gtf.gz",
    MOTIF_INPUT: "http://jaspar.genereg.net/download/CORE/JASPAR2020_CORE_vertebrates_non-redundant_pfms_jaspar.txt",
    ORGANISM: "Rattus norvegicus",
    PRIMARY_CONTIGS: ["chr1", "chr2", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8", "chr9", "chr10", "chr11", "chr12", "chr13", "chr14", "chr15", "chr16", "chr17", "chr18", "chr19", "chr20", "chrX", "chrY"],
    NON_NUCLEAR_CONTIGS: ["chrM"]
}
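The contig lists in the two configs in this comment differ only in naming style: the Ensembl-based one uses "1" ... "Y" and "MT", while the UCSC-based one uses "chr1" ... "chrY" and "chrM". A minimal base R sketch of that mapping follows. `ensembl_to_ucsc` is a hypothetical helper written for illustration, and it assumes only these simple cases (unplaced scaffolds and other contigs are not handled). For GRanges objects, the original post in this thread already uses `seqlevelsStyle(annotations) <- 'UCSC'` for the same purpose.

```r
# Hypothetical helper: map Ensembl-style contig names to UCSC-style names.
# Assumption: every contig gains a "chr" prefix, except the mitochondrial
# contig "MT", which becomes "chrM".
ensembl_to_ucsc <- function(contigs) {
  out <- paste0("chr", contigs)
  out[contigs == "MT"] <- "chrM"
  out
}

# Contig names as listed in the Ensembl-based config
ensembl <- c(as.character(1:20), "X", "Y", "MT")
ucsc <- ensembl_to_ucsc(ensembl)

# Show the mapping side by side
print(data.frame(ensembl = ensembl, ucsc = ucsc))
```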

What are your suggestions? Does it seem like this new reference genome, built to match BSgenome.Rnorvegicus.UCSC.rn6, will work?

Also, what do you suggest for the Ensembl annotations? Which version should I use to properly run the DNA motif vignette?

I can provide code later if necessary.