Closed ZivTQ closed 10 months ago
Hi, it looks like your fragment file may be incorrectly formatted. Can you show the first few lines of your fragment file?
Hi, thanks for your reply! Here is the fragment file:
head(Fragments(sce_atac)[[5]]) chrom start end barcode readCount 1 1 10072 10326 TACTGAGGTCAAGTAT-1 1 2 1 10073 10215 GGTTATATCATTTGTC-1 1 3 1 10073 10321 TAGTCAATCATAGACC-1 1 4 1 10078 10309 AGCTAAACATTATGCG-1 2 5 1 10078 10309 GGATATTGTTGTTGTC-1 3 6 1 10078 10327 TTTAACGAGGCCGGAA-1 3
Annotation(A) <- annotations A<- NucleosomeSignal(object = A) A<- TSSEnrichment(object = A, fast = FALSE, verbose = T, assay = 'peaks')
Annotation(B) <- annotations B<- NucleosomeSignal(object = B) B<- TSSEnrichment(object = B, fast = FALSE, verbose = T, assay = 'peaks') ............
peaks <- CallPeaks(object = sce_atac, group.by = "AnnotatedcellType",outdir = "./")
Processing file /home/ATAC_data/A/atac_fragments.tsv.gz Keeping 10081 cell barcodes Splitting into 5 files Done Processing 153 million lines
Processing file /home/ATAC_data/B/atac_fragments.tsv.gz Keeping 0 cell barcodes Splitting into 5 files Done Processing 202 million lines
Processing file /home/ATAC_data/C/atac_fragments.tsv.gz Keeping 0 cell barcodes Splitting into 5 files Done Processing 145 million lines
Processing file /home/ATAC_data/D/atac_fragments.tsv.gz Keeping 0 cell barcodes Splitting into 5 files Done Processing 94 million lines
Processing file /home/ATAC_data/E/atac_fragments.tsv.gz Keeping 0 cell barcodes Splitting into 5 files Done Processing 140 million linesError: fragment file lines too long Error in SplitFragments(object = object, assay = assay, group.by = group.by, : Error: cannot open requested file
sessionInfo() R version 4.2.2 Patched (2022-11-10 r83330) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 20.04.5 LTS
Matrix products: default BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0 LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 grid stats graphics grDevices utils datasets methods
[9] base
other attached packages:
[1] EnsDb.Hsapiens.v86_2.99.0 ensembldb_2.22.0 AnnotationFilter_1.22.0
[4] GenomicFeatures_1.50.4 AnnotationDbi_1.60.2 Biobase_2.58.0
[7] GenomicRanges_1.50.2 GenomeInfoDb_1.34.9 IRanges_2.32.0
[10] S4Vectors_0.36.2 BiocGenerics_0.44.0 dplyr_1.1.4
[13] ComplexHeatmap_2.14.0 ggpubr_0.6.0 qs_0.25.7
[16] clustree_0.5.1 ggraph_2.1.0 ggplot2_3.4.4
[19] Seurat_5.0.1 harmony_1.2.0 Rcpp_1.0.11
[22] Signac_1.12.9003 SeuratObject_5.0.1 sp_2.1-2
I'm not understanding...is the original issue with NucleosomeSignal now fixed? If there is a new error please include the full code you're running that results in an error.
peaks <- CallPeaks(object = sce_atac, group.by = "AnnotatedcellType",outdir = "./") ####Error Processing file /home/ATAC_data/A/atac_fragments.tsv.gz Keeping 10081 cell barcodes Splitting into 5 files Done Processing 153 million lines
Processing file /home/ATAC_data/B/atac_fragments.tsv.gz Keeping 0 cell barcodes Splitting into 5 files Done Processing 202 million lines
Processing file /home/ATAC_data/C/atac_fragments.tsv.gz Keeping 0 cell barcodes Splitting into 5 files Done Processing 145 million lines
Processing file /home/ATAC_data/D/atac_fragments.tsv.gz Keeping 0 cell barcodes Splitting into 5 files Done Processing 94 million lines
Processing file /home/ATAC_data/E/atac_fragments.tsv.gz Keeping 0 cell barcodes Splitting into 5 files Done Processing 140 million linesError: fragment file lines too long Error in SplitFragments(object = object, assay = assay, group.by = group.by, : Error: cannot open requested file
Hi, Thanks you. The error occured when running CallPeaks
peaks <- CallPeaks(object = sce_atac, group.by = "AnnotatedcellType",outdir = "./")
Processing file /home/ATAC_data/A/atac_fragments.tsv.gz Keeping 10081 cell barcodes Splitting into 5 files Done Processing 153 million lines
Processing file /home/ATAC_data/B/atac_fragments.tsv.gz Keeping 0 cell barcodes Splitting into 5 files Done Processing 202 million lines
Processing file /home/ATAC_data/C/atac_fragments.tsv.gz Keeping 0 cell barcodes Splitting into 5 files Done Processing 145 million lines
Processing file /home/ATAC_data/D/atac_fragments.tsv.gz Keeping 0 cell barcodes Splitting into 5 files Done Processing 94 million lines
Processing file /home/ATAC_data/E/atac_fragments.tsv.gz Keeping 0 cell barcodes Splitting into 5 files Done Processing 140 million linesError: fragment file lines too long Error in SplitFragments(object = object, assay = assay, group.by = group.by, : Error: cannot open requested file
please include the full code you're running that results in an error
please include the full code you're running that results in an error
Sorry, All of the codes are here:
remotes::install_github("stuart-lab/signac", ref = "develop")
library(Signac)
library(Seurat)
library(GenomeInfoDb)
library(EnsDb.Hsapiens.v86)
library(BSgenome.Hsapiens.UCSC.hg38)
library(harmony)
library(tidyverse)
library(ggplot2)
library(ggpubr)
library(ggalluvial)
#1、load data ====================================
# load data A1
A1_counts <- Read10X_h5(filename = "/home/ks_ts/scATAC/A1/filtered_feature_bc_matrix.h5")
A1_counts <- A1_counts[[2]]
A1_assay <- CreateChromatinAssay(counts = A1_counts,
sep = c(":", "-"),
genome = 'hg38',
fragments = '/home/ks_ts/ATAC/A1/atac_fragments.tsv.gz',
min.cells = 10,
min.features = 200)
A1 <- CreateSeuratObject(A1_assay, assay = 'peaks',project = 'A1')
rm(A1_assay, A1_counts)
# load data A2
A2_counts <- Read10X_h5(filename = "/home/ks_ts/scATAC/A2/filtered_feature_bc_matrix.h5")
A2_counts <- A2_counts[[2]]
A2_assay <- CreateChromatinAssay(counts = A2_counts,
sep = c(":", "-"),
genome = 'hg38',
fragments = '/home/ks_ts/ATAC/A2/atac_fragments.tsv.gz',
min.cells = 10,
min.features = 200)
A2 <- CreateSeuratObject(A2_assay, assay = 'peaks',project = 'A2')
rm(A2_assay, A2_counts)
# load data A3
A3_counts <- Read10X_h5(filename = "/home/ks_ts/scATAC/A3/filtered_feature_bc_matrix.h5")
A3_counts <- A3_counts[[2]]
A3_assay <- CreateChromatinAssay(counts = A3_counts,
sep = c(":", "-"),
genome = 'hg38',
fragments = '/home/ks_ts/ATAC/A3/atac_fragments.tsv.gz',
min.cells = 10,
min.features = 200)
A3 <- CreateSeuratObject(A3_assay, assay = 'peaks',project = 'A3')
rm(A3_assay, A3_counts)
# load data A4
A4_counts <- Read10X_h5(filename = "/home/ks_ts/scATAC/A4/filtered_feature_bc_matrix.h5")
A4_counts <- A4_counts[[2]]
A4_assay <- CreateChromatinAssay(counts = A4_counts,
sep = c(":", "-"),
genome = 'hg38',
fragments = '/home/ks_ts/ATAC/A4/atac_fragments.tsv.gz',
min.cells = 10,
min.features = 200)
A4 <- CreateSeuratObject(A4_assay, assay = 'peaks',project = 'A4')
rm(A4_assay, A4_counts)
# load data B1
B1_counts <- Read10X_h5(filename = "/home/ks_ts/scATAC/TA1/filtered_feature_bc_matrix.h5")
B1_counts <- B1_counts[[2]]
B1_assay <- CreateChromatinAssay(counts = B1_counts,
sep = c(":", "-"),
genome = 'hg38',
fragments = '/home/ks_ts/ATAC/B1/atac_fragments.tsv.gz',
min.cells = 10,
min.features = 200)
B1 <- CreateSeuratObject(B1_assay, assay = 'peaks',project = 'B1')
rm(B1_assay, B1_counts)
# load data B2
B2_counts <- Read10X_h5(filename = "/home/ks_ts/scATAC/TA2/filtered_feature_bc_matrix.h5")
B2_counts <- B2_counts[[2]]
B2_assay <- CreateChromatinAssay(counts = B2_counts,
sep = c(":", "-"),
genome = 'hg38',
fragments = '/home/ks_ts/ATAC/B2/atac_fragments.tsv.gz',
min.cells = 10,
min.features = 200)
B2 <- CreateSeuratObject(B2_assay, assay = 'peaks',project = 'B2')
rm(B2_assay, B2_counts)
# load data B3
B3_counts <- Read10X_h5(filename = "/home/ks_ts/scATAC/TA3/filtered_feature_bc_matrix.h5")
B3_counts <- B3_counts[[2]]
B3_assay <- CreateChromatinAssay(counts = B3_counts,
sep = c(":", "-"),
genome = 'hg38',
fragments = '/home/ks_ts/ATAC/B3/atac_fragments.tsv.gz',
min.cells = 10,
min.features = 200)
B3 <- CreateSeuratObject(B3_assay, assay = 'peaks',project = 'B3')
rm(B3_assay, B3_counts)
# load data B4
B4_counts <- Read10X_h5(filename = "/home/ks_ts/scATAC/TA4/filtered_feature_bc_matrix.h5")
B4_counts <- B4_counts[[2]]
B4_assay <- CreateChromatinAssay(counts = B4_counts,
sep = c(":", "-"),
genome = 'hg38',
fragments = '/home/ks_ts/ATAC/B4/atac_fragments.tsv.gz',
min.cells = 10,
min.features = 200)
B4 <- CreateSeuratObject(B4_assay, assay = 'peaks',project = 'B4')
rm(B4_assay, B4_counts)
#2、preprocess of data ====================================
annotations <- GetGRangesFromEnsDb(ensdb = EnsDb.Hsapiens.v86)
# seqlevelsStyle(annotations) <- 'UCSC'
genome(annotations) <- "hg38"
Annotation(A1) <- annotations
A1 <- NucleosomeSignal(object = A1)
A1 <- TSSEnrichment(object = A1, fast = FALSE, verbose = T, assay = 'peaks')
Annotation(A2) <- annotations
A2 <- NucleosomeSignal(object = A2)
A2 <- TSSEnrichment(object = A2, fast = FALSE, verbose = T, assay = 'peaks')
Annotation(A3) <- annotations
A3 <- NucleosomeSignal(object = A3)
A3 <- TSSEnrichment(object = A3, fast = FALSE, verbose = T, assay = 'peaks')
Annotation(A4) <- annotations
A4 <- NucleosomeSignal(object = A4)
A4 <- TSSEnrichment(object = A4, fast = FALSE, verbose = T, assay = 'peaks')
Annotation(B1) <- annotations
B1 <- NucleosomeSignal(object = B1)
B1 <- TSSEnrichment(object = B1, fast = FALSE, verbose = T, assay = 'peaks')
Annotation(B2) <- annotations
B2 <- NucleosomeSignal(object = B2)
B2 <- TSSEnrichment(object = B2, fast = FALSE, verbose = T, assay = 'peaks')
Annotation(B3) <- annotations
B3 <- NucleosomeSignal(object = B3)
B3 <- TSSEnrichment(object = B3, fast = FALSE, verbose = T, assay = 'peaks')
Annotation(B4) <- annotations
B4 <- NucleosomeSignal(object = B4)
B4 <- TSSEnrichment(object = B4, fast = FALSE, verbose = T, assay = 'peaks')
# merge all datasets
combined <- merge(x = A1,
y = list(A2, A3, A4, B1, B2, B3, B4),
add.cell.ids = c("A1", "A2", "A3", "A4", "B1", "B2", "B3", "B4"))
save(combined, file = "combined.RData")
rm(A1,A2, A3, A4, B1, B2, B3, B4)
combined$blacklist_fraction <- FractionCountsInRegion(object = combined,
assay = 'peaks',
regions = blacklist_hg38)
# combined$pct_reads_in_peaks <- combined$peak_region_fragments / combined$passed_filters * 100
# combined$blacklist_ratio <- combined$blacklist_region_fragments / combined$peak_region_fragments
combined$high.tss <- ifelse(combined$TSS.enrichment > 2, 'High', 'Low')
combined$nucleosome_group <- ifelse(combined$nucleosome_signal > 4, 'NS > 4', 'NS < 4')
TSSPlot(combined, group.by = 'high.tss', assay = 'peaks') + NoLegend()
FragmentHistogram(object = combined, group.by = 'nucleosome_group')
#3、cluster ===================================
combined <- RunTFIDF(combined)
combined <- FindTopFeatures(combined, min.cutoff = 'q0')
combined <- RunSVD(combined)
library(harmony)
combined <- RunHarmony(object = combined,
group.by.vars = 'orig.ident',
reduction.use = 'lsi',
assay.use = 'peaks',
project.dim = FALSE)
combined <- RunUMAP(combined, dims = 2:30, reduction = "harmony")
combined<- FindNeighbors(object = combined,reduction = "harmony",dims = 2:30)
combined<- FindClusters(object = combined, algorithm = 3,resolution = seq(0.2, 1.2, by = 0.1),verbose = FALSE)
library(clustree)
clustree(combined)
DimPlot(object = combined, reduction = 'umap',label = TRUE) + NoLegend()
DimPlot(object = combined, reduction = 'umap',label = TRUE, split.by = 'orig.ident', ncol = 4) + NoLegend()
# save(combined, file = "combined_cluster.RData")
#add gene activatied matrix
gene.activities <- GeneActivity(combined)
# add the gene activity matrix to the Seurat object as a new assay and normalize it
combined[['ACTIVITY']] <- CreateAssayObject(counts = gene.activities)
combined <- NormalizeData(
object = combined,
assay = 'ACTIVITY',
normalization.method = 'LogNormalize',
scale.factor = median(combined$nCount_ACTIVITY)
)
# save(combined, file = "combined_cluster.RData")
#4、integrate.scATAC.scRNA ====================================
#load scRNA data
library(qs)
sce_RNA <- qs::qread("./sce_anno.qs")
#ATAC data
combined$seurat_clusters <- combined$peaks_snn_res.1.2
Idents(combined) <- combined$seurat_clusters
DefaultAssay(combined) <- "ACTIVITY"
sce_RNA <- FindVariableFeatures(object = sce_RNA,nfeatures = 5000)
transfer.anchors <- FindTransferAnchors(reference = sce_RNA ,
query = combined,reduction = 'cca',dims = 1:30)
predicted.labels <- TransferData(anchorset = transfer.anchors,
refdata = sce_RNA$celltype,
weight.reduction = combined[['lsi']],
dims = 2:30)
combined <- AddMetaData(object = combined, metadata = predicted.labels)
# save(combined, file = "combined_cluster.RData")
colnames(combined@meta.data)[25] <- 'celltype'
#=5、recluster====================================
#### recluster
DefaultAssay(sce_ATAC) <- "peaks"
sce_ATAC <- RunTFIDF(sce_ATAC)
sce_ATAC <- FindTopFeatures(sce_ATAC)
sce_ATAC <- RunSVD(sce_ATAC)
library(harmony)
sce_ATAC <- RunHarmony(object = sce_ATAC,
group.by.vars = 'orig.ident',
reduction.use = 'lsi',
assay.use = 'peaks',
project.dim = FALSE)
sce_ATAC <- RunUMAP(sce_ATAC, dims = 2:15, reduction = "harmony")
sce_ATAC<- FindNeighbors(object = sce_ATAC,reduction = "harmony",dims = 2:15)
sce_ATAC<- FindClusters(object = sce_ATAC, algorithm = 3,resolution = seq(0.2, 1.2, by = 0.1),verbose = FALSE)
clustree(sce_ATAC)
sce_ATAC$seurat_clusters <- sce_ATAC$peaks_snn_res.0.3
DimPlot(sce_ATAC, label = T, repel = T, group.by = 'peaks_snn_res.0.4')
DimPlot(sce_ATAC, label = T, repel = T, group.by = 'celltype')
save(sce_ATAC, file = 'sce_ATAC.RData')
#6、peaks analysis====================================
DefaultAssay(sce_ATAC) <- "peaks"
Idents(sce_ATAC) <- 'celltype'
peaks <- CallPeaks(
object = sce_ATAC,
group.by = "celltype",
outdir = "./",
macs2.path = "/home/ks_ts/miniconda3/bin/macs2"
)`
In the last line, CallPeaks backed error:
`Processing file /home/ATAC_data/A1/atac_fragments.tsv.gz
Keeping 10081 cell barcodes
Splitting into 5 files
Done Processing 153 million lines
Processing file /home/ATAC_data/A2/atac_fragments.tsv.gz
Keeping 0 cell barcodes
Splitting into 5 files
Done Processing 202 million lines
Processing file /home/ATAC_data/A3/atac_fragments.tsv.gz
Keeping 0 cell barcodes
Splitting into 5 files
Done Processing 145 million lines
Processing file /home/ATAC_data/A4/atac_fragments.tsv.gz
Keeping 0 cell barcodes
Splitting into 5 files
Done Processing 94 million lines
Processing file /home/ATAC_data/A5/atac_fragments.tsv.gz
Keeping 0 cell barcodes
Splitting into 5 files
Done Processing 140 million linesError: fragment file lines too long
Error in SplitFragments(object = object, assay = assay, group.by = group.by, :
Error: cannot open requested file
`
My packages version:
`R version 4.2.2 Patched (2022-11-10 r83330)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.5 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
Random number generation:
RNG: L'Ecuyer-CMRG
Normal: Inversion
Sample: Rejection
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] grid stats4 stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] rhdf5_2.42.1 Matrix_1.6-4
[3] data.table_1.14.10 plyr_1.8.9
[5] magrittr_2.0.3 gtable_0.3.4
[7] gtools_3.9.5 gridExtra_2.3
[9] ArchR_1.0.2 ggalluvial_0.12.5
[11] ggpubr_0.6.0 lubridate_1.9.3
[13] forcats_1.0.0 stringr_1.5.1
[15] dplyr_1.1.4 purrr_1.0.2
[17] readr_2.1.4 tidyr_1.3.0
[19] tibble_3.2.1 tidyverse_2.0.0
[21] harmony_1.2.0 Rcpp_1.0.12
[23] BSgenome.Hsapiens.UCSC.hg38_1.4.5 BSgenome_1.66.3
[25] rtracklayer_1.58.0 Biostrings_2.66.0
[27] XVector_0.38.0 EnsDb.Hsapiens.v86_2.99.0
[29] ensembldb_2.22.0 AnnotationFilter_1.22.0
[31] GenomicFeatures_1.50.4 AnnotationDbi_1.60.2
[33] Signac_1.12.9003 SingleR_2.0.0
[35] SummarizedExperiment_1.28.0 Biobase_2.58.0
[37] GenomicRanges_1.50.2 GenomeInfoDb_1.34.9
[39] IRanges_2.32.0 S4Vectors_0.36.2
[41] BiocGenerics_0.44.0 MatrixGenerics_1.10.0
[43] ggplot2_3.4.4 Seurat_5.0.1
[45] matrixStats_1.1.0 SeuratObject_5.0.1
[47] sp_2.1-2
Thank you
Are you sure you are changing the correct column name when running colnames(combined@meta.data)[25] <- 'celltype'
?
A safer way to do this is combined$celltype <- combined$predicted.id
It's weird that only the first fragment file seems to contain any cells for the 5 cell types.
I think it's possible that ~140 million lines into the 5th fragment file there is an issue with one of the lines
Are you sure you are changing the correct column name when running
colnames(combined@meta.data)[25] <- 'celltype'
?A safer way to do this is
combined$celltype <- combined$predicted.id
It's weird that only the first fragment file seems to contain any cells for the 5 cell types.
I think it's possible that ~140 million lines into the 5th fragment file there is an issue with one of the lines
@timoast
Thank you very much! We re-ran cellranger atac, and obtain the corrected fargment.tsv.gz files. This problems resolved!
Best wishes!
Hi,I have some probloms in Signac when running:
NucleosomeSignal(object = obj)
I got Error:
Done Processing 145 million linesError: fragment file lines too long Error in
[.data.frame(counts, cells.keep > 0, c("mononucleosomal", "nucleosome_free")) : undefined columns selected
I try to change n in the function NucleosomeSignal, However, It is still not work. Does anyone know how to fix it? Thanks very much!