Closed. archieandrews10 closed this issue 10 months ago.
This doesn't look like a valid fragment record:
chrX\t138306934\t138306985\tchrX\t138306903\t138306954\tMM_424.AAAGAGCGGCTGCTGAGTTCCT\t60\t-\t+
The fragment file should contain 5 fields (or 6 fields for single-end reads). See here for more details: https://support.10xgenomics.com/single-cell-multiome-atac-gex/software/pipelines/latest/output/fragments?src=social&lss=youtube&cnm=&cid=NULL
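The field-count rule above can be checked before importing a file. The following is a minimal sketch, not part of SnapATAC2: the helper name `is_fragment_record` is hypothetical, and it only tests the number of tab-separated fields and that the start and end columns are integers.

```python
def is_fragment_record(line: str) -> bool:
    """Return True if a tab-separated line has 5 or 6 fields and integer coordinates.

    Hypothetical helper for illustration; it does not reproduce the
    validation that SnapATAC2 performs internally.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) not in (5, 6):
        return False
    try:
        # Columns 2 and 3 hold the fragment start and end.
        start, end = int(fields[1]), int(fields[2])
    except ValueError:
        return False
    return start < end


# The record quoted in the error message has 10 fields (BEDPE layout).
bedpe_line = (
    "chrX\t138306934\t138306985\tchrX\t138306903\t138306954\t"
    "MM_424.AAAGAGCGGCTGCTGAGTTCCT\t60\t-\t+"
)
print(is_fragment_record(bedpe_line))  # False
```

Running a check like this over the first few lines of a file shows quickly whether it is a fragment file or some other BED-like format.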
This is strange, because the fragment files come from the BICCN project. I obtained them from http://catlas.org/catlas_downloads/humanbrain/bedpe/. According to the article (https://www.science.org/doi/10.1126/science.adf7044#sec-10), the authors did not use the 10x Genomics platform. They used custom libraries and processed the data with the SnapATAC package.
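Since the download directory is named bedpe, one possible workaround is to collapse each BEDPE record into a 5-field fragment record. This is only a sketch under stated assumptions: both mates lie on the same chromosome, the name column (column 7) carries the cell barcode, each record counts as one fragment, and no Tn5 shift is applied. The function name `bedpe_to_fragment` is hypothetical and is not a SnapATAC2 API.

```python
def bedpe_to_fragment(line: str) -> str:
    """Collapse one BEDPE record into 'chrom, start, end, barcode, count'.

    Assumptions (not confirmed by the data provider): column 7 is the
    barcode, and the fragment spans from the smallest start to the
    largest end of the two mates.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 7:
        raise ValueError("expected at least 7 BEDPE columns")
    chrom1, start1, end1, chrom2, start2, end2, name = fields[:7]
    if chrom1 != chrom2:
        raise ValueError("mates map to different chromosomes")
    start = min(int(start1), int(start2))
    end = max(int(end1), int(end2))
    # A count of 1 is an assumption: no duplicate collapsing is done here.
    return "\t".join([chrom1, str(start), str(end), name, "1"])


record = (
    "chrX\t138306934\t138306985\tchrX\t138306903\t138306954\t"
    "MM_424.AAAGAGCGGCTGCTGAGTTCCT\t60\t-\t+"
)
print(bedpe_to_fragment(record))
```

The converted file would still need to be sorted and, if required, deduplicated before import.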
Hi Zhang:
I tried it the other way round and ran make_fragment_file on the BAM files from the same project:
snapatac2.pp.make_fragment_file(
    '/LL_173.bam',
    '/LL_173_fragment.bed',
    is_paired=True,
    barcode_tag=None,
    barcode_regex="^[^\:]+",
    umi_tag=None,
    umi_regex=None,
    shift_left=4,
    shift_right=-5,
    min_mapq=30,
    compression=None,
    tempdir='/bams/'
)
The call prints the statistics below, but the output file is empty.
_FlagStat { read: 24123085, primary: 24122902, secondary: 0, supplementary: 183, duplicate: 0, primary_duplicate: 0, mapped: 24123085, primary_mapped: 24122902, paired: 24122902, read_1: 12061451, read_2: 12061451, proper_pair: 24122902, mate_mapped: 24122902, singleton: 0, mate_reference_sequence_id_mismatch: 0, mate_reference_sequence_id_mismatchhq: 0 }
Can you tell me how I can overcome this issue?
Thanks, Arch
This should work:
snapatac2.pp.make_fragment_file(
    '/LL_173.bam',
    '/LL_173_fragment.bed',
    is_paired=True,
    barcode_tag=None,
    # Raw string; the capture group takes the read-name text before the first ':'.
    barcode_regex=r"(^\w+):",
    umi_tag=None,
    umi_regex=None,
    shift_left=4,
    shift_right=-5,
    min_mapq=30,
    compression=None,
    tempdir='/bams/'
)
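The difference between the two patterns is that only the second one contains a capture group. The sketch below illustrates this with Python's `re` module. It assumes that the barcode is taken from the single capture group of `barcode_regex`, and it uses a made-up read name, because the real read names in the BAM file are not shown in this thread. SnapATAC2 evaluates the pattern in its own regex engine, so this is an approximation of the behavior.

```python
import re


def extract_barcode(read_name: str, pattern: str):
    """Return the text of the single capture group, or None.

    Hypothetical helper: it mimics the assumption that a barcode is only
    reported when the pattern has exactly one capture group and matches.
    """
    regex = re.compile(pattern)
    if regex.groups != 1:
        return None
    match = regex.search(read_name)
    return match.group(1) if match else None


# Made-up read name of the form '<barcode>:<rest of the read name>'.
read_name = "AAAGAGCGGCTGCTGAGTTCCT:7001113:915:HJ535BCX2:1:1106:1139:1926"

print(extract_barcode(read_name, r"^[^\:]+"))  # None
print(extract_barcode(read_name, r"(^\w+):"))  # AAAGAGCGGCTGCTGAGTTCCT
```

Both patterns match the same prefix of the read name, but the first one never marks which part of the match is the barcode.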
Thanks, that worked! I have performed all of the steps below on the scATAC-seq data:
- Preprocessing
- Doublet removal
- Dimension reduction
- Clustering analysis
- Cell cluster annotation
- Create the cell by gene activity matrix
- Imputation
However, I would like to perform cell type annotation on the matrix, and I could not find instructions for scATAC-seq. Could you point me to the right page for cell type annotation on the scATAC-seq matrix?
There are many ways to perform cell type annotation. Since you already have the cell by gene matrix, the simplest approach is to use SCANPY or a similar tool to run a marker gene analysis and annotate the cell types. Alternatively, use a reference scRNA-seq dataset to perform label transfer: https://kzhang.org/SnapATAC2/tutorials/annotation.html
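The idea behind marker-based annotation can be sketched without any library. The example below is not SCANPY's marker gene analysis. It is a simplified stand-in that scores each cluster by the mean gene activity of a marker list and assigns the best-scoring label. The function name `annotate_clusters`, the activity values, and the marker lists are all illustrative placeholders and should be replaced by real per-cluster averages and a curated marker set.

```python
from statistics import mean


def annotate_clusters(cluster_activity, markers):
    """Assign each cluster the cell type whose markers have the highest mean activity.

    cluster_activity: {cluster: {gene: mean gene activity in that cluster}}
    markers:          {cell type: [marker genes]}
    Genes missing from a cluster are treated as having zero activity.
    """
    labels = {}
    for cluster, activity in cluster_activity.items():
        scores = {
            cell_type: mean(activity.get(gene, 0.0) for gene in genes)
            for cell_type, genes in markers.items()
        }
        labels[cluster] = max(scores, key=scores.get)
    return labels


# Made-up values for illustration only; they are not results from this dataset.
toy_activity = {
    "0": {"SLC17A7": 2.1, "GAD1": 0.1, "MBP": 0.2},
    "1": {"SLC17A7": 0.2, "GAD1": 1.8, "MBP": 0.1},
}
toy_markers = {
    "excitatory neuron": ["SLC17A7"],
    "inhibitory neuron": ["GAD1"],
    "oligodendrocyte": ["MBP"],
}
print(annotate_clusters(toy_activity, toy_markers))
```

A scheme this simple ignores differences in baseline activity between genes, which is one reason a dedicated marker analysis or label transfer is usually preferable.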
I'm closing the issue since the original problem has been solved.
I am running the code below and receive the following error. Any idea how to resolve this?
PanicException                            Traceback (most recent call last)
File:1, in

File ~/.local/lib/python3.9/site-packages/snapatac2/preprocessing/_basic.py:277, in import_data(fragment_file, chrom_sizes, file, min_num_fragments, sorted_by_barcode, whitelist, chrM, shift_left, shift_right, chunk_size, tempdir, backend, n_jobs)
    275 else:
    276     adata = AnnData() if file is None else internal.AnnData(filename=file, backend=backend)
--> 277 internal.import_fragments(
    278     adata, fragment_file, chrom_sizes, chrM, min_num_fragments,
    279     sorted_by_barcode, shift_left, shift_right, chunk_size, whitelist, tempdir,
    280 )
    281 return adata

PanicException: called `Result::unwrap()` on an `Err` value: Custom { kind: Other, error: "InvalidStrand(Invalid): chrX\t138306934\t138306985\tchrX\t138306903\t138306954\tMM_424.AAAGAGCGGCTGCTGAGTTCCT\t60\t-\t+" }

panicked at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/ops/function.rs:166:5:
called `Result::unwrap()` on an `Err` value: Custom { kind: Other, error: "InvalidStrand(Invalid): chrX\t138306934\t138306985\tchrX\t138306903\t138306954\tMM_424.AAAGAGCGGCTGCTGAGTTCCT\t60\t-\t+" }