error: "InvalidStrand(Invalid)

archieandrews10 commented 10 months ago

I am running the below code;

%%time
data = snap.pp.import_data(
    fragment_file,
    chrom_sizes=snap.genome.hg38,
    sorted_by_barcode=False,
)
data

And I am receiving the below following error. Any idea how to resolve this?

PanicException Traceback (most recent call last) File :1, in

File ~/.local/lib/python3.9/site-packages/snapatac2/preprocessing/_basic.py:277, in import_data(fragment_file, chrom_sizes, file, min_num_fragments, sorted_by_barcode, whitelist, chrM, shift_left, shift_right, chunk_size, tempdir, backend, n_jobs) 275 else: 276 adata = AnnData() if file is None else internal.AnnData(filename=file, backend=backend) --> 277 internal.import_fragments( 278 adata, fragment_file, chrom_sizes, chrM, min_num_fragments, 279 sorted_by_barcode, shift_left, shift_right, chunk_size, whitelist, tempdir, 280 ) 281 return adata

PanicException: called Result::unwrap() on an Err value: Custom { kind: Other, error: "InvalidStrand(Invalid): chrX\t138306934\t138306985\tchrX\t138306903\t138306954\tMM_424.AAAGAGCGGCTGCTGAGTTCCT\t60\t-\t+" } thread '' panicked at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/ops/function.rs:166:5: called Result::unwrap() on an Err value: Custom { kind: Other, error: "InvalidStrand(Invalid): chrX\t138306934\t138306985\tchrX\t138306903\t138306954\tMM_424.AAAGAGCGGCTGCTGAGTTCCT\t60\t-\t+" }

kaizhang commented 10 months ago

This doesn't look like a valid fragment record:

chrX\t138306934\t138306985\tchrX\t138306903\t138306954\tMM_424.AAAGAGCGGCTGCTGAGTTCCT\t60\t-\t+

The fragment file should contain 5 fields (or 6 fields for single-end reads). See here for more details: https://support.10xgenomics.com/single-cell-multiome-atac-gex/software/pipelines/latest/output/fragments?src=social&lss=youtube&cnm=&cid=NULL

archieandrews10 commented 10 months ago

It's strange because the fragment files are from the BICCN project. I obtained the fragment files from this location http://catlas.org/catlas_downloads/humanbrain/bedpe/ As per the article https://www.science.org/doi/10.1126/science.adf7044#sec-10, they did not use 10x Genomics platform. They used custom libraries and used SnapATAC package to process the data.

archieandrews10 commented 10 months ago

Hi Zhang:

I tried the other way round. I ran make_fragment_file using the BAM files from the same project.

snapatac2.pp.make_fragment_file(
    '/LL_173.bam', 
    '/LL_173_fragment.bed',
    is_paired=True,
    barcode_tag=None,
    barcode_regex="^[^\:]+", 
    umi_tag=None,
    umi_regex=None,
    shift_left=4,
    shift_right=-5, 
    min_mapq=30,
    compression=None,
    tempdir='/bams/'
)

I receive the below print, but the output file has no contents.

_FlagStat { read: 24123085, primary: 24122902, secondary: 0, supplementary: 183, duplicate: 0, primary_duplicate: 0, mapped: 24123085, primary_mapped: 24122902, paired: 24122902, read_1: 12061451, read_2: 12061451, proper_pair: 24122902, mate_mapped: 24122902, singleton: 0, mate_reference_sequence_id_mismatch: 0, mate_reference_sequence_id_mismatchhq: 0 }

Can you tell me how I can overcome this issue?

thanks Arch

kaizhang commented 10 months ago

This should work:

snapatac2.pp.make_fragment_file(
    '/LL_173.bam', 
    '/LL_173_fragment.bed',
    is_paired=True,
    barcode_tag=None,
    barcode_regex="(^\w+):", 
    umi_tag=None,
    umi_regex=None,
    shift_left=4,
    shift_right=-5, 
    min_mapq=30,
    compression=None,
    tempdir='/bams/'
)

archieandrews10 commented 10 months ago

Thanks. That worked! I have performed all the below mentioned steps on the sc-ATACSeq data.

Preprocessing Doublet removal Dimenstion reduction Clustering analysis Cell cluster annotation Create the cell by gene activity matrix Imputation

But I want to perform cell type annotation on the matrix. I could not find for the scATACseq. Could you point me to the right page to perform cell type annotation on the scATACseq matrix?

kaizhang commented 10 months ago

There many ways to perform cell type annotation. Since you have got the cell by gene matrix, the simplest way you can do is to use SCANPY or other tools to perform marker gene analysis and annotate cell types, OR use other reference scRNA-seq dataset to perform label transfer: https://kzhang.org/SnapATAC2/tutorials/annotation.html

kaizhang commented 10 months ago

I'm closing the issue since the original problem has been solved.

kaizhang / SnapATAC2

error: "InvalidStrand(Invalid) #215