liulab-dfci / MAESTRO

Single-cell Transcriptome and Regulome Analysis Pipeline
GNU General Public License v3.0

Chromap, Less than 5% barcodes can be found or corrected based on the barcode whitelist #152

Open genecell opened 2 years ago

genecell commented 2 years ago

Hi,

I am using MAESTRO to analyze a scATAC-seq dataset downloaded from the SRA database (accession number: SRR10399252), but I ran into this error:

Output file: Result/Mapping/SRR10399252_epilepsy/fragments_pre_corrected_dedup_count.tsv
Loaded all sequences successfully in 12.35s, number of sequences: 195, number of bases: 3099922541.
Kmer size: 17, window size: 7.
Lookup table size: 393150044, occurrence table size: 444597151.
Loaded index successfully in 30.40s.
Loaded 737280 barcodes in 1.45s.
Loaded sequence batch successfully in 0.82s, number of sequences: 500000, number of bases: 8000000.
Less than 5% barcodes can be found or corrected based on the barcode whitelist.
Please check whether the barcode whitelist matches the data, e.g. length, reverse-complement. If this is a false positive warning, please run Chromap with the option --skip-barcode-check.

I also tried minimap2 for mapping, but got another error:

[Sat Jan  8 17:29:03 2022]
rule scatac_mergepeak:
    input: Result/Analysis/SRR12130207_Lega_42/SRR12130207_Lega_42_all_peaks.narrowPeak
    output: Result/Analysis/SRR12130207_Lega_42/SRR12130207_Lega_42_final_peaks.bed
    jobid: 7
    benchmark: Result/Benchmark/SRR12130207_Lega_42_PeakMerge.benchmark
    wildcards: sample=SRR12130207_Lega_42

[Sat Jan  8 17:29:04 2022]
Error in rule scatac_mergepeak:
    jobid: 7
    output: Result/Analysis/SRR12130207_Lega_42/SRR12130207_Lega_42_final_peaks.bed
    shell:

            cat Result/Analysis/SRR12130207_Lega_42/SRR12130207_Lega_42_all_peaks.narrowPeak             | sort -k1,1 -k2,2n | cut -f 1-4 > Result/Analysis/SRR12130207_Lega_42/SRR12130207_Lega_42_cat_peaks.bed

            mergeBed -i Result/Analysis/SRR12130207_Lega_42/SRR12130207_Lega_42_cat_peaks.bed | grep -v '_' | grep -v 'chrEBV' > Result/Analysis/SRR12130207_Lega_42/SRR12130207_Lega_42_final_peaks.bed

            rm Result/Analysis/SRR12130207_Lega_42/SRR12130207_Lega_42_cat_peaks.bed

        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

I can successfully run the test data provided by MAESTRO, so I do not know whether it is due to the scATAC-seq data itself. Thanks in advance!

Best regards, Min

haowenz commented 2 years ago

Did you use Chromap's custom barcode format? If so, this was just fixed here and will work in the next Chromap release.
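
The Chromap warning quoted above suggests checking whether the whitelist matches the data in length and orientation (reverse-complement). A minimal sketch of that check follows, assuming the barcode reads and the whitelist have already been loaded as plain strings. The helper names `reverse_complement` and `barcode_match_report` are hypothetical and are not part of Chromap or MAESTRO.

```python
from collections import Counter

# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def barcode_match_report(read_barcodes, whitelist):
    """Count exact whitelist hits for barcodes as-is and reverse-complemented.

    This only counts exact matches; it does not attempt the error
    correction that a real mapper may perform.
    """
    whitelist = set(whitelist)
    whitelist_lengths = Counter(len(b) for b in whitelist)
    read_lengths = Counter()
    total = forward = revcomp = 0
    for bc in read_barcodes:
        total += 1
        read_lengths[len(bc)] += 1
        if bc in whitelist:
            forward += 1
        if reverse_complement(bc) in whitelist:
            revcomp += 1
    return {
        "total": total,
        "forward_fraction": forward / total if total else 0.0,
        "revcomp_fraction": revcomp / total if total else 0.0,
        "read_lengths": dict(read_lengths),
        "whitelist_lengths": dict(whitelist_lengths),
    }


if __name__ == "__main__":
    # Toy data: the first two reads are reverse complements of whitelist entries.
    toy_whitelist = ["AAACGGTT", "CCGTAAGT"]
    toy_reads = ["AACCGTTT", "ACTTACGG", "GGGGGGGG"]
    print(barcode_match_report(toy_reads, toy_whitelist))
```

If `revcomp_fraction` is much higher than `forward_fraction`, the whitelist is probably in the opposite orientation to the barcode reads. If `read_lengths` and `whitelist_lengths` disagree, the barcode length does not match the whitelist.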