single-cell-genetics / cellsnp-lite

Efficient genotyping bi-allelic SNPs on single cells
https://cellsnp-lite.readthedocs.io
Apache License 2.0
124 stars 11 forks source link

the resulting file has only the title and no content #102

Open suhuanhou opened 10 months ago

suhuanhou commented 10 months ago

seq_name="lcc54" conda activate vireo cd /share/home/shh/wes_out/vireo_out bam_path=/share/home/shh/data/${seq_name}_cellranger5/outs/possorted_genome_bam.bam barcode_path=/share/home/shh/data/${seq_name}_cellranger5/outs/filtered_feature_bc_matrix/barcodes.tsv OUT_DIR=/share/home/shh/wes_out/cellsnp-lite REGION_VCF=/share/home/shh/download/cellSNP/genome1K.phase3.SNP_AF5e4.chr1toX.hg38.vcf

(vireo) [shh@node4 vireo_out]$ cellsnp-lite -s ${bam_path} -b ${barcode_path} -O ${OUT_DIR} -R ${REGION_VCF} -p 20 --minMAF 0.1 --minCOUNT 20 --gzip [I::main] start time: 2023-08-30 18:54:10 [W::check_args] Max depth set to maximum value (2147483647) [W::check_args] Max pileup set to maximum value (2147483647) [I::main] global settings after checking: num of input files = 1 out_dir = /share/home/shh/wes_out/cellsnp-lite is_out_zip = 1, is_genotype = 0 is_target = 0, num_of_pos = 0 num_of_barcodes = 25859, num_of_samples = 0 refseq file = (null) 0 chroms: cell-tag = CB, umi-tag = UB nthreads = 20, tp_max_open = 1024 mthreads = 20, tp_errno = 0, tp_ntry = 0 min_count = 20, min_maf = 0.10, double_gl = 0 min_len = 30, min_mapq = 20 rflag_filter = 772, rflag_require = 0 max_depth = 2147483647, max_pileup = 2147483647, no_orphan = 1 [I::main] loading the VCF file for given SNPs ... [I::main] fetching 36305578 candidate variants ... [I::main] mode 1a: fetch given SNPs in 25859 single cells. [I::csp_fetch_core][Thread-0] 2.00% SNPs processed. . . . [I::csp_fetch_core][Thread-11] 98.00% SNPs processed. [I::csp_fetch_core][Thread-8] 98.00% SNPs processed. [I::main] All Done! [I::main] Version: 1.2.3 (htslib 1.17) [I::main] CMD: cellsnp-lite -s /share/home/shh/data/lcc54_cellranger5/outs/possorted_genome_bam.bam -b /share/home/shh/data/lcc54_cellranger5/outs/filtered_feature_bc_matrix/barcodes.tsv -O /share/home/shh/wes_out/cellsnp-lite -R /share/home/shh/download/cellSNP/genome1K.phase3.SNP_AF5e4.chr1toX.hg38.vcf -p 20 --minMAF 0.1 --minCOUNT 20 --gzip [I::main] end time: 2023-08-30 20:37:12 [I::main] time spent: 6182 seconds.

cellsnp-lite works, but the resulting file has only the title and no content

hxj5 commented 10 months ago

Hi, is your data 10x scDNA-seq or scATAC-seq? If so, you may add --UMItag None. Or you may also check whether the cell barcodes you specified in the cmdline are compatible with the CB tag in the BAM? (See issue #44 for details)

suhuanhou commented 10 months ago

Hi, is your data 10x scDNA-seq or scATAC-seq? If so, you may add --UMItag None. Or you may also check whether the cell barcodes you specified in the cmdline are compatible with the CB tag in the BAM? (See issue #44 for details)

cellSNP.base.vcf

fileformat=VCFv4.2

CHROM POS ID REF ALT QUAL FILTER INFO

cellSNP.tag.AD.mtx %%MatrixMarket matrix coordinate integer general % 0 25859 0

My data is 10X single-cell RNA-seq data. I tried to add the argument "--chrom GRCh38_chr1", but the file is still blank.

suhuanhou commented 10 months ago

cellSNP.samples.tsv AAACCCAAGCGTTAGG-1 AAACCCAAGCTTACGT-1 AAACCCAAGTCACTAC-1 AAACCCACACAAATGA-1 AAACCCACACCCTAGG-1 AAACCCACAGACCAGA-1 AAACCCACAGACCTAT-1 AAACCCACAGGCAATG-1 AAACCCACAGTATTCG-1

suhuanhou commented 10 months ago

[shh@node4 ~]$ samtools view /share/home/shh/data/${seq_name}_cellranger5/outs/possorted_genome_bam.bam

A00212:80:HWYWHDSX2:1:2215:27959:32252 16 GRCh38_chr1 10004 0 91M 0 0 CCCTAACCCTAAACATAACCATAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAAC ,F,FF:FF,FFF,F,FF:FF,FF:F:FFFF,F,FF,:F,FFFF,FFFF:FFFFFFFFFFFFFF:FF,FFFFFFFFFFFFFFFFFF:,FFFF NH:i:9 HI:i:1 AS:i:83 nM:i:RG:Z:lcc54_cellranger5:0:1:HWYWHDSX2:1 RE:A:I xf:i:0 CR:Z:TACATTCAGATTGATG CY:Z:FFFFFFF:FFFFFFFF CB:Z:TACATTCAGATTGATG-1 UR:Z:TAGGAAGTGT UY:Z:FFFFFFFFFF UB:Z:TAGGAAGTGT A00212:80:HWYWHDSX2:1:2362:28908:1016 16 GRCh38_chr1 10004 0 91M 0 0 CCCTATCCCTAACGCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTATCCCTTACCCTAACCCTAACCCTAACCCTAACCCTAAC F,FFFF:FFFFFF,F,FF:FFF,:FFFF:FFFFFFFF:F,FFFFF:FFFFFFF,F:FF,:FFFFF,FFFFFFFFF:FFFF,FFFFFFF:F: NH:i:6 HI:i:1 AS:i:81 nM:i:RG:Z:lcc54_cellranger5:0:1:HWYWHDSX2:1 RE:A:I xf:i:0 CR:Z:GATTTCTTCCCATAGA CY:Z:FFF:FFFFFFFFFFF: CB:Z:GATTTCTTCCCATAGA-1 UR:Z:CTCCCCACGT UY:Z:FFFFFFFFFF UB:Z:CTCCCCACGT A00212:80:HWYWHDSX2:1:1477:26078:14841 16 GRCh38_chr1 10005 1 91M * 0 0 CCTCACCATAACCCTTAACCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACC F,F,:F:,:,FF,,F,,,F:F:::FFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:3 HI:i:1 AS:i:81 nM:i:RG:Z:lcc54_cellranger5:0:1:HWYWHDSX2:1 RE:A:I xf:i:0 CR:Z:TCATTGTTCTTCTTCC CY:Z:FFFFFFFFFFFFFFFF CB:Z:TCATTGTTCTTCTTCC-1 UR:Z:TAACCCTAAC UY:Z:FFFFFFFFFF UB:Z:TAACCCTAAC

hxj5 commented 10 months ago

Hi, thanks for the detailed information. The cell and umi tags seem all right. The issue is probably caused by the difference of chromosome names in the input VCF and BAM files, noted the chrom names in your BAM have prefix "GRCh38_" that does not exist in VCF. You may change the chromosome names in either file to make them compatible with each other.

hxj5 commented 10 months ago

BTW, it seems you were using the VCF file "genome1K.phase3.SNP_AF5e4.chr1toX.hg38.vcf". You may try the VCF of AF5e2 version, which contains much less SNPs (hence much faster) and is sufficient for most downstream analysis.