Open suhuanhou opened 10 months ago
Hi, is your data 10x scDNA-seq or scATAC-seq? If so, you may add --UMItag None
. Or you may also check whether the cell barcodes you specified in the cmdline are compatible with the CB
tag in the BAM? (See issue #44 for details)
Hi, is your data 10x scDNA-seq or scATAC-seq? If so, you may add
--UMItag None
. Or you may also check whether the cell barcodes you specified in the cmdline are compatible with theCB
tag in the BAM? (See issue #44 for details)
cellSNP.base.vcf
cellSNP.tag.AD.mtx %%MatrixMarket matrix coordinate integer general % 0 25859 0
My data is 10X single-cell RNA-seq data. I tried to add the argument "--chrom GRCh38_chr1", but the file is still blank.
cellSNP.samples.tsv AAACCCAAGCGTTAGG-1 AAACCCAAGCTTACGT-1 AAACCCAAGTCACTAC-1 AAACCCACACAAATGA-1 AAACCCACACCCTAGG-1 AAACCCACAGACCAGA-1 AAACCCACAGACCTAT-1 AAACCCACAGGCAATG-1 AAACCCACAGTATTCG-1
[shh@node4 ~]$ samtools view /share/home/shh/data/${seq_name}_cellranger5/outs/possorted_genome_bam.bam
A00212:80:HWYWHDSX2:1:2215:27959:32252 16 GRCh38_chr1 10004 0 91M 0 0 CCCTAACCCTAAACATAACCATAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAAC ,F,FF:FF,FFF,F,FF:FF,FF:F:FFFF,F,FF,:F,FFFF,FFFF:FFFFFFFFFFFFFF:FF,FFFFFFFFFFFFFFFFFF:,FFFF NH:i:9 HI:i:1 AS:i:83 nM:i:RG:Z:lcc54_cellranger5:0:1:HWYWHDSX2:1 RE:A:I xf:i:0 CR:Z:TACATTCAGATTGATG CY:Z:FFFFFFF:FFFFFFFF CB:Z:TACATTCAGATTGATG-1 UR:Z:TAGGAAGTGT UY:Z:FFFFFFFFFF UB:Z:TAGGAAGTGT A00212:80:HWYWHDSX2:1:2362:28908:1016 16 GRCh38_chr1 10004 0 91M 0 0 CCCTATCCCTAACGCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTATCCCTTACCCTAACCCTAACCCTAACCCTAACCCTAAC F,FFFF:FFFFFF,F,FF:FFF,:FFFF:FFFFFFFF:F,FFFFF:FFFFFFF,F:FF,:FFFFF,FFFFFFFFF:FFFF,FFFFFFF:F: NH:i:6 HI:i:1 AS:i:81 nM:i:RG:Z:lcc54_cellranger5:0:1:HWYWHDSX2:1 RE:A:I xf:i:0 CR:Z:GATTTCTTCCCATAGA CY:Z:FFF:FFFFFFFFFFF: CB:Z:GATTTCTTCCCATAGA-1 UR:Z:CTCCCCACGT UY:Z:FFFFFFFFFF UB:Z:CTCCCCACGT A00212:80:HWYWHDSX2:1:1477:26078:14841 16 GRCh38_chr1 10005 1 91M * 0 0 CCTCACCATAACCCTTAACCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACC F,F,:F:,:,FF,,F,,,F:F:::FFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:3 HI:i:1 AS:i:81 nM:i:RG:Z:lcc54_cellranger5:0:1:HWYWHDSX2:1 RE:A:I xf:i:0 CR:Z:TCATTGTTCTTCTTCC CY:Z:FFFFFFFFFFFFFFFF CB:Z:TCATTGTTCTTCTTCC-1 UR:Z:TAACCCTAAC UY:Z:FFFFFFFFFF UB:Z:TAACCCTAAC
Hi, thanks for the detailed information. The cell and umi tags seem all right. The issue is probably caused by the difference of chromosome names in the input VCF and BAM files, noted the chrom names in your BAM have prefix "GRCh38_" that does not exist in VCF. You may change the chromosome names in either file to make them compatible with each other.
BTW, it seems you were using the VCF file "genome1K.phase3.SNP_AF5e4.chr1toX.hg38.vcf". You may try the VCF of AF5e2 version, which contains much less SNPs (hence much faster) and is sufficient for most downstream analysis.
seq_name="lcc54" conda activate vireo cd /share/home/shh/wes_out/vireo_out bam_path=/share/home/shh/data/${seq_name}_cellranger5/outs/possorted_genome_bam.bam barcode_path=/share/home/shh/data/${seq_name}_cellranger5/outs/filtered_feature_bc_matrix/barcodes.tsv OUT_DIR=/share/home/shh/wes_out/cellsnp-lite REGION_VCF=/share/home/shh/download/cellSNP/genome1K.phase3.SNP_AF5e4.chr1toX.hg38.vcf
(vireo) [shh@node4 vireo_out]$ cellsnp-lite -s ${bam_path} -b ${barcode_path} -O ${OUT_DIR} -R ${REGION_VCF} -p 20 --minMAF 0.1 --minCOUNT 20 --gzip [I::main] start time: 2023-08-30 18:54:10 [W::check_args] Max depth set to maximum value (2147483647) [W::check_args] Max pileup set to maximum value (2147483647) [I::main] global settings after checking: num of input files = 1 out_dir = /share/home/shh/wes_out/cellsnp-lite is_out_zip = 1, is_genotype = 0 is_target = 0, num_of_pos = 0 num_of_barcodes = 25859, num_of_samples = 0 refseq file = (null) 0 chroms: cell-tag = CB, umi-tag = UB nthreads = 20, tp_max_open = 1024 mthreads = 20, tp_errno = 0, tp_ntry = 0 min_count = 20, min_maf = 0.10, double_gl = 0 min_len = 30, min_mapq = 20 rflag_filter = 772, rflag_require = 0 max_depth = 2147483647, max_pileup = 2147483647, no_orphan = 1 [I::main] loading the VCF file for given SNPs ... [I::main] fetching 36305578 candidate variants ... [I::main] mode 1a: fetch given SNPs in 25859 single cells. [I::csp_fetch_core][Thread-0] 2.00% SNPs processed. . . . [I::csp_fetch_core][Thread-11] 98.00% SNPs processed. [I::csp_fetch_core][Thread-8] 98.00% SNPs processed. [I::main] All Done! [I::main] Version: 1.2.3 (htslib 1.17) [I::main] CMD: cellsnp-lite -s /share/home/shh/data/lcc54_cellranger5/outs/possorted_genome_bam.bam -b /share/home/shh/data/lcc54_cellranger5/outs/filtered_feature_bc_matrix/barcodes.tsv -O /share/home/shh/wes_out/cellsnp-lite -R /share/home/shh/download/cellSNP/genome1K.phase3.SNP_AF5e4.chr1toX.hg38.vcf -p 20 --minMAF 0.1 --minCOUNT 20 --gzip [I::main] end time: 2023-08-30 20:37:12 [I::main] time spent: 6182 seconds.
cellsnp-lite works, but the resulting file has only the title and no content