single-cell-genetics / cellsnp-lite

Efficient genotyping bi-allelic SNPs on single cells
https://cellsnp-lite.readthedocs.io
Apache License 2.0

Multi-threading not working for mtRNA genotyping? #40

Closed: teng-gao closed this issue 2 years ago

teng-gao commented 2 years ago

I'm trying to genotype mtRNA variants for MQuad input. It seems that the -p flag has no effect and only a single thread runs:

cellsnp-lite \
    -s $home_dir/external/MDA/BAM_$sample.bam \
    -b $home_dir/external/MDA/${sample}_barcodes.tsv \
    -O $home_dir/external/MDA/mtRNA/pileup/$sample \
    -p 10 \
    --chrom=MT \
    --UMItag Auto \
    --minMAF 0 \
    --minCOUNT 0 \
    --genotype \
    --gzip
[I::main] start time: 2022-03-08 18:23:45
[I::main] mode2: pileup 1 whole chromosomes in 1097 single cells.
[W::hts_idx_load3] The index file is older than the data file: /d0-bayes/home/tenggao/external/MDA/BAM_TNBC1.bam.bai
[W::csp_pileup_core] Max depth set to maximum value (2147483647)
[I::csp_pileup_core][Thread-0] processing chrom MT ...
hxj5 commented 2 years ago

Hi, this is expected. In mode 2 (whole-chromosome pileup), a single chromosome can only use one thread. Multi-threading only helps when multiple chromosomes are given as input, in which case each chromosome is assigned to its own thread.

Xianjie
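To make the scheduling described above concrete, here is a simplified model in shell. It is not cellsnp-lite's code (cellsnp-lite is written in C); it only captures the rule that in mode 2 each chromosome is one unit of work handled by one thread, so the number of busy threads is capped by the number of chromosomes. The `effective_threads` helper is hypothetical, and it assumes the chromosome list is written comma-separated, in the style of `--chrom=1,2,MT`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: a simplified model of mode-2 scheduling, NOT cellsnp-lite code.
# Each chromosome is one task handled by one thread, so at most
# min(nthreads, number of chromosomes) threads can be busy at once.
# Assumes the chromosome list is comma-separated, e.g. "1,2,MT".
effective_threads() {
  local nthreads=$1
  local nchroms
  # split on commas and count the non-empty chromosome names
  nchroms=$(printf '%s\n' "$2" | tr ',' '\n' | grep -c .)
  if [ "$nchroms" -lt "$nthreads" ]; then
    echo "$nchroms"
  else
    echo "$nthreads"
  fi
}

effective_threads 10 MT       # one chromosome: prints 1, so -p 10 cannot help
effective_threads 10 1,2,MT   # three chromosomes: prints 3
```

With `--chrom=MT` alone, the model gives 1 no matter what `-p` is, which matches the single `[Thread-0]` line in the log above.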

teng-gao commented 2 years ago

I see. The pileup also ran into an error. The message was something like:

 ... running mode 2 failed.
 ... failed to merge mtx AD
 ... --genotype not found
hxj5 commented 2 years ago

Hi, could you provide more log information, especially the error message? Thank you!

teng-gao commented 2 years ago

Here's the full log:

[I::main] start time: 2022-03-08 19:39:10
[I::main] mode2: pileup 1 whole chromosomes in 1097 single cells.
[W::hts_idx_load3] The index file is older than the data file: /home/tenggao/external/MDA/BAM_TNBC1.bam.bai
[W::csp_pileup_core] Max depth set to maximum value (2147483647)
[I::csp_pileup_core][Thread-0] processing chrom MT ...
[I::csp_pileup_core][Thread-0] has pileup-ed in total 16569 SNPs for chrom MT
[E::csp_pileup] failed to merge mtx AD.
[E::main] running mode 2 failed.
[E::main] Quiting...
[I::main] end time: 2022-03-08 20:35:04
[I::main] time spent: 3354 seconds.

Output folder:

total 61M
drwxrwxr-x 3 tenggao tenggao   27 Mar  8 18:19 ../
-rw-rw-r-- 1 tenggao tenggao   51 Mar  8 19:39 cellSNP.tag.OTH.mtx
-rw-rw-r-- 1 tenggao tenggao   51 Mar  8 19:39 cellSNP.tag.DP.mtx
-rw-rw-r-- 1 tenggao tenggao  21K Mar  8 19:39 cellSNP.samples.tsv
drwxrwxr-x 2 tenggao tenggao  195 Mar  8 19:43 ./
-rw-rw-r-- 1 tenggao tenggao 275K Mar  8 20:35 cellSNP.base.vcf.gz
-rw-rw-r-- 1 tenggao tenggao  61M Mar  8 20:35 cellSNP.cells.vcf.gz
-rw-rw-r-- 1 tenggao tenggao 288K Mar  8 20:35 cellSNP.tag.AD.mtx
hxj5 commented 2 years ago

Hi, it seems something went wrong when merging "cellSNP.tag.AD.mtx", though I'm not sure of the exact cause. For now, you could re-run the data; at about 1 hour (3354 seconds), another run seems acceptable.

teng-gao commented 2 years ago

Hi Xianjie,

This is actually the second run, and I got the same error message. If I send you the output files, would you be able to look into this? It would be very helpful, since I'm trying to use MQuad, which depends on cellsnp-lite. Some files are too big to upload here, but I'm attaching what I can.

I'm following their tutorial here: https://github.com/single-cell-genetics/MQuad/blob/main/example/preprocessing_cmd.sh

Thanks,
Teng

Attachments: cellSNP.base.vcf.gz, cellSNP.samples.tsv.gz, cellSNP.tag.AD.mtx.gz, cellSNP.tag.DP.mtx.gz, cellSNP.tag.OTH.mtx.gz

hxj5 commented 2 years ago

Hi Teng,

In cellSNP.base.vcf.gz, each SNP was output twice (33140 lines in total for 16569 SNPs). I'm not sure why. Could you try re-running this sample in a new directory (i.e., change the output directory for that sample)?

Best, Xianjie
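The duplication spotted in cellSNP.base.vcf.gz can be checked directly before (or instead of) sending files around. The sketch below is a hypothetical helper, `check_vcf_dups`, not part of cellsnp-lite. It assumes a standard gzip-compressed, tab-separated VCF and treats CHROM, POS, REF and ALT (columns 1, 2, 4 and 5) as the SNP key; for a normal pileup output, zero duplicated keys would be expected.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of cellsnp-lite.
# Prints how many data records a gzipped VCF has and how many distinct
# CHROM:POS:REF:ALT keys occur more than once.
check_vcf_dups() {
  gzip -dc "$1" | awk -F'\t' '
    /^#/ { next }                       # skip header lines
    {
      n++
      key = $1 ":" $2 ":" $4 ":" $5
      if (seen[key]++ == 1) dup++       # count each duplicated key once
    }
    END { printf "records=%d duplicated_keys=%d\n", n, dup }'
}
```

Run it as `check_vcf_dups cellSNP.base.vcf.gz` inside the output folder; a non-zero `duplicated_keys` points to the same problem as in this issue.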

teng-gao commented 2 years ago

Indeed, that solves the issue! Closing now.
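Since re-running in a new output directory resolved the error, one way to avoid accidentally reusing a directory from an earlier run is to check it before launching cellsnp-lite. `fresh_outdir` below is a hypothetical helper, and the idea that files left over from the earlier run caused the duplicated records is an assumption, not a confirmed diagnosis from this thread.

```bash
#!/usr/bin/env bash
# Hypothetical helper: refuse to reuse a non-empty output directory,
# otherwise create it (assumption: leftover files from a previous run
# may interfere with a new run written to the same place).
fresh_outdir() {
  local dir=$1
  if [ -d "$dir" ] && [ -n "$(ls -A "$dir")" ]; then
    echo "output dir '$dir' is not empty; pick a new one" >&2
    return 1
  fi
  mkdir -p "$dir"
}
```

Call it right before the pileup, e.g. `fresh_outdir "$out" && cellsnp-lite -O "$out" ...` with the other options unchanged, so the run only starts when the directory is new or empty.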