single-cell-genetics / cellsnp-lite

Efficient genotyping bi-allelic SNPs on single cells
https://cellsnp-lite.readthedocs.io
Apache License 2.0

runtime and expected output #78

Open ahoffrichter opened 1 year ago

ahoffrichter commented 1 year ago

Hi, I ran cellsnp-lite on one bam file, and it has now been running for 3 weeks. Is this to be expected? Is there any information on the expected output files? There are already several files in my output folder, and I am not sure whether the program has actually finished and only appears to be still running. Best, Anne

hxj5 commented 1 year ago

Hi, the files in the output folder are probably temporary files (with suffixes such as .0, .1, etc.). When the program finishes, the output folder should look like this example.
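One rough way to tell whether a run is still in progress is to look for leftover files with a numeric suffix in the output folder. The sketch below assumes that temporary files end in `.0`, `.1`, and so on, as described above. The function name `list_temp_files` and the `OUT_DIR` variable are made up for illustration and are not part of cellsnp-lite.

```shell
# Hypothetical helper (not part of cellsnp-lite): list files in an output
# folder whose names end in a numeric suffix such as .0 or .1.
# Assumption: such files are temporary output of a run that has not finished.
list_temp_files() {
    out_dir="$1"
    # Only look at regular files directly inside the folder.
    find "$out_dir" -maxdepth 1 -type f | grep -E '\.[0-9]+$' | sort || true
}

# Usage: prints nothing once no numbered files remain.
# list_temp_files "$OUT_DIR"
```

If the list is empty and the cellsnp-lite process has exited, the run has most likely completed. Comparing the folder against the example output remains the more reliable check.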

How many cells does the bam file contain? Sometimes it could take a long time for cellsnp-lite to genotype a big dataset, especially in mode 2a (i.e., to pileup whole chromosomes for 10x data). Could you also share your command line and the version of cellsnp-lite?

Best, Xianjie

ahoffrichter commented 1 year ago

Hi, the bam file should contain about 15,600 cells. I used mode 1a with the region vcf from here. I'm running this on a cluster with cellsnp-lite version 1.2.2. I'm not exactly sure what you mean by "share your command line".

Best, Anne

hxj5 commented 1 year ago

The command line contains all the parameters you used to run cellsnp-lite, e.g.,

cellsnp-lite -s $BAM -b $BARCODE -O $OUT_DIR -R $REGION_VCF -p 20 --minMAF 0.1 --minCOUNT 20 --gzip

The 15,600 cells indicate the bam is probably a big 10x dataset. To speed it up, you may

ahoffrichter commented 1 year ago

Ah ok, the command I used looks like this:

cellsnp-lite -s path/to/possorted_genome_bam.bam -b /path/to/raw_feature_bc_matrix/barcodes.tsv.gz -O /vireo/test -R /vireo/genome1K.phase3.SNP_AF5e2.chr1toX.hg38.vcf.gz -p 20 --minMAF 0.1 --minCOUNT 20 --gzip

Yes indeed, it is a 10x dataset and I used the raw barcodes. I will try out your suggestions, thank you very much for your help!
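Because the file passed with `-b` lists the barcodes to genotype, it may be worth checking how many barcodes it actually contains. A raw 10x barcode list usually holds many more barcodes than the number of called cells. The sketch below is only an illustration: `count_barcodes` is a made-up helper name, and it assumes one barcode per line.

```shell
# Hypothetical helper: count the barcodes in a plain or gzipped list file.
# Assumption: the file holds one barcode per line.
count_barcodes() {
    case "$1" in
        *.gz) gzip -dc "$1" | wc -l | tr -d ' ' ;;
        *)    wc -l < "$1" | tr -d ' ' ;;
    esac
}

# Usage, with $BARCODE being the file given to -b:
# count_barcodes "$BARCODE"
```

If the count is far above the expected ~15,600 cells, the list probably includes barcodes that were not called as cells.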

roshni-b commented 10 months ago

For 15k cells, roughly how long did this take to run?

nansne commented 3 months ago

Hello, I ran cellsnp-lite, and it failed with "[E::idx_find_and_load] Could not retrieve index file for '/h/sunnan/VKHDATA/vkha/gex_possorted_bam.bam'". What should I do next? Thank you very much!
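This htslib message usually means that no index file (for example `.bai`) was found next to the BAM file. A minimal sketch of a check follows, assuming `samtools` is installed and the BAM is coordinate-sorted. `ensure_bam_index` is a made-up helper name, not something cellsnp-lite provides.

```shell
# Hypothetical helper: create a BAM index if none is found next to the file.
# Assumptions: samtools is on PATH and the BAM is coordinate-sorted.
ensure_bam_index() {
    bam="$1"
    # Index files are typically named file.bam.bai, file.bai or file.bam.csi.
    if [ -e "$bam.bai" ] || [ -e "${bam%.bam}.bai" ] || [ -e "$bam.csi" ]; then
        echo "index found"
    else
        echo "no index, running samtools index"
        samtools index "$bam"
    fi
}

# Usage:
# ensure_bam_index /path/to/possorted_genome_bam.bam
```

After the index exists, rerunning the same cellsnp-lite command should get past this particular error, provided the index is readable from the same directory as the BAM.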