single-cell-genetics / cellsnp-lite

Efficient genotyping bi-allelic SNPs on single cells
https://cellsnp-lite.readthedocs.io
Apache License 2.0

Error when using more than 2 bam files with mode 1B #94

Open cameronyoungpark opened 1 year ago

cameronyoungpark commented 1 year ago

Hello, I am trying to use cellsnp-lite to format genotype files for demultiplexing with vireo. There are 8 genotypes in my multiplexed sample, and I am trying to use 3-4 known genotyped samples to aid in the demultiplexing. The method works great with only 2 genotyped samples, but when I increase to 3 I get lots of errors.

Step 1:

    cellsnp-lite --genotype -R ref_file -s multiplexed_singlecell_file -b barcodes.tsv -O output_folder -p 22 --minMAF 0.1 --minCOUNT 100 --gzip

Step 2: This works great for 2 BAM files, but doesn't seem to work with 3. I get 20 extra files for each of the expected 6 output files.

    cellsnp-lite -s BAM1, BAM2, BAM3 -I donor1, donor2, donor3 -O germline_folder -R ref_file -p 20 --cellTAG None --UMItag None --gzip --genotype

Step 3: When I run this after running step 2 with 3 BAM files, I get an "index out of range" error that I do not get when using only 2 BAM files in step 2.

    vireo -c output_folder/cellSNP.cells.vcf.gz -d germline_folder/cellSNP.cells.vcf.gz -o vireo_output -p 12 -N 8

Is it possible to run step 2 with multiple BAM files? If so, I would love some input on what I am doing incorrectly! Thanks!

hxj5 commented 1 year ago

Hi, could you share the log file, especially the "index out of range error" part you mentioned? That should be helpful.

cameronyoungpark commented 1 year ago

Hello, I don't have a log output file because vireo did not run successfully (not sure if you mean a different log file), but here is an example of the terminal output with the error:

(base) cyp2111_columbia_edu@vireo:~$ vireo -c /home/cyp2111_columbia_edu/KMA3_1.cellsnp/cellSNP.cells.vcf.gz -d /home/cyp2111_columbia_edu/germlineKMA3/cellSNP.cells.vcf.gz -o /home/cyp2111_columbia_edu/KMA3_1.vireo/ -p 12 -N 8
[vireo] Loading cell VCF file ...
[vireo] Loading donor VCF file ...
[vireo] 5500 out 6893 variants matched to donor VCF
[vireo] Demultiplex 27532 cells to 8 donors with 5500 variants.
[vireo] lower bound ranges [-63696.5, -61707.1, -58860.4]
[vireo] allelic rate mean and concentrations:
[[0.013 0.448 0.952]]
[[86129.4 74625.5 37268.1]]
[vireo] donor size before removing doublets:
donor0	donor1	donor2	donor3	donor4	donor5	donor6	donor7
3403	3369	3456	3466	3444	3461	3437	3494
Traceback (most recent call last):
  File "/home/cyp2111_columbia_edu/anaconda3/bin/vireo", line 8, in <module>
    sys.exit(main())
  File "/home/cyp2111_columbia_edu/anaconda3/lib/python3.8/site-packages/vireoSNP/vireo.py", line 217, in main
    write_donor_id(out_dir, donor_names, cell_dat['samples'], n_vars, res_vireo)
  File "/home/cyp2111_columbia_edu/anaconda3/lib/python3.8/site-packages/vireoSNP/utils/io_utils.py", line 99, in write_donor_id
    donor_singlet = np.array(donor_names, "U100")[np.argmax(ID_prob, axis=1)]
IndexError: index 7 is out of bounds for axis 0 with size 7

hxj5 commented 1 year ago

Hi, you mentioned there were 20 extra files for each of the expected 6 output files in step 2, which indicates that either cellsnp-lite had not finished yet or some errors occurred. It would help a lot if you could run step 2 again and then share the log file of step 2.

cameronyoungpark commented 1 year ago

Hi, how do I get the log file for step 2?

hxj5 commented 1 year ago

Hi, you can use Linux I/O redirection to get the log file (i.e., run cellsnp-lite [options...] &>log_file), or you can directly share the terminal output of step 2.
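
For readers unfamiliar with the redirection mentioned above, here is a minimal sketch of how it behaves. The function `demo_tool` and the file names are hypothetical stand-ins (they are not part of cellsnp-lite); in practice the real `cellsnp-lite [options...]` command would take the place of `demo_tool`.

```shell
# Hypothetical stand-in for a command-line tool: it writes one line to
# stdout and one line to stderr, as many tools do for progress and errors.
demo_tool() {
    echo "message on stdout"
    echo "message on stderr" >&2
}

# Assumed scratch location for the example log files.
log_dir="$(mktemp -d)"
log_file="$log_dir/step2.log"
tee_log="$log_dir/step2.tee.log"

# Send both stdout and stderr to the log file.
# In bash, `demo_tool &>"$log_file"` is shorthand for the portable form below.
demo_tool >"$log_file" 2>&1

# Alternative: keep the output visible in the terminal while also saving it.
demo_tool 2>&1 | tee "$tee_log"
```

Without `2>&1`, only stdout would be written to the file, and error messages (which usually go to stderr) would be missing from the log.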