single-cell-genetics / cellsnp-lite

Efficient genotyping of bi-allelic SNPs on single cells
https://cellsnp-lite.readthedocs.io
Apache License 2.0

Duplicate barcodes error? #80

Closed: terencewtli closed this issue 1 year ago

terencewtli commented 1 year ago

Hi,

Thanks for providing this tool. I've had success with it on 10x BAM files. However, with another dataset I'm hitting a duplicate-barcodes error:

[W::csp_mplp_prepare] duplicate barcodes or sample IDs.
[E::csp_mplp_prepare] failed to set sample names.

I'm using data from SHARE-seq, which looks like:

A00438:1038:HHGGKDRX2:2:2172:32045:23703_TTCCTGCT,CCTATTGA,TGACCACT,SS-PKR-126_CCCTAACCCT       272     chr1    10108   0       20S30M  *       0       0       CCGGAGATGTGTATAAGAGACAGCCCTAACCCTAACCCTAACCCTAACCC   FFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF      NH:i:6  HI:i:3  AS:i:27 nM:i:1  MD:Z:2A27       NM:i:1  CB:Z:TTCCTGCTCCTATTGATGACCACT

The command I'm using is:

time cellsnp-lite -s "$BAM" -b "$BARCODES" \
  -O "$OUT" -R "$VCF" --minMAF 0.1 --minCOUNT 20 --gzip

I've verified that the barcodes supplied to cellsnp-lite are unique, so I'm not sure why this error occurs. Could it be related to the barcodes also appearing in the read names?

hxj5 commented 1 year ago

Hi, thanks for the feedback. It seems the program failed while checking the barcodes. Even though you have verified that they are unique, could you share the barcode file here so we can double-check?

terencewtli commented 1 year ago

Hi,

Sorry for the late response! This is embarrassing, but it turns out there were duplicate barcodes that I hadn't removed properly. After fixing that, it works. Thank you!
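
For anyone else who hits the same warning, here is a minimal sketch of how the barcode file could be checked before running cellsnp-lite. It assumes the file has one barcode per line and compares lines exactly, so a trailing space or a Windows line ending would make two otherwise identical barcodes look different. The function names `find_duplicate_barcodes` and `dedup_barcodes` are hypothetical helpers written for this example, not part of cellsnp-lite.

```shell
# Hypothetical helpers, not part of cellsnp-lite.
# Assumption: the barcode file contains exactly one barcode per line.

# Print every barcode that occurs more than once (one line per duplicated barcode).
find_duplicate_barcodes() {
  sort "$1" | uniq -d
}

# Print the barcodes with repeats removed, keeping the first occurrence
# of each and preserving the original order.
dedup_barcodes() {
  awk '!seen[$0]++' "$1"
}
```

Running `find_duplicate_barcodes "$BARCODES"` prints nothing when every line is unique. If it does print barcodes, something like `dedup_barcodes "$BARCODES" > barcodes.unique.tsv` (the output file name is only an example) gives a cleaned list to pass to `-b`.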