sridnona / cb_sniffer

mutation(barcode) caller for 10x single cell data
GNU General Public License v3.0
43 stars 16 forks source link

failing on line 101: if indel > 0: #5

Open lhendrikse opened 4 years ago

lhendrikse commented 4 years ago

I am attempting to run cb_sniffer on a variant file that contains only SNPs (i.e. the value for indel argument should only ever be 0). When I run cb_sniffer it fails on line 101: if indel > 0.

I had a quick read through the code and it looks like the indel variable is never assigned? Any ideas?

LH

sridnona commented 4 years ago

Hi @lhendrikse Apologies for delayed response, my first guess would be did your variant file have any headers? if not then adding a header should resolve this issue.

Thanks Sid

omansn commented 4 years ago

Hi @lhendrikse ,

I ran into the same error and I believe I found the source. It looks like the indel variable is assigned in the count_barcodes() definition. When count_barcodes() is executed on line 347, indel is assigned as the output of x.classify()

barcode_counts = x.count_barcodes(bam_file, bars, x.classify(), mq, bq)

where x.classify() is assigned to indel.

It seems like classify() assigns variants as snp or variant based on whether the lengths of the ref and alt entries are equal.

def classify(self):
        """
        for each variant classifies if its a snp or variant
        :return:
        0 for snp
        int if its insertion or deletion
        """
        if len(self.ref) == len(self.alt) == 1 and (self.ref != '-' or self.alt != '-'):
            return 0
        else:
            if self.alt == '-':  # deletion
               return len(self.ref)
            elif self.ref == '-':  # insertion
               return len(self.alt)

This becomes problematic if you have multi-allelic alt calls ( for example REF=A. ALT=C,G because len(A)!=LEN(C,G) ). I see that @sridnona is working on making cb_sniffer compatible with VCF files. Hopefully this solves the issue. Otherwise, I was able to get around the issue by splitting multi-allelic sites into separate entries.

sridnona commented 4 years ago

@omansn Good call, we never came across multi-allelic calls in our use case. And yes this will be factored in with the vcf patch. Only if i can get some downtime.

Thanks Sid

omansn commented 4 years ago

@sridnona .. Let me know if I can contribute my also-minimal downtime and help with anything. A tool like this would be pretty helpful for a few of my projects.