mehrdadbakhtiari / adVNTR

A tool for genotyping Variable Number Tandem Repeats (VNTR) from sequence data
http://advntr.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Add accuracy filter for hifi reads #69

Closed sara-javadzadeh closed 10 months ago

sara-javadzadeh commented 10 months ago

The accuracy filter is applied to HiFi reads and is disabled for Illumina short reads for now. It applies stricter criteria when recruiting reads for genotyping: 1) a higher minimum flanking read length, 2) fewer mismatches allowed in the flanking region, and 3) more supporting reads required for each repeat count before that count is considered when computing the genotype.
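To make the three criteria concrete, here is a minimal sketch of how such a filter could be structured. It is not the code in this PR: the `RecruitedRead` class, the `filter_reads` function, and all threshold values are hypothetical and chosen only for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class RecruitedRead:
    """Hypothetical summary of a read recruited for one VNTR."""
    repeat_count: int           # repeat count estimated from this read
    left_flank_len: int         # bases aligned to the left flanking region
    right_flank_len: int        # bases aligned to the right flanking region
    flank_mismatch_rate: float  # mismatches / aligned flanking bases


# Illustrative thresholds only; the values used by adVNTR may differ.
MIN_FLANK_LEN = 100
MAX_FLANK_MISMATCH_RATE = 0.05
MIN_SUPPORTING_READS = 2


def filter_reads(reads, is_hifi):
    """Return the reads that pass the accuracy filter.

    The filter is skipped entirely for non-HiFi (e.g. Illumina) reads.
    """
    if not is_hifi:
        return list(reads)

    # Criteria 1 and 2: per-read checks on the flanking regions.
    kept = [
        r for r in reads
        if min(r.left_flank_len, r.right_flank_len) >= MIN_FLANK_LEN
        and r.flank_mismatch_rate <= MAX_FLANK_MISMATCH_RATE
    ]

    # Criterion 3: drop repeat counts with too few supporting reads.
    support = Counter(r.repeat_count for r in kept)
    return [r for r in kept if support[r.repeat_count] >= MIN_SUPPORTING_READS]
```

In this sketch the per-read checks run first, so a read that fails the flanking criteria does not count as support for its repeat count.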

This commit also fixes two minor bugs: 1) In vntr_finder.py (line 410), genotyping a VNTR caused a fatal error when the sequence of a read was None. 2) In pairwise_aln_generator.py (line 351), an error while writing pairwise_aln files resulted in empty files. The script reads the log file and assumes that, once all the relevant lines have been found, the remaining lines are only read sequences. When it tried to parse a read sequence from an unrelated (and unparsed) line, it failed. I added a condition to make sure the line in question is in fact a read sequence and not a line that is left unprocessed.
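The kind of guard described for both fixes can be sketched as follows. This is an illustration, not the condition added in the PR: the helper names are hypothetical, and it assumes that a read sequence appears in the log as a line made up only of the characters A, C, G, T, and N, which may not match the actual log format.

```python
import re

# Assumption: a read sequence line contains only nucleotide characters.
_SEQUENCE_RE = re.compile(r"[ACGTN]+", re.IGNORECASE)


def is_read_sequence(line):
    """Return True only if the line looks like a read sequence.

    None and empty lines are rejected, which also covers the case
    where the sequence of a read is missing.
    """
    if line is None:
        return False
    return _SEQUENCE_RE.fullmatch(line.strip()) is not None


def extract_read_sequences(remaining_lines):
    """Keep only the lines that are read sequences, skipping the rest."""
    return [line.strip() for line in remaining_lines if is_read_sequence(line)]
```

Checking each line explicitly, rather than assuming every remaining line is a sequence, means an unrelated log line is skipped instead of raising an error during parsing.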