pontussk / PMDtools

Compute postmortem damage patterns and decontaminate ancient genomes
GNU General Public License v3.0

Faulty `DS:Z:` matching #9

Open etkayapar opened 3 years ago

etkayapar commented 3 years ago

Currently, the program searches the entire input line from stdin for the exact string DS:Z: in order to restore values that previous runs of the program stored in optional fields. This breaks when the same pattern appears in a field other than the optional fields, for example in the base quality string. We ran into this while processing a ~60 GB BAM file that contained a single such problematic read. To reproduce the issue, the read is pasted below:

ERR566093.277697703 0   2   158103001   37  55M *   0   0   AGCACTCATGTTGTTCTGCGTGACAACTTTGCTTAGGCCGAGCCCTACAGAAACT ]R]TUNR3Z]I]]S];S]6R[AHP??UWTS]Z]]DS:Z:Q5YKUPT[]]]LOP[M X0:i:1  X1:i:0  MD:Z:55 RG:Z:fooXG:i:0  NM:i:0  XM:i:0  XO:i:0  XP:i:1  XT:A:U

outputs:

Traceback (most recent call last):
  File "pmdtools.0.60.py", line 349, in <module>
    PMDS= float(line.split('DS:Z:')[1].rstrip('\n').split()[0])
ValueError: could not convert string to float: Q5YKUPT[]]]LOP[M

when the program is run only with the --deamination option.
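
One possible fix is to look for the tag only in the optional fields, i.e. from the 12th tab-separated column onward, instead of splitting the whole line on DS:Z:. The sketch below is not the PMDtools code: `extract_pmd_score` is a hypothetical helper name, and it assumes tab-delimited SAM input and that the DS:Z: value is a float, as the `float(...)` call in the traceback suggests.

```python
def extract_pmd_score(sam_line, tag="DS:Z:"):
    """Return the float stored under `tag` in the optional fields, or None.

    Hypothetical helper: only columns 12+ of a tab-delimited SAM record
    are inspected, so a quality string containing 'DS:Z:' is ignored.
    """
    fields = sam_line.rstrip("\n").split("\t")
    # The first 11 columns of a SAM record are mandatory (QNAME ... QUAL);
    # optional TAG:TYPE:VALUE fields start at index 11.
    for field in fields[11:]:
        if field.startswith(tag):
            try:
                return float(field[len(tag):])
            except ValueError:
                # The tag is present but does not hold a number.
                return None
    return None


# Usage: the read from this report, rebuilt with tab separators.
mandatory = [
    "ERR566093.277697703", "0", "2", "158103001", "37", "55M", "*", "0", "0",
    "AGCACTCATGTTGTTCTGCGTGACAACTTTGCTTAGGCCGAGCCCTACAGAAACT",
    "]R]TUNR3Z]I]]S];S]6R[AHP??UWTS]Z]]DS:Z:Q5YKUPT[]]]LOP[M",
]
read = "\t".join(mandatory + ["X0:i:1", "MD:Z:55"]) + "\n"

print(extract_pmd_score(read))                               # None
print(extract_pmd_score(read.rstrip("\n") + "\tDS:Z:3.5\n"))  # 3.5
```

With this approach the problematic read is treated as having no stored score rather than raising a ValueError, and a genuine DS:Z: optional field is still picked up.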