llecompte / SVJedi

SV genotyping with long reads
GNU Affero General Public License v3.0

Reference genome name parsing issue #1

Closed hd00ljy closed 4 years ago

hd00ljy commented 4 years ago

I got an error message saying that chr1 is not in the reference genome dictionary.

I think the 63rd line of generateRef.py is causing this issue.


FASTA header lines in reference genomes often carry additional information after the sequence ID, for example:

```
>chr1  AC:CM000663.2  gi:568336023  LN:248956422  rl:Chromosome  M5:6aef897c3d6ff0c78aff06ac189178dd  AS:GRCh38
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
```
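
To illustrate the mismatch, here is a minimal sketch. It assumes the reference dictionary is keyed by everything after the `>`, which is what the line quoted below appears to do. The `reference` dictionary and the placeholder sequence are hypothetical and are not taken from generateRef.py.

```python
# Hypothetical illustration; not the actual generateRef.py code.
header_line = (
    ">chr1  AC:CM000663.2  gi:568336023  LN:248956422  rl:Chromosome  "
    "M5:6aef897c3d6ff0c78aff06ac189178dd  AS:GRCh38\n"
)

# Keeping everything after ">" stores the full description as the key.
full_header = header_line.rstrip("\n")[1:]
reference = {full_header: "NNNN"}  # placeholder sequence

print("chr1" in reference)      # False: the key is the whole header line
print(full_header.split()[0])   # chr1
```

A lookup by the bare chromosome name, as used in a VCF, therefore misses the entry even though the sequence is present.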

So I suggest changing line 63 of generateRef.py from

header = line.rstrip("\n")[1:]

to

header = line.rstrip("\n")[1:].split()[0]

or something similar
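
As a rough sketch of how that could look when building the dictionary, assuming the file is read line by line: the `load_reference` helper and the shortened sample records below are made up for illustration and do not reflect the real structure of generateRef.py.

```python
def load_reference(lines):
    """Map each sequence ID to its sequence (hypothetical helper)."""
    sequences = {}
    seq_id = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # Keep only the first whitespace-delimited token as the ID,
            # dropping any description fields that follow it.
            fields = line[1:].split()
            seq_id = fields[0] if fields else ""
            sequences[seq_id] = []
        elif seq_id is not None and line:
            sequences[seq_id].append(line)
    return {name: "".join(chunks) for name, chunks in sequences.items()}


# Made-up, shortened records for illustration only.
fasta = [
    ">chr1  AC:CM000663.2  gi:568336023  LN:248956422\n",
    "NNNNACGT\n",
    "ACGTNNNN\n",
    ">chr2\n",
    "GGCC\n",
]
reference = load_reference(fasta)
print(sorted(reference))   # ['chr1', 'chr2']
print(reference["chr1"])   # NNNNACGTACGTNNNN
```

Calling `split()` with no argument splits on any run of whitespace, so it handles both tab-separated and space-separated descriptions.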

llecompte commented 4 years ago

That’s right, thank you @hd00ljy for pointing that out. I will quickly make the necessary changes to handle this case automatically.