iqbal-lab-org / gramtools

Genome inference from a population reference graph
MIT License

'N' in reference genome fasta not supported #156

Open bricoletc opened 3 years ago

bricoletc commented 3 years ago

When running gramtools build with --ref <ref.fa>, if <ref.fa> contains 'N' bases (or indeed anything other than A/C/G/T), we get:

2021-03-01 17:27:35,872 gramtools    ERROR    Did not receive a nucleotide: N not in {A,C,G,T}
2021-03-01 17:27:35,872 gramtools    ERROR    Unsuccessful vcf_to_PRG_string_conversion.

The current recommended workaround is to replace these characters with a fixed base, e.g. replace all 'N's with 'C's.
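
For anyone needing the workaround now, here is a minimal sketch of it in Python. It is not part of gramtools: the function name mask_non_acgt and the choice to treat every non-A/C/G/T character (including IUPAC ambiguity codes) the same way are assumptions made for illustration. Header lines starting with '>' are left untouched.

```python
import re

# Any character other than A/C/G/T (either case) is treated as unsupported.
NON_ACGT = re.compile(r"[^ACGTacgt]")


def mask_non_acgt(fasta_text: str, replacement: str = "C") -> str:
    """Return FASTA text with non-A/C/G/T sequence characters replaced.

    Hypothetical helper, not a gramtools function. Header lines ('>')
    are copied unchanged; sequence lines are stripped of surrounding
    whitespace before substitution.
    """
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            out.append(line)
        else:
            out.append(NON_ACGT.sub(replacement, line.strip()))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    import sys

    # Usage: python mask_fasta.py < ref.fa > ref.masked.fa
    sys.stdout.write(mask_non_acgt(sys.stdin.read()))
```

Note that the replaced positions no longer reflect the true reference, so calls overlapping them should be treated with caution.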

A possible fix is to use a dedicated integer for non-A/C/G/T bases in the binary prg string, which should trigger read mapping to fail when reached.
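
To make the idea concrete, here is a rough Python sketch of what a dedicated integer could look like. Everything in it is an assumption for illustration: the integer values, the UNKNOWN_BASE sentinel, and the matches_at helper do not come from the gramtools code base, and real mapping runs against an index rather than by direct comparison.

```python
# Illustrative integer codes only; not taken from the gramtools source.
BASE_CODES = {"A": 1, "C": 2, "G": 3, "T": 4}

# Hypothetical sentinel for any non-A/C/G/T character.
UNKNOWN_BASE = 0


def encode_sequence(seq):
    """Encode a sequence as integers, using the sentinel for unknown bases."""
    return [BASE_CODES.get(base.upper(), UNKNOWN_BASE) for base in seq]


def matches_at(encoded_ref, encoded_read, start):
    """Return True if the read matches the reference exactly at `start`.

    Any sentinel in the reference window or in the read makes the match
    fail, which mimics mapping failing when an unknown base is reached.
    """
    window = encoded_ref[start:start + len(encoded_read)]
    if len(window) < len(encoded_read):
        return False
    if UNKNOWN_BASE in window or UNKNOWN_BASE in encoded_read:
        return False
    return window == encoded_read
```

With this scheme the reference "ACNGT" can still be built, a read "GT" maps at offset 3, and any read overlapping the 'N' at offset 2 fails to map.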