With the Aho-Corasick (AC) automaton from the `pyahocorasick` Python library, it's possible to subtype reads or contigs much more quickly than with BLAST or Jellyfish.
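
To illustrate why this is faster, here is a minimal pure-Python sketch of Aho-Corasick matching: all subtyping kmers are loaded into one trie with failure links, and each read or contig is then scanned once, regardless of how many kmers are being searched for. This is not the `pyahocorasick` implementation (which is a C extension) and not the code in this PR; the function names, kmers, and labels below are made up for the example.

```python
from collections import deque


def build_automaton(patterns):
    """Build a trie with failure links from a dict of {kmer: label}."""
    goto = [{}]  # goto[state][char] -> next state
    fail = [0]   # failure link for each state
    out = [[]]   # (label, kmer length) for every kmer ending at each state

    # Insert every kmer into the trie.
    for kmer, label in patterns.items():
        state = 0
        for ch in kmer:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append((label, len(kmer)))

    # Breadth-first pass: a state's failure link points to the longest
    # proper suffix of its path that is also a path in the trie.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_kmers(sequence, automaton):
    """Yield (start_index, label) for each kmer hit, in a single pass."""
    goto, fail, out = automaton
    state = 0
    for i, ch in enumerate(sequence):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for label, length in out[state]:
            yield i - length + 1, label


# Hypothetical kmers and labels, for illustration only.
automaton = build_automaton({"ACGT": "tile_1", "CGTA": "tile_2", "GGG": "tile_3"})
hits = list(find_kmers("TTACGTAGG", automaton))
print(hits)  # [(2, 'tile_1'), (3, 'tile_2')]
```

The automaton is built once and reused for every sequence, so the cost per read grows with the read length and the number of hits rather than with the number of kmers in the scheme.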
In this PR, I've also added support for gzipped FASTA or FASTQ files.
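
As a rough sketch of what reading gzipped input can look like (assuming standard-library tools only; the helper names `open_maybe_gzipped` and `read_sequences` are hypothetical and are not the functions added in this PR), the file's first two bytes can be checked for the gzip magic number, and the format can be guessed from the first character of the first record:

```python
import gzip


def open_maybe_gzipped(path):
    """Open a plain or gzip-compressed file in text mode."""
    with open(path, "rb") as handle:
        is_gzipped = handle.read(2) == b"\x1f\x8b"  # gzip magic number
    return gzip.open(path, "rt") if is_gzipped else open(path, "rt")


def read_sequences(path):
    """Yield (header, sequence) from a FASTA file or a 4-line-record FASTQ file."""
    with open_maybe_gzipped(path) as handle:
        lines = (line.rstrip() for line in handle)
        first = next(lines, None)
        if first is None:
            return  # empty file
        if first.startswith("@"):
            # FASTQ: header, sequence, '+' separator, quality string
            header = first
            while header:
                seq = next(lines, "")
                next(lines, None)  # skip '+' line
                next(lines, None)  # skip quality line
                yield header[1:], seq
                header = next(lines, None)
        else:
            # FASTA: a sequence may span several lines
            header, chunks = first[1:], []
            for line in lines:
                if line.startswith(">"):
                    yield header, "".join(chunks)
                    header, chunks = line[1:], []
                else:
                    chunks.append(line)
            yield header, "".join(chunks)
```

Checking the magic number rather than the `.gz` extension means a misnamed file is still read correctly. The sketch assumes FASTQ records are exactly four lines each, which holds for typical sequencer output but not for every valid FASTQ file.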
I've added tests for the new AC run mode and cleaned up the existing tests.
By default, AC is used unless the user passes the `--slow` command-line argument.
The only downside to the AC method is that, because it doesn't compute counts for all kmers, the min kmer coverage threshold cannot be calculated automatically using the method currently in `bio_hansel`.