GenBank file parsing is a major bottleneck for `domain_search.py` on large databases. The current GenBank parser is a fork of the BioPython GenBank parser, which is pure Python, uses some regexes, and is slow. It would be great to integrate something like the Rust parser: https://github.com/althonos/gb-io.py
A complication is that Domainator internals rely heavily on BioPython SeqRecord objects, and it might be hard to make a faster GenBank parser interface with them or to replicate them.