Open ritster opened 3 days ago
Not sure if it helps, but a common approach is to use an "indexed FASTA" to pull a single genomic region out of a larger FASTA file: https://pypi.org/project/pyfaidx/ ("faidx" comes from "samtools faidx", short for FASTA index). This could be an alternative to accepting multiple FASTA files.
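To make the indexed-FASTA idea concrete, here is a rough stdlib-only sketch of what a faidx-style index does. It is not pyfaidx's or samtools' actual implementation, and `build_index` and `fetch` are hypothetical names. It assumes every line within a record has the same width except the last, which is the same assumption the `.fai` format relies on.

```python
def build_index(path):
    """Scan the file once and record, per sequence ID,
    [length, byte offset of first base, bases per line, bytes per line]."""
    index = {}
    with open(path, "rb") as fh:
        name = None
        while True:
            line = fh.readline()
            if not line:
                break
            if line.startswith(b">"):
                # Assumption: the ID is the first whitespace-delimited token.
                name = line[1:].split()[0].decode()
                index[name] = [0, fh.tell(), 0, 0]
            elif name is not None:
                entry = index[name]
                bases = len(line.rstrip(b"\r\n"))
                if entry[2] == 0:
                    # The first sequence line defines the line geometry.
                    entry[2], entry[3] = bases, len(line)
                entry[0] += bases
    return index


def fetch(path, index, name, start, end):
    """Return bases [start, end) (0-based, end-exclusive) without
    reading the rest of the file into memory."""
    length, offset, bases_per_line, bytes_per_line = index[name]
    end = min(end, length)
    if start < 0 or start >= end:
        return ""

    def byte_pos(i):
        # Whole lines skipped, plus the position within the current line.
        return offset + (i // bases_per_line) * bytes_per_line + i % bases_per_line

    first, last = byte_pos(start), byte_pos(end - 1)
    with open(path, "rb") as fh:
        fh.seek(first)
        chunk = fh.read(last - first + 1)
    return chunk.replace(b"\n", b"").replace(b"\r", b"").decode()
```

Only the small index stays in memory; each region is read with a single seek. In practice pyfaidx or `samtools faidx` would handle this, including persisting the index to a `.fai` file.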
Allow the user to specify multiple FASTA files, each containing a piece of the reference genome. Memory is an important consideration in making this change: the current implementation builds an in-memory dictionary mapping each FASTA entry's ID to its sequence, which will be impractical on small machines running this code on large genomes.
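One possible direction for the memory concern, sketched under assumptions: instead of building the full ID-to-sequence dictionary, stream records lazily across all the files. `iter_fasta_records` is a hypothetical helper, not something that exists in this codebase, and it assumes downstream code can process one record at a time.

```python
from typing import Iterable, Iterator, Tuple


def iter_fasta_records(paths: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (sequence_id, sequence) pairs one at a time from several
    FASTA files, so only the current record is held in memory."""
    for path in paths:
        seq_id = None
        chunks = []
        with open(path) as fh:
            for line in fh:
                line = line.strip()
                if not line:
                    continue
                if line.startswith(">"):
                    # Emit the previous record before starting a new one.
                    if seq_id is not None:
                        yield seq_id, "".join(chunks)
                    fields = line[1:].split()
                    seq_id = fields[0] if fields else ""
                    chunks = []
                elif seq_id is not None:
                    chunks.append(line)
        # Emit the last record of this file.
        if seq_id is not None:
            yield seq_id, "".join(chunks)
```

Usage would look something like `for seq_id, seq in iter_fasta_records(["part1.fa", "part2.fa"]): ...` (file names are placeholders). Peak memory is then bounded by the largest single record rather than the whole genome. A whole chromosome can still be large, so this would not remove the need for an index if random access to regions is required.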