rotary-genomics / spokewrench

Toolkit for manipulating circular DNA sequence elements
BSD 3-Clause "New" or "Revised" License

Replace Biopython with ScikitBio? #4

Open LeeBergstrand opened 4 months ago

LeeBergstrand commented 4 months ago

When I was at Waterloo, I moved my code from Biopython to scikit-bio. I found it much more performant because it is built on C-backed NumPy arrays under the hood (the same approach pandas takes) instead of plain Python objects like Biopython. In my experience, it runs faster and uses much less memory. I used it in micromeda and pygenprop to extract sequences from FASTA files and write new FASTA files.

Other people also report that the code is much more stable: https://www.reddit.com/r/bioinformatics/comments/75xugl/scikitbio_why_does_it_exist/

@jmtsuji Do you have any interest in moving your code to scikit-bio? I think it should be quite easy to port over.
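For reference, here is a minimal sketch of what the FASTA read/write part of such a port might look like. It is not taken from spokewrench or pygenprop: the function name `filter_fasta` and the `min_length` parameter are hypothetical, chosen only for illustration. It assumes scikit-bio is installed and uses `skbio.io.read` and `skbio.io.write`, which cover roughly the ground that `SeqIO.parse` and `SeqIO.write` cover in Biopython.

```python
"""Hypothetical sketch: copy FASTA records above a length cutoff using scikit-bio."""

try:
    import skbio.io
    from skbio import DNA
except ImportError:  # scikit-bio is treated as optional in this sketch
    skbio = None


def filter_fasta(in_path, out_path, min_length=0):
    """Write records with at least `min_length` bases from in_path to out_path.

    Returns the IDs of the records that were written.
    (Function name and parameters are illustrative, not an existing API.)
    """
    if skbio is None:
        raise RuntimeError("scikit-bio is required: pip install scikit-bio")

    kept_ids = []

    def kept_records():
        # skbio.io.read lazily yields one DNA object per FASTA record.
        for seq in skbio.io.read(in_path, format="fasta", constructor=DNA):
            if len(seq) >= min_length:
                # The FASTA header ID is stored in the sequence metadata.
                kept_ids.append(seq.metadata["id"])
                yield seq

    # The FASTA writer accepts a generator of sequence objects,
    # so records are streamed rather than held in memory all at once.
    skbio.io.write(kept_records(), format="fasta", into=out_path)
    return kept_ids
```

A call such as `filter_fasta("contigs.fasta", "long_contigs.fasta", min_length=1000)` (file names made up) would stream the input once and write only the longer records. Whether this is actually faster than the current Biopython code would need to be measured on real data.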

LeeBergstrand commented 4 months ago

Some of my code that uses it should be here: https://github.com/Micromeda/pygenprop/tree/master

LeeBergstrand commented 4 months ago

@jmtsuji Thoughts?

jmtsuji commented 4 months ago

@LeeBergstrand Thanks for this suggestion! I've used scikit-bio before for multivariate stats (e.g., PCoA), but I didn't realize that it had sequence manipulation built in as well. I'll take a look to see whether it could replace Biopython in this repo.