seanrjohnson opened 3 months ago
deduplicate_genbank.py is really slow on nucleotide sequences because it relies on cd-hit for clustering. There are probably faster ways to cluster nucleotide sequences that we should look into integrating.