molbiodiv / bcdatabaser

A pipeline to create reference databases for arbitrary markers and taxonomic groups from NCBI data
https://bcdatabaser.molecular.eco
MIT License

max seq per taxon #19

Closed by chiras 5 years ago

chiras commented 5 years ago

Add an option to set the maximum number of sequences kept for an individual taxon. The default should be 3.

iimog commented 5 years ago

I'll take care of this. Since eutils also returns the Slen field, we can use it to sort the accessions of each taxid by length and keep the $max_num_seqs_by_taxon longest.
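As a rough illustration of that selection step (not the pipeline's actual code), here is a minimal Python sketch. It assumes the eutils summaries have already been parsed into `(accession, taxid, slen)` tuples. The function name `longest_per_taxon`, the tuple layout, and the tie-breaking by accession are all assumptions made for this example.

```python
from collections import defaultdict


def longest_per_taxon(records, max_per_taxon=3):
    """Keep the longest sequences for each taxid.

    records: iterable of (accession, taxid, slen) tuples (assumed layout).
    Returns a dict mapping taxid to a list of accessions, longest first.
    """
    # Group (length, accession) pairs by taxid.
    by_taxid = defaultdict(list)
    for accession, taxid, slen in records:
        by_taxid[taxid].append((slen, accession))

    selected = {}
    for taxid, entries in by_taxid.items():
        # Sort by length descending; ties are broken by accession
        # (ascending) so the result is deterministic.
        ranked = sorted(entries, key=lambda e: (-e[0], e[1]))
        selected[taxid] = [acc for _, acc in ranked[:max_per_taxon]]
    return selected
```

Calling `longest_per_taxon(records, max_per_taxon=3)` would mirror the proposed default. Taxa with fewer than three accessions simply keep everything they have.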

iimog commented 5 years ago

Implemented in master as --sequences-per-taxon, with a default value of 3.