karel-brinda / MiniPhy

Phylogenetic compression of extremely large genome collections [661k ↘16GiB | BIGSIdata ↘48GiB | AllTheBact'23 ↘75GiB]
https://brinda.eu/mof

Add script for batching #75

Closed karel-brinda closed 8 months ago

karel-brinda commented 11 months ago

closes #59

$ ./create_batches.py ~/github/my/mof-experiments/experiments/60_661k_main_table/661k_main_table.tsv.xz -s hit1_species -f asm_path
Loaded 661404 genomes across 2600 species clusters
Put 21912 genomes of 2458 species into the dustbin
Created 305 batches of 143 pseudoclusters
Finished

I.e., nearly identical to the batching we did previously (there are minor differences due to different name-cleaning strategies).
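For readers unfamiliar with the batching step, here is a minimal sketch of the idea suggested by the log above: genomes are grouped by species, species that are too small are merged into a shared "dustbin" pseudocluster, and each pseudocluster is then split into batches of bounded size. The numbers in the log are consistent with this reading (2600 − 2458 = 142 species kept, plus one dustbin, gives 143 pseudoclusters). This is not the code of `create_batches.py`: the function name, the parameters `min_cluster_size` and `max_batch_size`, their values, and the batch naming scheme are all assumptions made for illustration.

```python
from collections import defaultdict


def create_batches(records, min_cluster_size=3, max_batch_size=4, dustbin="dustbin"):
    """Group (species, path) records into pseudoclusters and split them into batches.

    All thresholds and names are hypothetical; they are not taken from MiniPhy.
    """
    # 1) Group genome paths by species cluster.
    clusters = defaultdict(list)
    for species, path in records:
        clusters[species].append(path)

    # 2) Species with too few genomes are merged into the dustbin pseudocluster.
    pseudoclusters = defaultdict(list)
    for species, paths in clusters.items():
        name = species if len(paths) >= min_cluster_size else dustbin
        pseudoclusters[name].extend(paths)

    # 3) Split every pseudocluster into batches of at most max_batch_size genomes.
    batches = {}
    for name, paths in sorted(pseudoclusters.items()):
        for start in range(0, len(paths), max_batch_size):
            batch_id = f"{name}__{start // max_batch_size + 1:02d}"
            batches[batch_id] = paths[start:start + max_batch_size]

    return dict(pseudoclusters), batches


# Toy input: one species with five genomes and two singleton species.
records = [("escherichia_coli", f"ecoli_{i}.fa") for i in range(5)]
records += [("rare_species_a", "a.fa"), ("rare_species_b", "b.fa")]

pseudoclusters, batches = create_batches(records)
```

With this toy input, the two singleton species end up together in the dustbin, and the five-genome species is split into two batches because it exceeds the assumed batch size of four.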