apcamargo opened 3 years ago
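I just released version 0.6.0 of taxopy. The only change is that taxids are now encoded as integers instead of strings, which makes the code faster and lowers its memory use. In the benchmarks below, every run is faster (elapsed time drops by roughly 20 to 40%) and peak RSS is about 20% lower. The MD5 checksums of the `.taxopy.lineage` output files are identical before and after, so the results themselves are unchanged.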
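To illustrate why integer taxids can help, here is a minimal Python sketch. It is not taxopy's code: the toy taxonomy, `parse_nodes`, and `lineage` are hypothetical, and the input only mimics the layout of NCBI's `nodes.dmp` (fields separated by `\t|\t`). It builds the same child-to-parent map once with `int` keys and once with `str` keys, walks a lineage to the root, and compares per-object sizes with `sys.getsizeof`.

```python
import sys

# Toy taxonomy in a nodes.dmp-like layout (made-up IDs, illustration only):
# child taxid, parent taxid, rank; fields separated by "\t|\t", lines end with "\t|".
TOY_NODES = (
    "1\t|\t1\t|\tno rank\t|\n"
    "10\t|\t1\t|\tdomain\t|\n"
    "200\t|\t10\t|\tphylum\t|\n"
    "3000\t|\t200\t|\tgenus\t|\n"
    "40000\t|\t3000\t|\tspecies\t|\n"
)


def parse_nodes(text, key=int):
    """Return a child -> parent mapping, converting each taxid with `key`."""
    parents = {}
    for line in text.splitlines():
        # Drop the trailing "\t|" and split on the field separator.
        fields = line.rstrip("\t|").split("\t|\t")
        parents[key(fields[0])] = key(fields[1])
    return parents


def lineage(taxid, parents):
    """Walk from `taxid` up to the root (the node that is its own parent)."""
    path = [taxid]
    while parents[taxid] != taxid:
        taxid = parents[taxid]
        path.append(taxid)
    return path


int_parents = parse_nodes(TOY_NODES, key=int)
str_parents = parse_nodes(TOY_NODES, key=str)

print(lineage(40000, int_parents))
print(lineage("40000", str_parents))

# Per-object size in CPython; exact byte counts vary by version and platform.
print(sys.getsizeof(40000), sys.getsizeof("40000"))
```

In CPython an `int` taxid is a smaller object than the same taxid stored as a `str`, which is one plausible source of the memory savings; how much it matters in practice depends on how taxopy lays out its own data structures.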
Before:
```
== Taxopy data: taxids.n100000.txt
elapsed time: 8.591 peak rss: 1090184
b15e76dfe8cd3d7455bcf633909e3e97 taxids.n100000.txt.taxopy.lineage
== Taxopy data: taxids.n10000.txt
elapsed time: 5.119 peak rss: 1090260
8debf4d37a7997c8ffdc13fd05e5d042 taxids.n10000.txt.taxopy.lineage
== Taxopy data: taxids.n1000.txt
elapsed time: 5.474 peak rss: 1090236
4f47c764880ca614f9ac67c442f06144 taxids.n1000.txt.taxopy.lineage
== Taxopy data: taxids.n100.txt
elapsed time: 6.360 peak rss: 1090024
4f7b7f23224e37658171a48780270d90 taxids.n100.txt.taxopy.lineage
== Taxopy data: taxids.n10.txt
elapsed time: 4.902 peak rss: 1090316
138e7cea6c35a595b6538a34c9d2b7b3 taxids.n10.txt.taxopy.lineage
== Taxopy data: taxids.n1.txt
elapsed time: 4.921 peak rss: 1090000
c1eda42e466916f0ef566c99c478907a taxids.n1.txt.taxopy.lineage
== Taxopy data: taxids.n20000.txt
elapsed time: 5.966 peak rss: 1090024
b6ec2a1d717ddcd854c762bd555b03df taxids.n20000.txt.taxopy.lineage
== Taxopy data: taxids.n2000.txt
elapsed time: 6.667 peak rss: 1090112
3cf4c5b7d13f455ed645654d829fa484 taxids.n2000.txt.taxopy.lineage
== Taxopy data: taxids.n40000.txt
elapsed time: 6.467 peak rss: 1090300
70ddd9aac0283a4c21800245b582c983 taxids.n40000.txt.taxopy.lineage
== Taxopy data: taxids.n4000.txt
elapsed time: 5.004 peak rss: 1090120
09e46bef68ac2e532644e5356e7b9928 taxids.n4000.txt.taxopy.lineage
== Taxopy data: taxids.n60000.txt
elapsed time: 7.177 peak rss: 1090052
26215e6e9a981800565b5de62eb48bda taxids.n60000.txt.taxopy.lineage
== Taxopy data: taxids.n6000.txt
elapsed time: 5.240 peak rss: 1090260
8da55d3d8e76f548b461dbb5322b1c77 taxids.n6000.txt.taxopy.lineage
== Taxopy data: taxids.n80000.txt
elapsed time: 7.685 peak rss: 1090220
30d16a8b6ebef3c5ee20bee943981b39 taxids.n80000.txt.taxopy.lineage
== Taxopy data: taxids.n8000.txt
elapsed time: 5.125 peak rss: 1090064
cfecede52e185ee41336c6c1316e1a4e taxids.n8000.txt.taxopy.lineage
```
After:
```
== Taxopy data: taxids.n100000.txt
elapsed time: 6.760 peak rss: 867460
b15e76dfe8cd3d7455bcf633909e3e97 taxids.n100000.txt.taxopy.lineage
== Taxopy data: taxids.n10000.txt
elapsed time: 3.991 peak rss: 867532
8debf4d37a7997c8ffdc13fd05e5d042 taxids.n10000.txt.taxopy.lineage
== Taxopy data: taxids.n1000.txt
elapsed time: 4.102 peak rss: 867668
4f47c764880ca614f9ac67c442f06144 taxids.n1000.txt.taxopy.lineage
== Taxopy data: taxids.n100.txt
elapsed time: 3.995 peak rss: 865352
4f7b7f23224e37658171a48780270d90 taxids.n100.txt.taxopy.lineage
== Taxopy data: taxids.n10.txt
elapsed time: 3.898 peak rss: 853752
138e7cea6c35a595b6538a34c9d2b7b3 taxids.n10.txt.taxopy.lineage
== Taxopy data: taxids.n1.txt
elapsed time: 3.787 peak rss: 862808
c1eda42e466916f0ef566c99c478907a taxids.n1.txt.taxopy.lineage
== Taxopy data: taxids.n20000.txt
elapsed time: 4.277 peak rss: 867532
b6ec2a1d717ddcd854c762bd555b03df taxids.n20000.txt.taxopy.lineage
== Taxopy data: taxids.n2000.txt
elapsed time: 3.892 peak rss: 867624
3cf4c5b7d13f455ed645654d829fa484 taxids.n2000.txt.taxopy.lineage
== Taxopy data: taxids.n40000.txt
elapsed time: 4.914 peak rss: 867564
70ddd9aac0283a4c21800245b582c983 taxids.n40000.txt.taxopy.lineage
== Taxopy data: taxids.n4000.txt
elapsed time: 3.889 peak rss: 867280
09e46bef68ac2e532644e5356e7b9928 taxids.n4000.txt.taxopy.lineage
== Taxopy data: taxids.n60000.txt
elapsed time: 5.625 peak rss: 867564
26215e6e9a981800565b5de62eb48bda taxids.n60000.txt.taxopy.lineage
== Taxopy data: taxids.n6000.txt
elapsed time: 3.785 peak rss: 867412
8da55d3d8e76f548b461dbb5322b1c77 taxids.n6000.txt.taxopy.lineage
== Taxopy data: taxids.n80000.txt
elapsed time: 6.216 peak rss: 867372
30d16a8b6ebef3c5ee20bee943981b39 taxids.n80000.txt.taxopy.lineage
== Taxopy data: taxids.n8000.txt
elapsed time: 3.883 peak rss: 867676
cfecede52e185ee41336c6c1316e1a4e taxids.n8000.txt.taxopy.lineage
```
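To compare the before and after runs record by record, a small parser can pull out the elapsed time, peak RSS, and MD5 checksum for each input. This is a hypothetical helper, not part of taxopy or the benchmark script; the regex assumes the `== Taxopy data: ... elapsed time: ... peak rss: ...` layout shown above, and the two records in the excerpt are copied from those logs. Units are left as reported.

```python
import re

# One benchmark record: input size, elapsed time, peak RSS, and the MD5 of the
# matching lineage output file. Fields may be separated by spaces or newlines.
RECORD = re.compile(
    r"taxids\.n(\d+)\.txt\s+elapsed time:\s+([\d.]+)\s+peak rss:\s+(\d+)\s+"
    r"([0-9a-f]{32})\s+taxids\.n\1\.txt\.taxopy\.lineage"
)


def parse_log(text):
    """Map input size n -> (elapsed time, peak RSS, md5), units as reported."""
    return {
        int(n): (float(t), int(rss), md5)
        for n, t, rss, md5 in RECORD.findall(text)
    }


# Two records copied from the "Before" and "After" logs.
BEFORE = """
== Taxopy data: taxids.n100000.txt
elapsed time: 8.591 peak rss: 1090184
b15e76dfe8cd3d7455bcf633909e3e97 taxids.n100000.txt.taxopy.lineage
== Taxopy data: taxids.n1.txt
elapsed time: 4.921 peak rss: 1090000
c1eda42e466916f0ef566c99c478907a taxids.n1.txt.taxopy.lineage
"""
AFTER = """
== Taxopy data: taxids.n100000.txt
elapsed time: 6.760 peak rss: 867460
b15e76dfe8cd3d7455bcf633909e3e97 taxids.n100000.txt.taxopy.lineage
== Taxopy data: taxids.n1.txt
elapsed time: 3.787 peak rss: 862808
c1eda42e466916f0ef566c99c478907a taxids.n1.txt.taxopy.lineage
"""

before, after = parse_log(BEFORE), parse_log(AFTER)
for n in sorted(before):
    t0, rss0, md5_0 = before[n]
    t1, rss1, md5_1 = after[n]
    print(
        f"n={n}: time {t0:.3f} -> {t1:.3f}, "
        f"peak rss {100 * (rss0 - rss1) / rss0:.1f}% lower, "
        f"same output: {md5_0 == md5_1}"
    )
```

Pasting the full logs into `BEFORE` and `AFTER` should work the same way, since the pattern also tolerates the single-line layout.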