Hi there!
I'm struggling to get mashtree to finish on a dataset of 29611 genomes (371 Archaea, 19238 Bacteria, 10002 viruses) totalling just under 80 Gb of sequence data. I'm running mashtree with 8 threads on an HPC, with 600 GB of RAM allocated to the job. The sketching and distance database steps complete in a reasonable amount of time (between 24 and 36 hours), but the run never gets past the following stage:
mashtree: mashDistance: Converting to phylip format into /tmp/MASHTREE.9kWFcb/distances.phylip
The job then sits at this step until it hits the 96-hour time limit. Do you have any advice on getting mashtree to work on this dataset? Is this expected behaviour, or do I need to allocate more resources to the job?
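For a sense of scale, here is a rough back-of-the-envelope sketch of how many distances an all-vs-all comparison of this dataset involves. This is my own arithmetic, not something mashtree reports, and the function name is made up for illustration. I have not checked whether the phylip file is written as a full square matrix or a triangular one, so both counts are shown.

```python
from typing import Tuple


def pairwise_counts(n_genomes: int) -> Tuple[int, int]:
    """Return (cells in a full square matrix, unique unordered pairs)."""
    full = n_genomes * n_genomes
    unique = n_genomes * (n_genomes - 1) // 2
    return full, unique


# Genome counts from this dataset: Archaea + Bacteria + viruses
n = 371 + 19238 + 10002
full, unique = pairwise_counts(n)

# Assumption: one distance value per pair; storage format is not accounted for
print(f"{n} genomes -> {full:,} matrix cells, {unique:,} unique pairs")
```

Either way that is several hundred million values to write out, so the conversion step may simply be slow rather than stuck.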
I'm running mashtree v1.2.0, installed on the Linux HPC by cloning the GitHub repo. Any help would be greatly appreciated!