Open DorothyTamYiLing opened 1 year ago
Hi Dorothy,
The runtime of snp-dists scales quadratically with the number of input sequences. Say c is the time for a single pairwise comparison. snp-dists makes O(n^2) comparisons, so for your dataset the time is roughly 3745^2 * c. Even if c is only 10 ms, that is still about 39 hours! To get a good estimate for c, I recommend you run the analysis on just 37 samples. Multiply the resulting time by 10'000 (since (3745/37)^2 is roughly 10'000) and you get the approximate runtime for the whole dataset.
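The extrapolation above can be sketched in a few lines of Python. This is only an illustration of the quadratic scaling argument: the function name `estimate_full_runtime` and the 12-second subset timing are made up for the example, and the sketch assumes the subset alignment has the same length as the full one, so that c stays the same.

```python
def estimate_full_runtime(subset_seconds, subset_n, full_n):
    """Extrapolate a runtime, assuming it grows with the square of the sample count."""
    scale = (full_n / subset_n) ** 2
    return subset_seconds * scale


# Hypothetical measurement: pretend the 37-sample run took 12 seconds.
subset_seconds = 12.0
estimate = estimate_full_runtime(subset_seconds, subset_n=37, full_n=3745)

print(f"scale factor: {(3745 / 37) ** 2:.0f}")
print(f"estimated full runtime: {estimate / 3600:.1f} hours")
```

Replace the 12 seconds with the wall-clock time you actually measure, for example with `time` on the command line.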
If the resulting estimate is way too large, you can compute approximate solutions using mash or phylonium.
Hope this helps, Fabian
Hi Fabian,
Thanks for the useful tips! I will give the calculation a go and maybe try to reduce the sample set too.
Thanks, Dorothy
Hi Teesmann,
First of all, thanks for writing this piece of software.
I am trying to run SNP-dists on a large sample set (an alignment of 3745 samples, each 4988504 bp long). It has been running for more than 24 hours, and I wonder if that is normal. How long do you think it will take to finish for an input of this size? I have stopped the run for now, as I would like to get a rough estimate of the runtime first.
Thanks, Dorothy