Open noranekonobokkusu opened 1 year ago
Thanks for getting in touch. The present limit is 2048 leaves (which I'll document); I'm looking into a workaround but it's less straightforward than I'd hoped. I'll post an update once I get somewhere with this.
To calculate distances between trees with <8192 tips, you can now:
remove.packages("TreeDist")
remove.packages("TreeTools")
- Check the console output to be sure that the packages are fully uninstalled.
devtools::install_github("ms609/TreeTools", ref = "more-leaves")
devtools::install_github("ms609/TreeDist", ref = "more-leaves")
- Do not use install.packages("TreeDist"), which installs pre-compiled binaries that will not link to the customized TreeTools.
Note that distance computation scales with the square of the number of tips, so comparing two 8000-leaf trees will take a couple of minutes.
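As a rough guide to what quadratic scaling implies, the sketch below extrapolates a run time from one measured reference point. The function name scaled_seconds is illustrative, and any reference timing you feed it is your own measurement rather than a figure from the package.

```cpp
// Estimate the run time at n tips from a time measured at n_ref tips,
// assuming cost grows with the square of the number of tips.
// (Illustrative helper; not part of TreeDist or TreeTools.)
constexpr double scaled_seconds(double ref_seconds, double n_ref, double n) {
  const double ratio = n / n_ref;
  return ref_seconds * ratio * ratio;
}
```

For instance, if a comparison at 8000 tips were measured at 120 seconds, scaled_seconds(120.0, 8000, 4000) gives 30 seconds: halving the number of tips quarters the estimated time.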
I've updated the documentation with this information. Please let me know how you get on; I had a bit of trouble getting this running locally, but hopefully the above instructions will avoid these problems.
Hi Martin,
thanks a lot for such a rapid reply! It aborts my RStudio session the moment I run this command now 😅 But I guess that means I did re-install it successfully and this will work on a computational cluster!
Drat – this is the issue I was running into as well.
My diagnosis was that the crash occurred when the modified TreeTools was reinstalled without uninstalling and re-installing TreeDist. Could you confirm that you uninstalled both packages before installing both from source, using install_github()?
I'll also be interested to hear whether it runs successfully on a cluster!
I can confirm I did all that. When I try running it from command line, I am getting
> TreeDistance(t_large, t_large)
Error: segfault from C stack overflow
Even for two trees with 10 leaves each!
On a cluster, it works with 8GB (which is less than on my laptop) for 8,000 leaves 🤔
Weird – sorry it's not proving straightforward! I've reproduced this issue on a second PC. My suspicion is that this is related to the (un)installation of the packages. I'll investigate.
Okay, I think I've got to the bottom of the issue – which is that the stack overflow error should be taken literally; there is not enough space in the stack to create two SplitList() objects of the required dimensions, used to compute the distances.
In summary, this means that a significant re-coding will be required for larger trees to be handled – and that the computation for larger trees will be significantly slower (as it will need to make more use of the heap, rather than fast stack memory). That's a bigger job than I am able to attempt right now. Sorry.
More details for my own future reference:
- Initially attempted using #define SL_MAX_BINS 128 in TreeTools::SplitList.h
- This conclusion was reached by editing cpp_mutual_clustering() to {return <empty list>; and adding declarations one at a time:
  - const SplitList a(x); : succeeds
  - const SplitList b(y); : stack overflows
- Test performed by running cpp_mutual_clustering(as.Splits(BalancedTree(8)), as.Splits(PectinateTree(8)), 8)
- "more-leaves" branches of TreeTools and TreeDist rename packages to BigTreeTools / BigTreeDist
- Some int16s replaced by int32s to allow multiplication in array lookup
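The thread does not spell out the int16 to int32 change. One plausible reading is sketched below, assuming a row-major [split][bin] array addressed through a single flat index. The function names and the layout are illustrative assumptions, not the TreeTools code.

```cpp
#include <cstdint>

// Flat index into a row-major [split][bin] array, computed in 32 bits.
// (Hypothetical helper for illustration only.)
constexpr std::int32_t flat_index32(std::int16_t split, std::int16_t n_bins,
                                    std::int16_t bin) {
  return static_cast<std::int32_t>(split) * n_bins + bin;
}

// The same index forced back into 16 bits, as happens when the result is
// stored in an int16 variable; values above 32767 no longer fit.
inline std::int16_t flat_index16(std::int16_t split, std::int16_t n_bins,
                                 std::int16_t bin) {
  return static_cast<std::int16_t>(flat_index32(split, n_bins, bin));
}
```

With 128 bins per split, split 300 and bin 5 map to index 38405, which flat_index32 returns correctly but which cannot be represented in 16 bits. With the smaller pre-change dimensions, such products would stay within range far more often.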
What I still don't understand is why it stopped working locally even for two tiny trees with 10 leaves each but actually worked on a cluster for a huge tree.
Thanks a lot for looking into this anyhow!
A fixed amount of memory is allocated as soon as the underlying C++ function is called; because this is allocated on the stack, the amount of memory to allocate is pre-determined and is independent of the variables actually passed. So whatever size of tree is passed, the software requests enough stack memory to compare two 8192-leaf trees.
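To put rough numbers on this, here is a sketch that assumes each SplitList holds a fixed two-dimensional array of 64-bit words, with 128 words per split (the SL_MAX_BINS value mentioned above) and one row per possible split, taken here as 8192 - 3. The names kMaxTips, kMaxSplits, FixedSplitList and HeapSplitList are illustrative and are not the TreeTools definitions.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative constants; only the value 128 comes from the thread.
constexpr std::size_t kMaxTips = 8192;
constexpr std::size_t kMaxBins = kMaxTips / 64;   // 128 words of 64 bits per split
constexpr std::size_t kMaxSplits = kMaxTips - 3;  // assumed upper bound on splits

// Fixed-size layout: sizeof is set at compile time, so a local variable of
// this type reserves the full amount on the stack whatever tree is passed.
struct FixedSplitList {
  std::uint64_t state[kMaxSplits][kMaxBins];
};

// Heap-backed alternative: storage is sized from the actual input.
struct HeapSplitList {
  std::size_t n_splits;
  std::size_t n_bins;
  std::vector<std::uint64_t> state;
  HeapSplitList(std::size_t splits, std::size_t tips)
      : n_splits(splits), n_bins((tips + 63) / 64), state(n_splits * n_bins, 0) {}
};

// Bytes of stack needed to declare `count` FixedSplitList locals.
constexpr std::size_t fixed_stack_bytes(std::size_t count) {
  return count * sizeof(FixedSplitList);
}
```

Under these assumptions, fixed_stack_bytes(2) is 16,771,072 bytes (about 16 MiB), which exceeds the stack limits commonly configured by default (often a few MiB), and it is the same for a 10-leaf tree as for an 8192-leaf one. By contrast, HeapSplitList(7, 10), sized for the 7 non-trivial splits of a 10-leaf tree, allocates only 7 words.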
Differences between a local PC and a cluster will reflect how much memory is available on the stack, which will reflect aspects of memory management that are context-dependent: for instance, I see a crash when using RStudio, but not when running a standalone R session, presumably because Windows allocates memory differently in these contexts.
Hi Martin,
I've been trying to compare two trees with around 5k leaves (both have the same number of leaves) but I couldn't get past the error: This many leaves cannot be supported. Please contact the TreeTools maintainer if you need to use more!
I first tried following the above process (uninstalling previous versions) as you suggest but it still gives me the error on my local computer. Then I built a fresh R conda env on a remote server with more resources but I still get the same error.
Any idea what could cause this issue?
Also, thanks a lot for your tools, they have been very useful so far! Paul.
Glad you have been finding the tools useful, @pterzian. It's not clear why you would be seeing the "This many leaves" error with ~5000 leaves if you are using the BigTreeTools and BigTreeDist packages; it may be worth checking that you are using the functions from these modified packages (which have different names, so need loading with e.g. library("BigTreeDist")) rather than TreeDist.
You are absolutely right! I saw the BigTreeTools package was installed. However, I don't see any BigTreeDist package; should it be installed along with the devtools::install_github("ms609/TreeDist") command?
Checking conda logs:
This command successfully installed BigTreeTools: devtools::install_github("ms609/TreeTools", ref = "more-leaves")
building ‘BigTreeTools_1.10.0.tar.gz’
However, this command did not install BigTreeDist: devtools::install_github("ms609/TreeDist")
building ‘TreeDist_2.7.0.tar.gz’
Looks like the ref = "more-leaves" argument is missing from your second command. (Note updated above.)
Hi, I am trying to measure distances between two trees, and getting this error message:
> TreeDistance(t1, t2)
Error: This many leaves cannot be supported. Please contact the TreeTools maintainer if you need to use more!
I tried decreasing the number of tips to 4096 (as mentioned in some part of the TreeDist manual), but I still get this error. Is there a workaround for this, and how many tips are allowed by default? Somehow I cannot find it in the documentation. Thank you!