lskatz / mashtree

:deciduous_tree: Create a tree using Mash distances
GNU General Public License v3.0

Running mashtree locally with local dependencies #43

Closed mihkelvaher closed 5 years ago

mihkelvaher commented 5 years ago

Due to server restrictions I can't install anything system-wide, so to use mashtree I ended up going through the mashtree code and pointing it at a local dir where mash, quicktree and the perl libs are located (I also set Bio::Tree::DistanceFactory->new(-method=>"UPGMA")).

We're currently planning to use mashtree as part of our software, but shipping a fixed, modified copy of mashtree doesn't look nice. Our users will face the same restriction, so we can't expect them to install anything globally.

Should I be looking for other solutions, or is this an option that could be added?

lskatz commented 5 years ago

Hey that's great that it will be integrated in your software! Please let me know if/when it comes out in the public so that I can keep track of Mashtree's impact!

The installation should be totally local and so I am not sure which part is global. Could I ask which installation method you used? Is it simply a matter of updating your PATH for Mash and QuickTree? And updating your perl library path?
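For a clone-and-run setup, updating those two paths might look roughly like the sketch below. The `deps/` directory and its `bin` and `lib/perl5` subdirectories are an assumed layout for illustration, not something mashtree prescribes.

```shell
# Hypothetical layout: all dependencies unpacked under ./deps next to your script.
#   deps/bin        -> mash and quicktree executables
#   deps/lib/perl5  -> Perl modules (e.g. BioPerl)
DEPS_DIR="${DEPS_DIR:-$PWD/deps}"

# Prepend so the local copies win over anything installed system-wide.
export PATH="$DEPS_DIR/bin:$PATH"
export PERL5LIB="$DEPS_DIR/lib/perl5${PERL5LIB:+:$PERL5LIB}"

# Report which executables will actually be picked up.
for tool in mash quicktree; do
  command -v "$tool" || echo "$tool not found on PATH" >&2
done
```

Because the variables are only exported in the current shell, nothing is installed or changed globally.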

I think that if you changed things from NJ to UPGMA, then you will have deviated from the actual algorithm.

mihkelvaher commented 5 years ago

I will let you know when the paper comes out!

I installed mashtree 0.37 a while back and did the modifications. I got it by cloning the repo since cpanm wasn't (and isn't) an option. There might be a problem with multithreading, but I'll open an issue for it after I've tried the latest version.

Since I've been to dependency hell and back too many times, I was hoping to keep it as simple as possible for the end user: ideally just download (clone) and run. I'm currently considering a config file where the dependencies and their paths are listed, and my perl script adds them to ENV before calling mashtree. The goal was to have a single local dependencies dir beside my perl script.
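A minimal sketch of that config-file idea, written in shell rather than Perl for brevity. The file name `deps.conf`, its one-directory-per-line format, and the function name `add_deps_to_path` are all hypothetical choices made for this example.

```shell
# Prepend every directory listed in a config file to PATH.
# Assumed (hypothetical) format: one absolute directory per line;
# blank lines and lines starting with '#' are ignored.
add_deps_to_path() {
  conf_file="$1"
  while IFS= read -r dir || [ -n "$dir" ]; do
    case "$dir" in
      ''|'#'*) continue ;;
    esac
    if [ -d "$dir" ]; then
      # Later entries end up earlier in PATH because each one is prepended.
      PATH="$dir:$PATH"
    else
      echo "warning: $dir listed in $conf_file does not exist" >&2
    fi
  done < "$conf_file"
  export PATH
}
```

Calling `add_deps_to_path ./deps.conf` before invoking mashtree would make the listed tools visible for that session only. Perl library directories could be handled the same way with `PERL5LIB`.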

Then again, with 4(+) dependencies with mashtree and a couple of other programs needed, the safest (and the nicest) way seems to be a docker container. So I'm at a bit of a loss right now.

Is there a problem with changing NJ to UPGMA? The goal is to have a (huge binary) rooted tree. The approach seems to work with 0.37. Also see #28. It just occurred to me that having several species on the same tree (near-root node accuracy is not important) results in very distinct clusters, so in theory finding the root should be easy somehow (midpoint rooting?).

lskatz commented 5 years ago

I think the only real issue with changing NJ to UPGMA is that you will be changing how the actual tree is made, and so it would fork away from the Mashtree project. Another issue is that you would hand the decision of where to root over to the algorithm instead of making an informed choice.

If you don't have cpanm, you can still install mashtree using the alternate method in the documentation, starting with perl -MCPAN -e shell. I know that docker is new and shiny, but cpan is a package manager :)

tseemann commented 5 years ago

@mihkelvaher CPAN is still an option. Perl local::lib is designed for fully local installations of Perl modules in your home directory: https://metacpan.org/pod/local::lib
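As a rough sketch, activating local::lib for the current shell could look like the following. It assumes local::lib has already been bootstrapped under `~/perl5` (its default root); `LOCAL_LIB_ROOT` is a variable introduced only for this example, and the module name `Mashtree` in the commented install line is an assumption about how the distribution is named on CPAN.

```shell
# Assumption: local::lib lives (or will live) under ~/perl5, its default root.
LOCAL_LIB_ROOT="${LOCAL_LIB_ROOT:-$HOME/perl5}"

if perl -I"$LOCAL_LIB_ROOT/lib/perl5" -e 'require local::lib' 2>/dev/null; then
  # With no script given, local::lib prints export statements for PERL5LIB,
  # PATH, PERL_MM_OPT, etc.; eval applies them to the current shell session.
  eval "$(perl -I"$LOCAL_LIB_ROOT/lib/perl5" -Mlocal::lib="$LOCAL_LIB_ROOT")"
  echo "Perl modules will now install under $PERL_LOCAL_LIB_ROOT"
  # From here, a CPAN install goes to the home directory, for example:
  #   perl -MCPAN -e 'CPAN::install("Mashtree")'
else
  echo "local::lib not found; bootstrap it first (see its documentation)" >&2
fi
```

Adding the `eval` line to a shell startup file would make the setting persistent, still without touching anything outside the home directory.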

Packaging systems like brew and bioconda also work fully within your home directory as a regular non-privileged user. mashtree is available in both of those systems: https://bioconda.github.io/user/install.html

lskatz commented 5 years ago

Haven't heard back. Feel free to reopen if this issue is ongoing and if there is more information.