iqbal-lab-org / pandora

Pan-genome inference and genotyping with long noisy or short accurate reads
MIT License
109 stars 14 forks source link

Make GATB a git submodule instead of downloading it during compilation #264

Closed leoisl closed 3 years ago

leoisl commented 3 years ago

With this, we don't need internet connection to compile pandora (make), and GATB is downloaded when pandora is cloned, not on every build. We still need internet connection to prepare the build (cmake), as Hunter package manager downloads and installs Boost, ZLib, and Gtest, but at least we won't need for the compilation itself (make)

mbhall88 commented 3 years ago

I had looked at doing this previously but struggled to find a good way of freezing it to a particular commit (as we can't use the latest version). I see this is pointing to your fork - did you fork that at the commit/version we need? Does the container build still work with this change?

leoisl commented 3 years ago

We can set a commit/version in a git submodule by going to the submodule dir and doing a git checkout and then pushing your changes: https://stackoverflow.com/a/10916398 . GATB is pointing to commit 75cd492. This commit points to my fork, which is just the previous version we were using (1.4.1) with 3 additional commits that enables GATB to use a custom installed ZLIB (see https://github.com/leoisl/gatb-core/commits/75cd4921dc8a6f556f62baa0f70f60af73b3413e). Without these changes, GATB needs a system-wide installation of ZLIB, but now with the Hunter package manager, we don't need system-wide installations anymore, but I could not make GATB accept a custom installation for ZLIB without changing it...

For the container build, I guess we are talking about this Dockerfile? If yes, then it is working, just built it!

mbhall88 commented 3 years ago

Awesome. Thank you!

mbhall88 commented 3 years ago

Ok, I've just realised there are other issues with making GATB a submodule. Because it is a submodule, it isn't included in the release tarball. So, when trying to install it in the conda-build process (#244) I'm having to clone GATB. Anyone else trying to also build from the release tarball is going to have the same problem.

leoisl commented 3 years ago

Hello, argh, this seems like a github bug (not including submodules in release tarball). People at micropython are just building their release tgz/zip files custom-made so that all source files are actually included in the release. Should we do this? Note that cgranges is also a git submodule, so the compilation of the conda recipe won't work even if boost issues are fixed. Also note that when git cloning GATB in the recipe, we are git cloning the tip of master, but that is not the version we use (see the tagged version in the git repo). Anyway, it might be that the best solution for all these issues is just to do what the devs at micropython did while we still have this issue in github...

mbhall88 commented 3 years ago

argh, this seems like a github bug (not including submodules in release tarball). People at micropython are just building their release tgz/zip files custom-made so that all source files are actually included in the release. Should we do this?

I don't think it's a bug is it? I'm pretty sure it is intended. We can do that if you want. I guess just add it to a release job within travis.

Note that cgranges is also a git submodule, so the compilation of the conda recipe won't work even if boost issues are fixed. Also note that when git cloning GATB in the recipe, we are git cloning the tip of master, but that is not the version we use (see the tagged version in the git repo).

This is why I'm not a big fan of submodules - it is hard to make it clear what commit you require. Building projects through CMake however makes that very easy.

leoisl commented 3 years ago

I don't think it's a bug is it? I'm pretty sure it is intended. We can do that if you want. I guess just add it to a release job within travis.

I don't have a strong opinion, devs at micropython were calling it a github issue

This is why I'm not a big fan of submodules - it is hard to make it clear what commit you require. Building projects through CMake however makes that very easy.

The git submodule commit does not get explicitly set in the .gitsubmodule file, but it is explicit set when you look at the submodule through git command-line or web interface. A commit hash is actually set, I don't think we can set a branch/tag. I actually have a preference for git submodules in the sense that when we get the code with git clone, all the main and dependencies code are downloaded, and the dependencies are git repos themselves, making it easier for us to work with it. It also gets indexed by the IDE, and etc (i.e. dependency code becomes first-class code). But it seems this also brings some complications... I am fine with downloading dependencies in the cmake step, as Hunter does with Boost, Gtest and ZLIB, although this means that everytime we do a new build, we have to download all the dependencies again... I just changed GATB to submodule because to compile pandora (make step) we actually need internet connection because download of GATB happens in the make step. It is a bit annoying that we need internet to clone, to prepare build (cmake) and to compile (make), but this was a fix I think particularly for my 12-hour flight, where I intended to work half of it, and was wanting to be able to compile :p

If you think we should revert back to downloading GATB at the make step, then we can do it

mbhall88 commented 3 years ago

We can keep it as it is, but can you please add in a step to the CI for releases that the tarball includes the submodules? This will make the bioconda stuff a bit easier and also make it easier for any users downloading the release tarball and trying to build from source.

leoisl commented 3 years ago

Sorry, not sure I understand your request. Would it be adding a stage in .travis.yml to build a tarball with all the source files, including submodules source files, or would it be automatise the process of creating this tarball with e.g. github actions, so that everytime we create a tag, the correct tarball is created?

mbhall88 commented 3 years ago

I don't really mind which way we do it, it would just be good to have it automated so that whenever we tag a release, the release tarball has all the files needed to build pandora.

leoisl commented 3 years ago

Moved to https://github.com/rmcolq/pandora/issues/269