usnistgov / ChebTools

C++ tools for working with Chebyshev expansion interpolants
MIT License
28 stars 8 forks source link

Excessive weight of submodules esp Eigen #23

Closed jamadagni closed 5 years ago

jamadagni commented 5 years ago

The README says to do: git clone --recursive https://github.com/usnistgov/ChebTools but the fact that Eigen is a submodule is a big negative to this!

The above clone command kept failing on my system when trying cloning this submodule because it is a huge repo (see below) and cloning it took a long long time on my village broadband connection and failed half way through (or less) with some obscure TSL broken pipe error or something. Finally I had a friend clone it from a faster connection and send me a ZIP of the repo and included it under the externals/ directory and only then I was able to do python setup.py install.

Then I examined the contents of the cloned ChebTools repo:

$ du -hs chebtools
130M    chebtools
$ cd chebtools
$ du -hs * | sort -h
4.0K    apt.txt
4.0K    CMakeLists.txt
4.0K    environment.yml
4.0K    LICENSE
4.0K    setup.py
4.0K    speed_test.cpp
8.0K    main.cpp
8.0K    README.md
16K     tests
20K     include
20K     scripts
44K     src
104K    Doxyfile
136K    BattlesTrefethen.ipynb
428K    JOSS
110M    externals
$ cd externals/
$ du -hs * | sort -h
2.1M    pybind11
2.3M    Catch
106M    Eigen

You can see that for a total of 130M of the repo, 106M are in the underlying Eigen repo. Nobody really needs the whole of the Eigen history unless they are interested in hacking that!

First of all, it is far easier to have the user install Eigen from their distro's packaging system and the submodule is not necessary. Submodules are a convenience shortcut for managing dependencies yes but when they are so heavy it's not so convenient. Installing libeigen3-dev on my Kubuntu Bionic system entails only 810k of download compared to the 106M above!

So first of all I would recommend to not use the submodule for this purpose at least for Eigen and instead ask the user to install from their distro.

Even if you want to retain the submodule for convenience, at least you can change the README to suggest to shallowly clone the submodules using --shallow-submodules. See the documentation. Doing this makes things much better and I was able to clone on my connection without any problem and the disk usage is also much reduced: 16M vs 106M.

$ du -hs chebtools--shallow-submodules
30M     chebtools--shallow-submodules
$ cd chebtools--shallow-submodules
$ du -hs * | sort -h
4.0K    apt.txt
4.0K    CMakeLists.txt
4.0K    environment.yml
4.0K    LICENSE
4.0K    setup.py
4.0K    speed_test.cpp
8.0K    main.cpp
8.0K    README.md
16K     tests
20K     include
20K     scripts
44K     src
104K    Doxyfile
136K    BattlesTrefethen.ipynb
428K    JOSS
20M     externals
$ cd externals/
$ du -hsc * | sort -h
2.2M    pybind11
2.3M    Catch
16M     Eigen

I confirmed that python setup.py install was successful even with the shallow submodule clone.

ianhbell commented 5 years ago

Hello! To be honest, this is not a problem that I have struggled with before because I am spoiled to be mostly always connected on a fast internet connection. As such, with the exception of when I am on trains and planes, I never worry about download size or speed. But I realize that I am entitled in this regard.

I can’t use the distro-packaged version of Eigen because of windows support (though that is a nicer solution in general). For that reason I always package Eigen, but that has its own problems too because Eigen moved their mirror and broke all the submodule links at one point.

Thank you for bringing this issue with shallow submodules to my attention. In fairness, it is rather annoying to have to wait for Eigen to clone every time. And I will update the docs to indicate this. Or better yet, could I interest you in updating the docs with a pull request? I’d appreciate the help, and it would be a nice update that you are better prepared to make than me.

jamadagni commented 5 years ago

Hello. I made some rather simple changes. Please check.