Closed yarikoptic closed 3 years ago
Oh, I guess I should add this to the documentation. By and large it should just be a reference to http://handbook.datalad.org/, and maybe to http://handbook.datalad.org/en/latest/code_from_chapters/usecase_ml_code.html and http://handbook.datalad.org/en/latest/beyond_basics/101-168-dvc.html in particular.
@yarikoptic Thanks for the pull request! I took a closer look at datalad for the 0.12 release. Unfortunately, the installation is too big for a single tool to be preinstalled inside the main workspace flavor:
conda install (409MB):
apt install (390MB):
A few other options:
1. Install datalad in your own Dockerfile that extends the workspace image:
# Extend from any of the workspace versions/flavors
FROM mltooling/ml-workspace:0.12.1
RUN \
    apt-get update && \
    apt-get install -y datalad && \
    # Cleanup Layer - removes unnecessary cache files
    clean-layer.sh
2. Use an install method with a smaller footprint. However, since most space is used by git-annex, this might not be possible.
3. Add a tool installer script (e.g. see [these scripts](https://github.com/ml-tooling/ml-workspace/tree/main/resources/tools)). However, datalad is quite easy to install, so it might not need an additional installer script.
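For option 2, one thing that could be tried is skipping apt's recommended packages. This is only a sketch: `--no-install-recommends` is a standard `apt-get` flag, but whether it noticeably shrinks the layer here is an untested assumption, since git-annex is likely a hard dependency and would still be pulled in.

```dockerfile
# Extend from any of the workspace versions/flavors
FROM mltooling/ml-workspace:0.12.1
RUN \
    apt-get update && \
    # Assumption: dropping recommended packages reduces the footprint;
    # the actual saving has not been measured
    apt-get install -y --no-install-recommends datalad && \
    # Cleanup Layer - removes unnecessary cache files
    clean-layer.sh
```

Comparing the resulting layer size against the plain `apt-get install -y datalad` build would show whether the flag is worth it.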
conda install (409MB)
-- is that in addition to what you already have installed anyway (indeed, that would be too big then :-/)? The majority should be common Python packages (besides git-annex itself, which is written in Haskell, but it should not pull in too much). I can look into it later to see where the hit comes from and what could be done.
Yes, it's in addition. But there are some pinned dependencies that are forced to downgrade/upgrade... which might blow up the total required space as well. To see what's actually installed/removed/downgraded, you can just build this Dockerfile:
# Extend from any of the workspace versions/flavors
FROM mltooling/ml-workspace:0.12.1
# Run your customizations, e.g.
RUN \
    # Install datalad
    apt-get update && \
    apt-get install -y datalad && \
    # Cleanup Layer - removes unnecessary cache files
    clean-layer.sh
This PR is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 14 days
It will also install git-annex which datalad uses for management of large files.
Since it is just a "tool", I have not used --freeze-installed to benefit from possible upgrades etc.
It could also be installed from stock Ubuntu, but that would be an older version. Backports are also available from NeuroDebian, but I have decided to just go with the conda installation for now.
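For anyone who wants the conda route in their own image instead, a minimal sketch could look like the following. The `conda-forge` channel and the exact command are assumptions for illustration, not necessarily what this PR adds to the workspace build. `--freeze-installed` is deliberately left out, as described above, so already-installed packages may be upgraded or downgraded.

```dockerfile
# Extend from any of the workspace versions/flavors
FROM mltooling/ml-workspace:0.12.1
RUN \
    # Assumption: datalad is taken from the conda-forge channel,
    # which also brings in git-annex as a dependency
    conda install -y -c conda-forge datalad && \
    # Cleanup Layer - removes unnecessary cache files
    clean-layer.sh
```

Adding `--freeze-installed` to the `conda install` line would keep the existing pinned packages untouched, at the cost of possibly getting an older datalad or failing to solve.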
Closes #50