ml-tooling / ml-workspace

🛠 All-in-one web-based IDE specialized for machine learning and data science.
https://mltooling.org/ml-workspace
Apache License 2.0

ENH: Add datalad installation via conda #58

Closed yarikoptic closed 3 years ago

yarikoptic commented 3 years ago

It will also install git-annex which datalad uses for management of large files.

Since it is just a "tool", I have not used --freeze-installed to benefit from possible upgrades etc.

It could also be installed from stock ubuntu, but would be an older version. Also backports are available from NeuroDebian, but I have decided to just go with conda installation for now.
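
For illustration, the conda-based installation described above could look roughly like the following Dockerfile layer. This is a sketch, not the exact diff of this PR: the `conda-forge` channel and the base image tag (borrowed from later in this thread) are assumptions, and `clean-layer.sh` is the workspace's own cleanup helper.

```dockerfile
# Sketch only: the base image tag is an assumption taken from later in this thread
FROM mltooling/ml-workspace:0.12.1

RUN \
    # Install datalad via conda (channel is an assumption); git-annex comes along as a dependency
    # --freeze-installed is deliberately omitted so that dependencies may be upgraded
    conda install -y -c conda-forge datalad && \
    # Cleanup Layer - removes unnecessary cache files
    clean-layer.sh
```

Adding `--freeze-installed` to the `conda install` call would instead ask conda to keep already installed packages at their current versions, which trades possible upgrades for a smaller change to the environment.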

Closes #50


yarikoptic commented 3 years ago

Oh, I guess I should add this to the documentation. For the most part it should be just a reference to http://handbook.datalad.org/, and maybe to http://handbook.datalad.org/en/latest/code_from_chapters/usecase_ml_code.html and http://handbook.datalad.org/en/latest/beyond_basics/101-168-dvc.html in particular.

lukasmasuch commented 3 years ago

@yarikoptic Thanks for the pull request! I took a closer look at datalad for the 0.12 release. Unfortunately, the installation is too big for a single tool to be preinstalled inside the main workspace flavor.

A few other options:

1. Build and publish your own datalad flavor:

    # Extend from any of the workspace versions/flavors
    FROM mltooling/ml-workspace:0.12.1

    # Run your customizations, e.g.
    RUN \
        # Install datalad
        apt-get update && \
        apt-get install -y datalad && \
        # Cleanup Layer - removes unnecessary cache files
        clean-layer.sh

2. Use an install method with a smaller footprint. However, since most space is used by git-annex, this might not be possible.
3. Add a tool installer script (e.g. see [these scripts](https://github.com/ml-tooling/ml-workspace/tree/main/resources/tools)). However, datalad is quite easy to install, so it might not need an additional installer script. 
yarikoptic commented 3 years ago

conda install (409MB): is that in addition to what you already have installed anyway? (Indeed, that would be too big then :-/) The majority should be common Python packages (besides git-annex itself, which is written in Haskell, but it should not pull in too much). I can look into it later to see where the hit comes from and what could be done.

lukasmasuch commented 3 years ago

Yes, it's in addition. But there are some pinned dependencies that are forced to downgrade/upgrade... which might blow up the total required space as well. To see what's actually installed/removed/downgraded, you can just build this Dockerfile:

    # Extend from any of the workspace versions/flavors
    FROM mltooling/ml-workspace:0.12.1

    # Run your customizations, e.g.
    RUN \
        # Install datalad
        apt-get update && \
        apt-get install -y datalad && \
        # Cleanup Layer - removes unnecessary cache files
        clean-layer.sh
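
If the goal is only to inspect which packages conda would install, upgrade, or downgrade, a dry run may be enough and leaves the image unchanged. The sketch below assumes the `conda-forge` channel (not stated in this thread); `--dry-run` is a standard `conda install` flag that prints the planned transaction without executing it.

```dockerfile
# Sketch only: same base image as above, channel is an assumption
FROM mltooling/ml-workspace:0.12.1

# Print the planned conda transaction (new, updated, and downgraded packages)
# without actually changing the environment
RUN conda install --dry-run -c conda-forge datalad
```

The plan appears in the build output. Depending on the Docker version, building with `docker build --progress=plain .` may be needed to see the full output of the `RUN` step.
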
github-actions[bot] commented 3 years ago

This PR is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 14 days