google / caliban

Research workflows made easy, locally and in the Cloud.
https://caliban.readthedocs.io
Apache License 2.0
495 stars 67 forks source link

Begin to support Julia #38

Open eschnett opened 4 years ago

eschnett commented 4 years ago

After discussing with @sritchie yesterday I looked into supporting Julia as language. I found the following:

It is straightforward to install Julia into the directory that is packed up by Docker. Since a Julia install has many files, this is too slow.

I instead opted to install Julia in a similar way as Python. There is no good Ubuntu package for Julia; the available ones are very outdated. Instead, Julia needs to be installed by downloading a binary. Julia has a built-in package manager, and a simple Julia command will then install all package dependencies. In the long run, it might be nice to base the Docker template onto a container that already has Julia installed, but that doesn't seem necessary at the moment where I'm just experimenting with Julia.

The enclosed "pull request" should make it easy to see what I changed. I decided to pass a parameter julia_url in the same was as e.g. requirements_path and handle it in a similar way. julia_url is the URL where an architecture-specific binary of Julia is available. This seems to work nicely.

My question is: What is a good way for the user to definejulia_url and pass it to _dependency_entries? I'm still in a phase where I need to find my way around the source code and configuration files.

I would, of course, also be happy about other comments about my approach.

googlebot commented 4 years ago

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

:memo: Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.


What to do if you already signed the CLA

Individual signers
Corporate signers

ℹ️ Googlers: Go here for more info.

eschnett commented 4 years ago

@googlebot I signed it!

googlebot commented 4 years ago

CLAs look good, thanks!

ℹ️ Googlers: Go here for more info.

sritchie commented 4 years ago

@eschnett , thank you for this! I should have some time tonight to take a look and give feedback on how we can get this integrated.

codecov[bot] commented 4 years ago

Codecov Report

Merging #38 into master will increase coverage by 1.14%. The diff coverage is 13.33%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #38      +/-   ##
==========================================
+ Coverage   54.31%   55.46%   +1.14%     
==========================================
  Files          31       31              
  Lines        3183     3195      +12     
==========================================
+ Hits         1729     1772      +43     
+ Misses       1454     1423      -31     
Impacted Files Coverage Δ
caliban/docker/build.py 31.44% <13.33%> (-1.27%) :arrow_down:
caliban/util/metrics.py 100.00% <0.00%> (ø)
caliban/config/__init__.py 100.00% <0.00%> (ø)
caliban/history/cli.py 13.40% <0.00%> (+0.06%) :arrow_up:
caliban/resources/caliban_launcher.py 100.00% <0.00%> (+66.15%) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 7e175c2...c5eb7e9. Read the comment docs.

eschnett commented 4 years ago

I'm also working on a Slurm backend, i.e. connecting to a Slurm scheduler on a remote system via ssh. To simplify things for me while I develop, this is now living on the same branch. Since it affects different files (platform/slurm) I hope it doesn't make it more difficult to see what I changed. I'll follow your submission guidelines when/if this is ready to be included.

sritchie commented 4 years ago

hey @eschnett , I was thinking about Julia support - I think I can take your example and build you a docker base image with Julia in it, and maybe add a "julia" flag to the calibanconfig to privilege this container in the way that the recent #39 gives some special syntax for these deep learning VMs.

I THINK that if you don't have requirements.txt or setup.py, we won't do any python-specific things when you run caliban commands.

I don't know Julia yet, but based on https://julialang.github.io/Pkg.jl/v1/toml-files/

It looks like the presence of a Project.toml (or Manifest.toml?) is what we need to look for, and trigger julia --eval 'using Pkg; Pkg.instantiate()'" if it exists. Is that right? Would that be enough (along with the julia installation) to get someone a working arena?

Then, the current code should work if you use caliban run shell_script.sh, but I can update the code here to detect a .jl file and issue the appropriate equivalent to python -m.

eschnett commented 4 years ago

@sritchie This looks approximately correct. A few details:

eschnett commented 4 years ago

Regarding python -m: It's less common to have sub-packages in Julia than having modules in Python. It would make sense to accept the name of a function, possibly qualified with a package name, and look for that function in the current package. The respective Julia code would be similar to (untested)

julia -e 'using {package}; {function}()'
sritchie commented 4 years ago

@eschnett , I've just released 0.3.0 at https://github.com/google/caliban/releases/tag/0.3.0 with some of the work you've been doing. Thank you! I put a julia_version key into calibanconfig.json that we don't yet use; I'm going to take a look now and see if we can get this in ASAP so folks can start using Julia.

I take it you've been developing with this feature. Anything missing for Julia support that you've noticed, or is this it?

Thank you again!

eschnett commented 4 years ago

Sam

Thanks!

Yes, I found two small things missing:

I will have a look at the new Caliban and update/simplify my Julia branch.

-erik

FYI: My current Julia build instructions are (these are in the git repo)

if julia_url is not None:

Install Julia

ret += f"""

Where Julia is installed

TODO: Use /usr/local instead of /tmp for Julia directories

ENV JULIA_LOC=/tmp/julia ENV PATH=$PATH:$JULIA_LOC/bin

Where Julia installs packages

ENV JULIA_DEPOT_PATH=/tmp/var/julia_depot

The Python version used by Julia: Julia uses its own Conda environment

ENV PYTHON= RUN /bin/bash -c "mkdir -p $JULIA_LOC && \ wget -nv {julia_url} && \ tar xzf $(basename {julia_url}) -C $JULIA_LOC --strip-components 1 && \ rm -f $(basename {julia_url}) && \ mkdir -p $JULIA_DEPOT_PATH && \ mkdir -p $JULIA_DEPOT_PATH/servers && \ echo 'telemetry = false' > $JULIA_DEPOT_PATH/servers/telemetry.toml" """

# Install dependencies for the current Julia project
if os.path.exists("Project.toml"):
  ret += f"""

The current Julia project

ENV JULIA_PROJECT={workdir} COPY --chown={user_id}:{user_group} *.toml {workdir}/ RUN /bin/bash -c "julia --eval 'using Pkg; Pkg.instantiate()'" """

-erik

On Fri, Jul 24, 2020 at 10:28 AM Sam Ritchie notifications@github.com wrote:

@eschnett https://github.com/eschnett , I've just released 0.3.0 at https://github.com/google/caliban/releases/tag/0.3.0 with some of the work you've been doing. Thank you! I put a julia_version key into calibanconfig.json that we don't yet use; I'm going to take a look now and see if we can get this in ASAP so folks can start using Julia.

I take it you've been developing with this feature. Anything missing for Julia support that you've noticed, or is this it?

Thank you again!

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/google/caliban/pull/38#issuecomment-663566519, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABUECXDXHAGVS3XAQVOHEDR5GLCHANCNFSM4O2UFE5A .

-- Erik Schnetter schnetter@gmail.com http://www.perimeterinstitute.ca/personal/eschnetter/

sritchie commented 4 years ago

Interesting; I wonder, is it actually a GOOD thing if the Julia installation shares a conda env with the python scripts? Then someone could, for example, use a project that has a bunch of their own custom scripts, and call those (which require dependencies) from Julia.

There may be some subtlety here that I'm not reading correctly, of course.

Another question. If the conda environment is activated in the base container, is "system Python", from Julia's perspective, actually the Conda env that we've already set up?

eschnett commented 4 years ago

On Fri, Jul 24, 2020 at 10:54 AM Sam Ritchie notifications@github.com wrote:

Interesting; I wonder, is it actually a GOOD thing if the Julia installation shares a conda env with the python scripts? Then someone could, for example, use a project that has a bunch of their own custom scripts, and call those (which require dependencies) from Julia.

There may be some subtlety here that I'm not reading correctly, of course.

The problem is that Julia cannot install Python packages, even if they are required. One would have to list them explicitly for Caliban, which duplicates information already present in the transitive dependencies in Project.toml. I don't know whether Julia could technically install them, but it won't, unless it runs its own Conda Python. I think this is just Julia being carefuI. If Julia isn't in control, then it doesn't know which mechanism to use (apt? pip? conda?), so it reports an error if Python packages are missing.

In this case, since the Caliban Python isn't used by anything else but Julia, it would make sense to use a single environment. But that's a low priority for me since the current setup works, at the expense of a bit of disk space.

Another question. If the conda environment is activated in the base container, is "system Python", from Julia's perspective, actually the Conda env that we've already set up?

I think I saw /opt/... in the path in Julia's error message, which sounds as if it was the environment set up by Caliban.

-erik

-- Erik Schnetter schnetter@gmail.com http://www.perimeterinstitute.ca/personal/eschnetter/

eschnett commented 4 years ago

Sam

I just updated the eschnett/julia branch to work with the latest master.

-erik

On Fri, Jul 24, 2020 at 2:35 PM Erik Schnetter schnetter@gmail.com wrote:

On Fri, Jul 24, 2020 at 10:54 AM Sam Ritchie notifications@github.com wrote:

Interesting; I wonder, is it actually a GOOD thing if the Julia installation shares a conda env with the python scripts? Then someone could, for example, use a project that has a bunch of their own custom scripts, and call those (which require dependencies) from Julia.

There may be some subtlety here that I'm not reading correctly, of course.

The problem is that Julia cannot install Python packages, even if they are required. One would have to list them explicitly for Caliban, which duplicates information already present in the transitive dependencies in Project.toml. I don't know whether Julia could technically install them, but it won't, unless it runs its own Conda Python. I think this is just Julia being carefuI. If Julia isn't in control, then it doesn't know which mechanism to use (apt? pip? conda?), so it reports an error if Python packages are missing.

In this case, since the Caliban Python isn't used by anything else but Julia, it would make sense to use a single environment. But that's a low priority for me since the current setup works, at the expense of a bit of disk space.

Another question. If the conda environment is activated in the base container, is "system Python", from Julia's perspective, actually the Conda env that we've already set up?

I think I saw /opt/... in the path in Julia's error message, which sounds as if it was the environment set up by Caliban.

-erik

-- Erik Schnetter schnetter@gmail.com http://www.perimeterinstitute.ca/personal/eschnetter/

-- Erik Schnetter schnetter@gmail.com http://www.perimeterinstitute.ca/personal/eschnetter/

eschnett commented 4 years ago

@sritchie How should we continue to work on this pull request? It's in a good state, and I've been using this for my Julia development for a while. Is there something missing or something that should be cleaned up?