janpfeifer / gonb

GoNB, a Go Notebook Kernel for Jupyter
https://github.com/janpfeifer/gonb
MIT License

Use case and question about adding OpenSSH to the docker image. #138

Closed oderwat closed 2 weeks ago

oderwat commented 2 weeks ago

Hi @janpfeifer,

TL;DR: Would you consider adding OpenSSH to the Docker image to simplify the setup for private repositories?

Full Story:

I've been using the original gonb Docker image to create a 'gonb'-based hub with shared code and services for experimenting, sharing, and explaining code among colleagues.

This setup is essentially a Docker Compose configuration with multiple gonb containers (one for each user) that share some directories containing notebooks. It runs on one of our high-performance servers and also provides several services (like MariaDB, PostgreSQL, MSSQL, NATS, and Clickhouse) which we use for development.

I even wrote a crude REST API mirroring tool. Using this, I can share API services from any machine to another using NATS. In this case, I use it to share AI services from my Windows machine (running Ollama, SD-WebUI, Coqui TTS) with the gonb hub. To accomplish this, I run a server (written in Go) inside WSL 2 on the Windows RTX 3090 Ti machine and a Docker container for the API endpoint in the hub. This works surprisingly well given the minimal effort I've invested so far. It even runs Jupyter, although it's missing web sockets and currently doesn't support requests larger than the NATS message limit (2 MB in this case).

To integrate our private repositories, I created a shared OpenSSH token and added it to a special gonb user in Gitea. I then used shared paths to add the necessary SSH and Git configuration files for private repo access into the original container. However, I encountered a problem: there is no ssh in the gonb container, and adding ssh from the host doesn't work due to glibc incompatibilities. I found a statically linked OpenSSH binary and currently add the ssh command from there into the container.

I would prefer if the original gonb image had OpenSSH installed. Perhaps it could even set up private repository access when provided with certain environment variables. It could create the necessary files and run a key scan for the specified hosts.
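As a rough sketch of the environment-variable idea (nothing here is part of the GoNB image): the helper name `setup_known_hosts` and the variable `GONB_SSH_HOSTS` are made up for illustration, and the script assumes `ssh-keyscan` from the openssh-client package is on the PATH.

```shell
#!/bin/bash
# Sketch only: prepare an SSH directory and record host keys for private Git hosts.
# setup_known_hosts and GONB_SSH_HOSTS are hypothetical names, not part of GoNB.

setup_known_hosts() {
  local ssh_dir="$1"
  shift
  # Keep the directory private; ssh is strict about permissions
  # on private key and config files.
  mkdir -p "$ssh_dir"
  chmod 700 "$ssh_dir"
  touch "$ssh_dir/known_hosts"
  chmod 600 "$ssh_dir/known_hosts"
  local host
  for host in "$@"; do
    # Append the public host keys so the first git-over-ssh access
    # does not stop at an interactive "unknown host" prompt.
    ssh-keyscan "$host" >> "$ssh_dir/known_hosts" 2>/dev/null \
      || echo "warning: could not scan $host" >&2
  done
}

# Example call, only when the (hypothetical) variable is set:
if [ -n "${GONB_SSH_HOSTS:-}" ]; then
  # Word splitting is intended: the variable holds a space-separated host list.
  # shellcheck disable=SC2086
  setup_known_hosts "$HOME/.ssh" $GONB_SSH_HOSTS
fi
```

For Go modules in private repositories, one would typically also set `GOPRIVATE` and a git `url.<base>.insteadOf` rewrite so that `go get` goes through ssh; those steps are left out of the sketch.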

Thank you for considering this suggestion!

janpfeifer commented 2 weeks ago

Hey Hans, this sounds like a reasonable tool to add to the Dockerfile -- gonb is usually used alongside something else, and ssh is often the tool for connecting to many types of "something else".

Just to clarify:

  1. You mean adding the openssh-client package, right? (and not adding an ssh server to the docker image)
  2. It sounds reasonable to add the ssh client to the docker image. But just to double check: you could create a Dockerfile that inherits from GoNB's one, add a few lines (basically a RUN apt install openssh-client), and create your own free account on Docker Hub to upload your version. That comes with the extra maintenance cost of having to re-run it every time there is a new GoNB release. That is not ideal for you, right?

oderwat commented 2 weeks ago

  1. Yes, the client (ssh, scp and ssh-keyscan).
  2. We could just use a Dockerfile build in the compose setup (no need for another container repository). I just thought that accessing private repositories is a use case the Dockerfile that gonb supplies could cover.

With ssh available, one could even write a helper that sets up a private repository from a notebook. And you could set up port forwarding or fetch files. I think it could also be beneficial if rsync, curl and maybe also wget were available.

Another possibility would be to add jovyan to the sudoers (without password). But this would promote the user to satanic levels :)

janpfeifer commented 2 weeks ago

Actually, wget is already included. And it makes sense to add ssh, curl and rsync -- anything that helps one to use the docker as is.

Now about adding jovyan to /etc/sudoers: I'm not against it, since the image always runs in a container, which is presumably sandboxed. I mean, escalating from jovyan to root doesn't gain an attacker much in terms of how they can influence the world outside the container. Or am I wrong? I have the feeling that not giving jovyan sudo powers is just an annoyance to legitimate users, and doesn't hinder attackers in any way. WDYT?

Let me put together a PR.

oderwat commented 2 weeks ago

I also think that you can add jovyan, but then (at least) a user can destroy the image if not careful. But I don't care, as it can easily be rebuilt.

janpfeifer commented 2 weeks ago

I was thinking that when using Google Colab (colab.research.google) I was always able to !apt install ... stuff -- and I recall that was important.

I just checked in Colab and it runs as root by default.

So I'll follow suit and add jovyan to /etc/sudoers.

Yes, the original image is not destroyed in Docker: when it runs the image in a container, it forks it for the container (I think).

Sry, I think I won't have the time to rebuild/test/deploy a new docker image tonight. But first thing tomorrow.

janpfeifer commented 2 weeks ago

What do you think of PR #140 ?

I compromised by giving sudo privileges only to apt update and apt install *, to allow users to install arbitrary official packages (they are not able to change the apt sources presumably).

Something I was considering: if someone wants to include a library that uses CGO, they will need to install gcc in the docker image. But I didn't want to install it by default.

oderwat commented 2 weeks ago

LGTM

One more thing: What if you add the possibility to execute a script when the container starts?

I think this could be done by adding an entrypoint.sh script that checks for a file like /autostart.sh and, if it exists, runs it before the actual tini call.

This way one could add more stuff to the container on startup (I would install some of our internal tooling for example). This could also be used to adjust the UID/GID of the user or do the private repo setup and other stuff.
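A minimal sketch of such a hook, under the assumption that the entrypoint is a plain shell script. `run_autostart` is a hypothetical helper name, and the real GoNB entrypoint may look different.

```shell
#!/bin/bash
# Sketch only: run an optional startup hook before handing over to the main process.
# run_autostart is a hypothetical helper; the real GoNB entrypoint may differ.

run_autostart() {
  local hook="${1:-/autostart.sh}"
  if [ ! -f "$hook" ]; then
    return 0   # No hook mounted: nothing to do.
  fi
  echo "Running startup hook $hook"
  # Run through bash so the file does not need the executable bit,
  # which is easy to lose on host-mounted volumes.
  bash "$hook"
}

# In a real entrypoint.sh this would be followed by handing control to the
# original command, for example:
#   run_autostart /autostart.sh
#   exec tini -- "$@"
```

Checking for the file first keeps the image usable when nothing is mounted at that path, so the hook stays strictly opt-in.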

janpfeifer commented 2 weeks ago

And the script would be mounted from the host mounted directory ? Should it be run as root ?

oderwat commented 2 weeks ago

> And the script would be mounted from the host-mounted directory? Should it be run as root?

Yes, that is what I imagine.

janpfeifer commented 2 weeks ago

Took a little fiddling around but pls check it out:

  1. PR #140 updated, including instructions in the README.md file.
  2. I pushed the docker janpfeifer/gonb_jupyterlab:v0.10.5-20241015 with the autostart.sh support.

Would you double check it works for you ? I tested here, and it seems to be working ... but let me make sure it works for your use case.

cheers

oderwat commented 2 weeks ago

I will check that out asap.

oderwat commented 2 weeks ago

I just tried the new docker image. At first, I ran into problems because I had the container run with user: 1000:100 to have the correct access rights for the volumes.

I think that is not really needed though. But one needs to watch out that all shared directories and files which docker (or the startup script) creates have working access rights for jovyan (see below).

The second difficulty arises when using autostart.sh with go install .... You will run into a failure when you later try to import some stuff from inside the notebook. Go complains about access violations because, in that case, GOMODCACHE is partially created by root (with umask 022).

There are different workaround solutions I considered:

I tested all three and ended up updating the access rights (the other approaches failed for me).

This is my latest autostart.sh. It took some time to get the locale stuff working.

echo "Configuring system..."

apt-get update
# I want vim
apt-get install -y vim
# set German timezone (so time.Now() returns German time)
apt-get install -y tzdata
ln -sf /usr/share/zoneinfo/Europe/Berlin /etc/localtime
# some locale magic to make "date" answer with German format
echo 'de_DE.UTF-8 UTF-8' >> /etc/locale.gen
locale-gen
echo 'LC_ALL="de_DE.utf8"' > /etc/default/locale
export LC_ALL="de_DE.UTF-8"
dpkg-reconfigure locales
# check if it worked
date

# installing Go tools
go install github.com/nats-io/natscli/nats@latest
chown -R jovyan:users /opt/go

janpfeifer commented 2 weeks ago

Yes, the user that will run Jupyter is configured as $NB_USER (the variable is exported) == "jovyan" -- this is part of the JupyterLab docker image (jupyter/base-notebook) on which this one is based.

Hmm, that situation in Go is wrong. What happens is that I set the $GOPATH in the Dockerfile. The root user should have its own $GOPATH -- so what is installed by root is owned by root, and what is installed by jovyan is owned by jovyan. I'll fix that.

Another problem is that I have to manually export stuff when running as "jovyan". The line used to run JupyterLab is:

su --preserve-environment  $NB_USER -c "export PATH=${PATH} ; jupyter lab"

I'll try to create a .profile (or .bashrc) for "jovyan" that sets all the variables, so the standard su -l jovyan will do the trick.
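One possible shape for that profile step, as a sketch only. `write_go_profile` is a hypothetical helper, the GOPATH value is whatever the caller passes in, and GoNB's actual Dockerfile may solve this differently.

```shell
#!/bin/bash
# Sketch only: write the Go-related variables into a login profile so that
# "su -l <user>" picks them up. write_go_profile is a hypothetical helper.

write_go_profile() {
  local profile="$1"
  local gopath="$2"
  {
    echo "# Go settings written at container startup"
    printf 'export GOPATH="%s"\n' "$gopath"
    # Single quotes on purpose: $GOPATH and $PATH are expanded at login time,
    # not when this file is written.
    echo 'export PATH="$GOPATH/bin:$PATH"'
  } >> "$profile"
}
```

With such a profile in place, a login shell for the notebook user would see the same Go settings without passing them on the su command line. Any other PATH entries that the image sets outside a login profile would need to be written there as well.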

janpfeifer commented 2 weeks ago

Ok, I think I got this working.

Here is what your updated autostart.sh should look like -- I hope you don't mind, I added it as an example in the documentation:

#!/bin/bash

echo "Configuring system..."

#apt-get update
# I want vim
#apt-get install -y vim
# set German timezone (so time.Now() returns German time)
apt-get install -y tzdata
ln -sf /usr/share/zoneinfo/Europe/Berlin /etc/localtime

# some locale magic to make "date" answer with German format
echo 'de_DE.UTF-8 UTF-8' >> /etc/locale.gen
locale-gen
echo 'LC_ALL="de_DE.utf8"' > /etc/default/locale
export LC_ALL="de_DE.UTF-8"
dpkg-reconfigure locales

# check if it worked
date

# Installing Go tools for $NB_USER
su -l "$NB_USER" -c "go install github.com/nats-io/natscli/nats@latest"

This is the result in the notebook:

[image]

janpfeifer commented 2 weeks ago

I uploaded the latest version again to janpfeifer/gonb_jupyterlab:v0.10.5-20241015, if you want to try it out.

Btw, thanks for checking it.

oderwat commented 2 weeks ago

I tried it and it works excellently!

janpfeifer commented 2 weeks ago

Nice, closing this one. After the other features are in I'll cut a new release.

oderwat commented 2 weeks ago

A quick question: Wouldn't it be better to add tzdata in the image already? Go uses the timezone data of the system afaik:

time.Local, err = time.LoadLocation("US/Pacific") // <- this only works with tzdata installed
if err != nil {
    fmt.Printf("Error: %v\n", err)
}
now = time.Now()

janpfeifer commented 2 weeks ago

Oh, I was not aware of it. Yes, let me add it to the Dockerfile.

janpfeifer commented 2 weeks ago

Done in #142

janpfeifer commented 2 weeks ago

Included in the v0.10.6 release. The Docker image is also already available.