jupyterhub / the-littlest-jupyterhub

Simple JupyterHub distribution for 1-100 users on a single server
https://tljh.jupyter.org
BSD 3-Clause "New" or "Revised" License

Supporting ARM architectures #62

Closed: gedankenstuecke closed this issue 3 years ago

gedankenstuecke commented 6 years ago

It would be 💯 to get TLJH running on Raspberry Pis and other small boards. Unfortunately miniconda doesn't support their ARM processor architectures, so some changes to the installer are needed.

As discussed with @yuvipanda, a fix for this would be using virtualenv instead of conda. The user environment should be configurable to use either conda or virtualenv. Furthermore, nodesource also supports ARM architectures.
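A minimal sketch of what a configurable user environment could look like, assuming a hypothetical `create_user_env` helper (not part of TLJH). It uses the standard-library `venv` module as a stand-in for the third-party virtualenv package, and it leaves the conda branch unimplemented on purpose:

```python
import os
import venv


def create_user_env(prefix: str, backend: str = "venv", with_pip: bool = True) -> str:
    """Create a user environment at `prefix` and return its python path.

    Hypothetical helper: only the venv branch is sketched here.
    """
    if backend == "venv":
        # symlinks=True mirrors what `python3 -m venv` does on Linux.
        venv.EnvBuilder(symlinks=True, with_pip=with_pip).create(prefix)
        return os.path.join(prefix, "bin", "python")
    if backend == "conda":
        # Downloading and running a conda installer is out of scope here.
        raise NotImplementedError("conda backend is not part of this sketch")
    raise ValueError(f"unknown backend: {backend!r}")
```

Keeping the backend as a plain string argument means the installer could pick it from a flag or from the detected architecture without changing the helper itself.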

yuvipanda commented 6 years ago

Thank you for testing and reporting this early, @gedankenstuecke!

Our plans now are:

  1. Require Ubuntu 18.04 as the minimum, since its system Python is 3.6
  2. Use nodesource + venv for the hub environment unconditionally.
  3. Use conda as the default user environment everywhere except ARM, where venv is the user environment. This is not going to be user toggleable; we'll switch based on architecture.
  4. Integration test ARM with qemu. This is gonna be tricky but absolutely necessary. We probably won't run unit tests in it.
  5. Use only pip for installing packages into the user environment / hub environment in the default install. This reduces split between ARM & x86 environments.

I think this covers it.
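Plan item 3 (switch on architecture, no user toggle) could be sketched roughly like this. The function name and the set of `platform.machine()` strings treated as ARM are assumptions for illustration, not TLJH's actual code:

```python
import platform
from typing import Optional

# Assumed platform.machine() values that count as ARM (hypothetical list).
ARM_MACHINES = {"aarch64", "arm64", "armv7l", "armv6l"}


def choose_user_env_backend(machine: Optional[str] = None) -> str:
    """Return "venv" on ARM machines and "conda" everywhere else.

    `machine` can be passed explicitly for testing; by default the
    running host's architecture is used.
    """
    if machine is None:
        machine = platform.machine()
    return "venv" if machine.lower() in ARM_MACHINES else "conda"
```

Calling `choose_user_env_backend()` on an x86-64 server would return `"conda"`, and on a 64-bit Raspberry Pi OS install (which reports `aarch64`) it would return `"venv"`.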

gedankenstuecke commented 6 years ago

I looked into how to get Docker to use the qemu integration, and it seems like it's not too hard to pull off, as there are already some images for that.

To get qemu set up you can run

docker run --rm --privileged multiarch/qemu-user-static:register --reset

and that's the only real trick to it. From there on you can use a base image for raspbian, e.g. resin/rpi-raspbian. For a test I ran a simple Dockerfile on my end:

# Pull base image
ARG distro=stretch
FROM resin/rpi-raspbian:$distro

RUN apt-get update && apt-get install -y python3 python3-pip

CMD ["python3", "--version"]

After

docker build -t test/armtest .
docker run test/armtest

this yields Python 3.5.3.
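The steps above could be wrapped in a small Python helper as a starting point for an integration-test script. This is a sketch under assumptions: the function names are hypothetical, `dry_run` only returns the commands, and the optional `--platform` flag assumes a Docker version that supports `docker run --platform`:

```python
import subprocess
from typing import List, Optional


def qemu_register_cmd() -> List[str]:
    # Register qemu binfmt handlers so ARM binaries run on an x86 host.
    return ["docker", "run", "--rm", "--privileged",
            "multiarch/qemu-user-static:register", "--reset"]


def build_cmd(tag: str, context: str = ".") -> List[str]:
    return ["docker", "build", "-t", tag, context]


def run_cmd(tag: str, platform: Optional[str] = None) -> List[str]:
    cmd = ["docker", "run"]
    if platform:
        # e.g. "linux/arm64"; requires Docker support for --platform.
        cmd += ["--platform", platform]
    cmd.append(tag)
    return cmd


def run_all(tag: str = "test/armtest", dry_run: bool = True) -> List[List[str]]:
    """Run (or, with dry_run=True, just list) the register/build/run steps."""
    steps = [qemu_register_cmd(), build_cmd(tag), run_cmd(tag)]
    if not dry_run:
        for step in steps:
            subprocess.run(step, check=True)
    return steps
```

`run_all(dry_run=False)` would execute the same three commands shown in the comment, stopping at the first one that fails.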

yuvipanda commented 6 years ago

Awesome, @gedankenstuecke! Can you write up a small script similar to https://github.com/jupyterhub/the-littlest-jupyterhub/blob/master/.circleci/integration-test.py that helps run TLJH inside a qemu / arm container?

gedankenstuecke commented 6 years ago

Yeah, I tried adapting the whole thing here with a custom Dockerfile and integration-test.py: https://github.com/gedankenstuecke/the-littlest-jupyterhub/commit/de1c7ce2a3df5aa53c0120f645a8eb90c95c0d1a

The building of the image seems to work out fine, but then when trying to start the container it dies right away and I couldn't yet figure out what's going wrong here.

yuvipanda commented 5 years ago

Things that need to happen here:

  1. Allow switching between conda & venv for user environment at install time
  2. Add tests for both
  3. Add tests for the littlest jupyterhub in debian stretch images
  4. Add tests (with QEMU) on ARM architectures

scparker commented 5 years ago

Sounds like a good JupyterCon paper! I'll check it out later today...

pisymbol commented 4 years ago

Is this dead?

yuvipanda commented 4 years ago

@pisymbol nobody is currently working on it, unfortunately :( The core set of tasks needed haven't changed though.

cdibble commented 3 years ago

I want to throw in a +1 for this ticket. I'd love if there was ARM support for TLJH.

yuvipanda commented 3 years ago

@cdibble may I ask what you are planning on running this on? Raspberry Pi?

cdibble commented 3 years ago

> @cdibble may I ask what you are planning on running this on? Raspberry Pi?

@yuvipanda Actually, I am just interested in taking advantage of the price/performance of the latest generation of AWS servers: the ARM-based EC2 instances have pretty attractive specs compared with the previous x86-64 generations. So for me it's just about upgrading, not a use case that requires ARM. I understand this may not be the most motivating use case for dev work on this ticket.

TLJH has made the deployment and maintenance of a JupyterHub server a dream; many thanks to you and the other contributors.

yuvipanda commented 3 years ago

That's actually more motivating than Raspberry Pis; RPis are not powerful enough for most hub use cases.

ARM migration should be easier now, since we use miniforge, which does have arm64 support.

Am very glad you found it useful, @cdibble!

yuvipanda commented 3 years ago

At least with Docker on Mac, you can trivially run arm64 builds. This should make testing much easier!

cdibble commented 3 years ago

> ARM migration should be easier now, since we use miniforge, which does have arm64 support.

This is a good hint. Thank you. FWIW I do see a note in tljh/installer.py line 183 to add support for miniforge. Is there a branch where that's implemented? I've forked to see if I can get it to work. It seems like just a matter of modifying the tljh/conda.py functions related to installing and checking packages with conda. Any input welcome.

yuvipanda commented 3 years ago

@GeorgianaElena did some work on it a few months ago, maybe we can split out the miniforge commits from there?

@cdibble I started a test run of TLJH setup with ARM, via this PR: https://github.com/jupyterhub/the-littlest-jupyterhub/pull/674. It's compiling so many things, and the emulation is so slow; I'm still at the point where setup.py dependencies are being installed. We somehow require grpc in our base install, not sure why?!

yuvipanda commented 3 years ago

Can't install grpcio on arm :( I filed https://github.com/jupyterhub/traefik-proxy/issues/125

cdibble commented 3 years ago

Seems we got to similar points. I can't start the service:

Mar 27 21:45:59 ip-10-13-7-223 sudo[14971]:   ubuntu : TTY=pts/0 ; PWD=/home/ubuntu/the-littlest-jupyterhub/bootstrap ; USER=root ; COMMAND=/bin/systemctl restart jupyterhub.se
Mar 27 21:45:59 ip-10-13-7-223 sudo[14971]: pam_unix(sudo:session): session opened for user root by ubuntu(uid=0)
Mar 27 21:45:59 ip-10-13-7-223 systemd[1]: traefik.service: Start request repeated too quickly.
Mar 27 21:45:59 ip-10-13-7-223 systemd[1]: traefik.service: Failed with result 'exit-code'.
Mar 27 21:45:59 ip-10-13-7-223 systemd[1]: Failed to start traefik.service.
-- Subject: Unit traefik.service has failed
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- Unit traefik.service has failed.
--
-- The result is RESULT.
Mar 27 21:45:59 ip-10-13-7-223 systemd[1]: Dependency failed for jupyterhub.service.
-- Subject: Unit jupyterhub.service has failed
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- Unit jupyterhub.service has failed.
--
-- The result is RESULT.
Mar 27 21:45:59 ip-10-13-7-223 systemd[1]: jupyterhub.service: Job jupyterhub.service/start failed with result 'dependency'.
Mar 27 21:45:59 ip-10-13-7-223 sudo[14971]: pam_unix(sudo:session): session closed for user root

Though I can't find the reference to the dependency that caused the failure on my end. I'm not seeing the install of grpcio or etcd3. Where is that happening?

I just changed a few things in conda.py to point to miniforge to get to this stage. Hopefully if we can make some progress on that new issue you filed, it will fall into place.

cdibble commented 3 years ago

BTW, I had an EC2 instance with ARM running, so I skipped the docker dev environment setup.

yuvipanda commented 3 years ago

Hah, you have definitely gotten farther than me :D Unfortunately I don't have access to an AWS ARM instance :(

yuvipanda commented 3 years ago

OK, I got it running locally!

Things I had to do:

I think we can do all this independently and get us to aarch64 support

cdibble commented 3 years ago

Nice! Thank you for putting time into this :)

I'm not quite there. I've added your fork+branch of jupyterhub-traefik-proxy to the setup.py for tljh, something like: install_requires=[..., "jupyterhub-traefik-proxy@git+https://github.com/yuvipanda/traefik-proxy.git@optional-deps"]. That is installing as expected. I also moved to Ubuntu 20.04/Python 3.8.

I can run the bootstrap.py script just fine, but the service fails to start again with the same message: traefik.service failed to start.

So I modified the traefik.py file to point to the traefik version for linux_arm64 like so:

plat = "linux_arm64"
traefik_version = "2.4.8"

But that isn't working: the published checksum doesn't match what I get with the download. I tried just using the checksum that results from the download, but that does not fix my error with traefik.service. Any ideas? Did you have to modify traefik.py?

UPDATE: The checksums are fine; I wasn't able to download from the URL configured in traefik.py. I changed that to traefik_url = ( f"https://github.com/traefik/traefik/releases/download/v{traefik_version}/traefik_v{traefik_version}_{plat}.tar.gz" ) to get the traefik installation routine working. Sadly, that still didn't fix the traefik.service start failure.


     Loaded: loaded (/etc/systemd/system/traefik.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Sun 2021-03-28 17:37:51 UTC; 7s ago
    Process: 13532 ExecStart=/opt/tljh/hub/bin/traefik -c /opt/tljh/state/traefik.toml (code=exited, status=203/EXEC)
   Main PID: 13532 (code=exited, status=203/EXEC)

Mar 28 17:37:51 ip-10-13-7-166 systemd[1]: traefik.service: Scheduled restart job, restart counter is at 5.
Mar 28 17:37:51 ip-10-13-7-166 systemd[1]: Stopped traefik.service.
Mar 28 17:37:51 ip-10-13-7-166 systemd[1]: traefik.service: Start request repeated too quickly.
Mar 28 17:37:51 ip-10-13-7-166 systemd[1]: traefik.service: Failed with result 'exit-code'.
Mar 28 17:37:51 ip-10-13-7-166 systemd[1]: Failed to start traefik.service.

yuvipanda commented 3 years ago

@cdibble I filed https://github.com/jupyterhub/traefik-proxy/issues/128 to work on the traefik proxy installer.

However, that wasn't a problem for me, and I've no idea how :| That it worked makes me suspect the arm-ness of my docker based setup...

cdibble commented 3 years ago

Yes I am surprised you were able to get it to work without an ARM build of traefik. But I haven't been able to get it to work even adding in the url and checksums for the ARM traefik versions in tljh/traefik.py. It does download the appropriate binary and complete the checksums, but then the service still doesn't work. So not sure what's going on.
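The `status=203/EXEC` in the log above means systemd could not execute the `ExecStart` command at all. One possible cause (not confirmed in this thread) is that the file at `/opt/tljh/hub/bin/traefik` is not a binary for the host architecture, or not an ELF binary at all (e.g. a tarball that was never extracted). A small diagnostic sketch, using a hypothetical `elf_machine` helper that reads the ELF header's `e_machine` field:

```python
import struct
from typing import Optional

# e_machine values from the ELF specification.
ELF_MACHINES = {3: "x86", 40: "arm", 62: "x86_64", 183: "aarch64"}


def elf_machine(path: str) -> Optional[str]:
    """Return the target architecture of an ELF file, or None if not ELF."""
    with open(path, "rb") as f:
        header = f.read(20)
    if len(header) < 20 or header[:4] != b"\x7fELF":
        return None
    # EI_DATA (byte 5): 1 = little-endian, 2 = big-endian.
    endian = "<" if header[5] == 1 else ">"
    # e_machine is a 2-byte field at offset 18.
    (machine,) = struct.unpack(endian + "H", header[18:20])
    return ELF_MACHINES.get(machine, f"unknown({machine})")
```

Comparing `elf_machine("/opt/tljh/hub/bin/traefik")` with `platform.machine()` on the server would quickly show whether an x86-64 traefik ended up on an aarch64 host.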

cdibble commented 3 years ago

Looks like we need traefik v1.7.*.

plat = "linux-arm64"
traefik_version = "1.7.28"

Did the trick. Haven't tested much yet but I will this week.
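The two `plat` spellings used in this thread differ by traefik major version: `linux_arm64` for 2.4.8 and `linux-arm64` for 1.7.28. A sketch of picking the right one from the host architecture, with hypothetical helper names; the `amd64` spelling for x86-64 is an assumption by analogy, and the URL helper only covers the 2.x pattern quoted earlier:

```python
import platform
from typing import Optional

# Host architecture -> name used in traefik release assets (amd64 assumed).
GOARCH = {"x86_64": "amd64", "amd64": "amd64", "aarch64": "arm64", "arm64": "arm64"}


def traefik_plat(version: str, machine: Optional[str] = None) -> str:
    """Return the platform string for a traefik release (hypothetical helper)."""
    machine = (machine or platform.machine()).lower()
    try:
        arch = GOARCH[machine]
    except KeyError:
        raise ValueError(f"no traefik build mapped for {machine!r}") from None
    major = int(version.split(".")[0])
    # 1.7.x in this thread uses "linux-arm64"; 2.x uses "linux_arm64".
    sep = "-" if major < 2 else "_"
    return f"linux{sep}{arch}"


def traefik_v2_url(version: str, plat: str) -> str:
    # The 2.x download URL pattern from the UPDATE comment above.
    return (f"https://github.com/traefik/traefik/releases/download/"
            f"v{version}/traefik_v{version}_{plat}.tar.gz")
```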

yuvipanda commented 3 years ago

@cdibble awesome, yay! Please send PRs when you can.

yuvipanda commented 3 years ago

Opened https://github.com/jupyterhub/traefik-proxy/pull/129 to allow for ARM builds in the traefik_proxy installer. I opened https://github.com/jupyterhub/the-littlest-jupyterhub/issues/675 to switch TLJH to using the traefik installer by default so we don't have to repeat that here.

cdibble commented 3 years ago

OK, sorry for the delay. Busy week.

I've got my fork working properly now and I've tested it with Ubuntu 20.04, Python 3.8 on both x86-64 and aarch64 (arm64) servers. Everything seems to work as expected.

I've opened a PR #679 if you want to incorporate these changes. I'd be happy to help resolve any issues. There are also some opportunities for code cleanup (e.g., getting rid of old functions used in the miniconda installation), but I've left those pieces in place.

So, what is different:

  1. Now installing miniforge instead of miniconda. Automatically selects binary based on platform. Has hard-coded checksums for amd64 and arm64 miniforge binaries (for version 4.10.0-0).
  2. Relying on jupyterhub/traefik-proxy#129 for traefik-proxy support, but currently pointing to a dev fork/branch (see below).
  3. Installing traefik version based on platform architecture.
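Selecting the miniforge installer by platform and checking it against a hard-coded checksum (items 1 and 3) might look roughly like the sketch below. The installer URL pattern is an assumption based on miniforge's GitHub release naming, the helper names are hypothetical, and no real checksums are reproduced here:

```python
import hashlib
import platform
from typing import Optional

MINIFORGE_VERSION = "4.10.0-0"  # version mentioned in the PR description


def miniforge_installer_url(version: str = MINIFORGE_VERSION,
                            machine: Optional[str] = None) -> str:
    """Build the (assumed) miniforge installer URL for this architecture."""
    machine = machine or platform.machine()
    if machine not in ("x86_64", "aarch64"):
        raise ValueError(f"unsupported architecture: {machine!r}")
    name = f"Miniforge3-{version}-Linux-{machine}.sh"
    return f"https://github.com/conda-forge/miniforge/releases/download/{version}/{name}"


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    # Hash in chunks so large installers are not read into memory at once.
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(path: str, expected_hex: str) -> None:
    """Raise ValueError if the file's SHA-256 does not match the expected hash."""
    actual = sha256_of(path)
    if actual != expected_hex.lower():
        raise ValueError(f"checksum mismatch for {path}: {actual}")
```

Keeping a dict of expected hashes keyed by architecture, next to the version constant, keeps both values in one place when bumping the miniforge version.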

What needs to be updated:

  1. setup.py is pointing to 'jupyterhub-traefik-proxy@git+https://github.com/yuvipanda/traefik-proxy.git@optional-deps' pending the release of the changes in jupyterhub/traefik-proxy#129
  2. In tljh/installer.py, the check_miniforge_version routine is not really checking anything meaningful at this point. I just mimicked the checks made by check_miniconda_version without having a good reason to check those particular version numbers.

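One way a version check could become meaningful is to compare the installed conda's version against an explicit minimum. A sketch with hypothetical helper names; the output string would come from something like `subprocess.check_output([f"{prefix}/bin/conda", "--version"], text=True)`, which prints e.g. `conda 4.10.0`, and the minimum version is whatever the installer actually requires:

```python
import re
from typing import Tuple


def parse_conda_version(output: str) -> Tuple[int, ...]:
    """Parse output like 'conda 4.10.0' into (4, 10, 0)."""
    match = re.search(r"(\d+(?:\.\d+)*)", output)
    if not match:
        raise ValueError(f"no version number found in {output!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def conda_is_at_least(output: str, minimum: str) -> bool:
    # Tuples compare element by element, so (4, 10, 0) >= (4, 9) is True,
    # unlike the string comparison "4.10.0" >= "4.9", which is False.
    required = tuple(int(part) for part in minimum.split("."))
    return parse_conda_version(output) >= required
```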
psychemedia commented 3 years ago

FWIW, I started looking at some Docker stacks for amd64/arm64/arm32 here, cross-built in a really inefficient way using GitHub Actions.

To try to speed things up, I also started building the arm32/arm64 packages on RPis and adding them to my own wheelhouse (I'm not sure piwheels does 32- and 64-bit wheels?)

meeseeksmachine commented 3 years ago

This issue has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/tljh-on-m1-mac-arm-docker-installer-is-x86-specific/10894/2

consideRatio commented 3 years ago

We've come a long way to support arm64 at this point!

I think #679 can be updated to do very little as a lot of changes are already merged in dedicated PRs.