haskell / docker-haskell


Switch to ghcup as the installation method #44

Closed: AlistairB closed this 2 years ago

AlistairB commented 2 years ago

Closes #45

Related: #40, #43

The current installation method depends on Debian's ghc and cabal packaging, which is often slow to be updated. Additionally, ghcup is becoming a standard way to install GHC (e.g. in GitHub Actions), so we probably don't want to reinvent the wheel here.

I believe it also provides an easier path to supporting Windows and ARM-based images.

Multi-stage file?

I believe it is best that we don't include ghcup in the final image (this point will need further discussion). In that case, a multi-stage file is a nice way to ensure the final image doesn't include any cache used as part of the installation, or packages that are only required by ghcup but not by ghc / cabal / stack users.
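As a very rough sketch of that shape (the base image, the GHCUP_INSTALL_BASE_PREFIX layout, the download URL and the versions here are my own illustrative assumptions, not the actual Dockerfile):

FROM debian:buster AS build
# install ghcup's build-time needs, then use ghcup to install the toolchain under /opt
ENV GHCUP_INSTALL_BASE_PREFIX=/opt
RUN apt-get update && apt-get install -y curl gcc g++ gnupg libgmp-dev make xz-utils
RUN curl -fsSL https://downloads.haskell.org/~ghcup/x86_64-linux-ghcup -o /usr/local/bin/ghcup && \
    chmod +x /usr/local/bin/ghcup && \
    ghcup install ghc 9.0.1 && \
    ghcup install cabal 3.4.0.0 && \
    ghcup set ghc 9.0.1 && \
    rm -rf /opt/.ghcup/cache

FROM debian:buster
# only the installed toolchain is copied across; ghcup itself and its download cache
# stay behind in the build stage
RUN apt-get update && apt-get install -y gcc libgmp-dev libnuma-dev make && rm -rf /var/lib/apt/lists/*
COPY --from=build /opt/.ghcup /opt/.ghcup
ENV PATH=/opt/.ghcup/bin:$PATH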

Size difference

Old 9.0.1: 1.5GB
New 9.0.1: 1.67GB

There may be more that ghcup installs but the Debian packages do not, which could be stripped out. The only other difference I noticed is that the .so files from the ghcup version are a bit bigger, e.g. base-4.14.1.0/libHSbase-4.14.1.0-ghc8.10.4.so:

ghcup: 12.8MB
Debian package: 10.2MB

Still, overall the difference is small enough I'm inclined to ignore it.

Boot library haddocks

Related to size, I am stripping out /opt/ghc/share entirely from the images to save 286MB. The Debian ghc package does not remove this entirely, but leaves in the Haddock files for the boot libraries, e.g. /opt/ghc/8.10.4/share/doc/ghc-8.10.4/html/libraries/text-1.2.4.1/text.haddock. I don't think these are that useful? Or I could leave them in; it would just be a bit more wrangling.
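For reference, the stripping itself is just a cleanup step in the install layer; the exact path depends on where ghcup ends up putting GHC, so the glob here is only a sketch:

# drop the share/ tree, which holds the boot-library haddocks and other docs (~286MB)
RUN rm -rf /opt/ghc/*/share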

Other changes

I have also taken this as an opportunity to revise the list of packages we include in the final image. The list is:

Removed

I'm not sure whether these are old stack (or other) dependencies, or whether they are just needed to build the current images.

hasufell commented 2 years ago

https://github.com/AlistairB/docker-haskell/pull/1/files

psftw commented 2 years ago

Sorry for delayed response! I'm all on board with this and it's exciting to see.

We do not want to use multistage builds for technocratic reasons: https://github.com/docker-library/faq#multi-stage-builds

We definitely want to keep openssh-client which supports downloading dependencies from private git repositories, as well as libsqlite3-dev since it is a common "batteries-included" dependency. dirmngr is a little more nuanced; I think we can drop it, but that depends on the calls to gnupg we need to add back:

For the initial download of ghcup, we need to follow official image guidelines and validate the GPG signatures that are provided by upstream w/ releases (i.e. here).
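For reference, the usual official-image shape for that check is roughly the following; the key fingerprint placeholder and the exact file names are assumptions on my part, not upstream's real values:

$ export GNUPGHOME="$(mktemp -d)"
$ gpg --batch --keyserver keyserver.ubuntu.com --recv-keys <upstream-release-key-fingerprint>
$ curl -fsSLO https://downloads.haskell.org/~ghcup/x86_64-linux-ghcup
$ curl -fsSLO https://downloads.haskell.org/~ghcup/SHA256SUMS
$ curl -fsSLO https://downloads.haskell.org/~ghcup/SHA256SUMS.sig
$ gpg --batch --verify SHA256SUMS.sig SHA256SUMS      # the checksum file is what upstream signs
$ sha256sum --ignore-missing --check SHA256SUMS       # then the downloaded binary is checked against it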

My understanding (which could be totally wrong!) is that ghcup will download a metadata file at runtime which contains binary locations and corresponding sha256 to validate. Since the Official Image project folks can rebuild us at any time, we now depend on this to work reliably given specific versions we are trying to support (currently ~"2 most recent GHC branches x 2 most recent Debian releases").

Putting on my contrarian hat for a moment -- why not just do what ghcup does directly? We could technically retain better security/transparency by checking GPG signatures at the expense of more effort to maintain the Dockerfile.

I will want to look closer at the image contents between install methods to build more confidence in what these changes are doing, but overall I think this is a good direction to go.

RE: including ghcup in the final image -- I don't think we should include it, but I go back and forth on this one. There is precedent for including it w/ rustup, but then that tool does so much more post-install that you may actually want it.

hasufell commented 2 years ago

We do not want to use multistage builds for technocratic reasons: https://github.com/docker-library/faq#multi-stage-builds

I read the comment and can't see what's wrong with multi-stage builds. If you clean up your cache, then yes, you might lose those. That's expected?

AlistairB commented 2 years ago

I read the comment and can't see what's wrong with multi-stage builds. If you clean up your cache, then yes, you might lose those. That's expected?

It's a bit confusing I agree. The way I am reading the FAQ is that the docker-library build process cannot currently cache intermediate stages of multi-stage builds. So if you modify only the final image, it still has to rebuild the previous stages which is not the case when just using docker build.

I believe we match case 2 in the FAQ, so it sounds like it might be acceptable?

I think the motivation for a multi-stage build here is only if we don't want to provide ghcup in the final image. It is a convenient way to avoid including it, its dependent packages, and any cache used as part of the ghc / cabal / stack installation process.

We definitely want to keep openssh-client which supports downloading dependencies from private git repositories, as well as libsqlite3-dev since it is a common "batteries-included" dependency.

No worries, I will add them back.

For the initial download of ghcup, we need to follow official image guidelines and validate the GPG signatures that are provided by upstream w/ releases (i.e. here).

Good point. I will add this.

Putting on my contrarian hat for a moment -- why not just do what ghcup does directly? We could technically retain better security/transparency by checking GPG signatures at the expense of more effort to maintain the Dockerfile.

This is a good question. I don't know enough about GPG and I'm going to read up on it. But can ghcup do GPG verification in a similar way? If we can make the same improvement in ghcup, that has a broader impact and we get the same benefit. I also don't know enough about what else ghcup does as part of installing ghc that we might need.

At least in the Windows case (and perhaps ARM?) I think ghcup would be doing a lot more wrangling for us, so for that case it is probably the way to go.

Having said that, if we can relatively easily install ghc + cabal from the released artefacts, I think that may be the way to go. It would be easy to avoid a multi-stage build and keep the layers clean. I'll do some more investigation and thinking about this.
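As a data point on how much wrangling that would be, a plain bindist install is roughly the following (file names, versions and prefix are illustrative, and the signature/checksum verification discussed above would slot in before unpacking):

$ curl -fsSLO https://downloads.haskell.org/~ghc/9.0.1/ghc-9.0.1-x86_64-deb10-linux.tar.xz
$ tar -xJf ghc-9.0.1-x86_64-deb10-linux.tar.xz
$ (cd ghc-9.0.1 && ./configure --prefix=/opt/ghc/9.0.1 && make install)
$ curl -fsSLO https://downloads.haskell.org/~cabal/cabal-install-3.4.0.0/cabal-install-3.4.0.0-x86_64-unknown-linux.tar.xz
$ tar -xJf cabal-install-3.4.0.0-x86_64-unknown-linux.tar.xz -C /usr/local/bin cabal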

AlistairB commented 2 years ago

Re gpg keys:

I've been trying to get gpg verification working with ghcup and failing. ghcup includes SHA256SUMS and SHA256SUMS.sig, produced as documented.

Trying something like the following doesn't quite work.

# tried with the 3 keys that match on the email at https://keyserver.ubuntu.com/pks/lookup?search=hasufell%40posteo.de&fingerprint=on&op=index
$ gpg --keyserver keyserver.ubuntu.com --recv-keys 511B62C09D50CD28
$ gpg --batch --trusted-key 511B62C09D50CD28 --verify SHA256SUMS.sig SHA256SUMS
gpg: Signature made Thu 12 Aug 2021 03:34:23 AEST
gpg:                using RSA key 7784930957807690A66EBDBE3786C5262ECB4A3F
gpg:                issuer "hasufell@posteo.de"
gpg: Can't check signature: No public key

I think I must be doing something wrong. I am mostly just copying what we do for stack gpg verification, but perhaps it is different because this is a .sig file. (I similarly fail for ghc / cabal verification, but I'm not certain which key to use for that.)
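One observation from the output above: the signature was made by key 7784930957807690A66EBDBE3786C5262ECB4A3F, which is not the key that was received, and "Can't check signature: No public key" is exactly what gpg prints when the signing key is missing from the keyring. A possible fix (a guess, untested) is to receive the key by the full fingerprint gpg reports and verify again:

$ gpg --batch --keyserver keyserver.ubuntu.com --recv-keys 7784930957807690A66EBDBE3786C5262ECB4A3F
$ gpg --batch --verify SHA256SUMS.sig SHA256SUMS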

hasufell commented 2 years ago

It's a bit confusing I agree. The way I am reading the FAQ is that the docker-library build process cannot currently cache intermediate stages of multi-stage builds. So if you modify only the final image, it still has to rebuild the previous stages which is not the case when just using docker build.

I don't think that's true at all. I've just tried it and caching works fine.