rocker-org / rocker-versioned2

Run current & prior versions of R using docker. rocker/r-ver, rocker/rstudio, rocker/shiny, rocker/tidyverse, and so on.
https://rocker-project.org
GNU General Public License v2.0

install texlive conditionally #750

Closed · cboettig closed this 5 months ago

cboettig commented 5 months ago

@eitsupi my apologies, I didn't realize my local test was pulling the upstream image. The TeX issue is specific to the changes in #743 -- specifically, the official ubuntu repo texlive binaries are installed and then purged as part of the BUILDDEPS list. By not purging the build deps, we ended up with conflicting versions of texlive.

This PR modifies install_texlive.sh to skip the manual install of tlmgr from CTAN when texlive has already been installed by apt-get (i.e. /usr/bin/latex already exists). I believe it should resolve the issue.
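In rough outline, the guard amounts to something like this (a minimal sketch only; the exact variable names and the rest of install_texlive.sh's CTAN logic are not shown here):

#!/bin/bash
set -e

# If apt-get has already provided a TeX Live installation, keep it and skip
# the manual CTAN-based tlmgr install entirely.
if [ -x /usr/bin/latex ]; then
    echo "TeX Live already installed via apt; skipping manual tlmgr install."
    exit 0
fi

# ...otherwise fall through to the existing manual install from a CTAN mirror.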

eitsupi commented 5 months ago

See https://github.com/rocker-org/rocker-versioned2/pull/743#discussion_r1464155435

I apologize for the poor quality of the review. I think the last build didn't remove the build dependencies, which increased the image size of rocker/r-ver by more than 2GB. It needs to be fixed immediately.

https://github.com/rocker-org/rocker-versioned2/wiki/r-ver_4c21e4362fbf vs. https://github.com/rocker-org/rocker-versioned2/wiki/r-ver_05eb2a30305e

eitsupi commented 5 months ago

See #751

eitsupi commented 5 months ago

It's really unfortunate that, even though we have a regression test that works correctly, we still ended up merging this.

eitsupi commented 5 months ago

I didn't realize my local test was pulling the upstream image.

I don't think the current structure is really good. Ideally, all images should rely only on ubuntu and not on rocker images that have already been pushed. But I don't have the energy to do it, because changing the structure would take a lot of effort.

cboettig commented 5 months ago

I think the failing test will succeed once https://github.com/rocker-org/rocker-versioned2/actions/runs/7634476675/job/20798413557 is finished. I'll leave this as a draft until then.

The current test structure seems to depend on testing against the currently published version of the rocker/r-ver image, not the modified one, which is what confused me. In https://github.com/rocker-org/rocker-versioned2/actions/runs/7634519606/job/20799340714?pr=750#step:5:31, it appears to be pulling rocker/r-ver from Docker Hub, not using the one in the PR.

To test, I'm using a dummy Dockerfile that basically uses your proposal in #754 and adds the extra script tests:

FROM ubuntu:jammy

LABEL org.opencontainers.image.licenses="GPL-2.0-or-later" \
      org.opencontainers.image.source="https://github.com/rocker-org/rocker-versioned2" \
      org.opencontainers.image.vendor="Rocker Project" \
      org.opencontainers.image.authors="Carl Boettiger <cboettig@ropensci.org>"

ENV R_VERSION=4.3.2
ENV R_HOME=/usr/local/lib/R
ENV TZ=Etc/UTC

COPY scripts/install_R_source.sh /rocker_scripts/install_R_source.sh

RUN /rocker_scripts/install_R_source.sh

ENV CRAN=https://p3m.dev/cran/__linux__/jammy/latest
ENV LANG=en_US.UTF-8

COPY scripts /rocker_scripts

RUN /rocker_scripts/setup_R.sh

CMD ["R"]

# TESTING additional steps after r-ver steps:
RUN /rocker_scripts/install_jupyter.sh
RUN /rocker_scripts/install_verse.sh
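For reference, this is roughly how I run that check locally (the file name Dockerfile.test and the image tag below are just placeholders for this comment, not part of the PR):

docker build -t rocker-texlive-test -f Dockerfile.test .
# quick smoke test that both the texlive and jupyter installs left working binaries
docker run --rm rocker-texlive-test which latex jupyter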

Anyway, let's see if the above tests pass once the hotfixes you so kindly made reach Docker Hub.

eitsupi commented 5 months ago

@cboettig No, the Dockerfile used in the tests uses the latest scripts.

https://github.com/rocker-org/rocker-versioned2/blob/1828862223cd0a5357f2523c0dc85777c64fb9b7/tests/rocker_scripts/Dockerfile#L5-L10

The case where build deps are not removed is not covered here, so even if CI goes green here, the rocker/ml-verse build will continue to fail.

eitsupi commented 5 months ago

You should add a test case for install_texlive.sh on rocker/cuda:latest to this file.

https://github.com/rocker-org/rocker-versioned2/blob/1828862223cd0a5357f2523c0dc85777c64fb9b7/tests/rocker_scripts/matrix.json

cboettig commented 5 months ago

@eitsupi thanks for all your help here, sorry I don't always understand things.

Yes, I see the test case copies the scripts, but it looks to me like it is based on pulling r-ver from Docker Hub and is not running the install_R_source.sh script again. Thus it appears to me the test case is grabbing the version of r-ver on Docker Hub that does not have the build deps purged. We can see in the failing test that it pulls from the hub, copies the new scripts, and runs only install_verse.sh, right? https://github.com/rocker-org/rocker-versioned2/actions/runs/7634519606/job/20799340714?pr=750#step:5:77

I agree we need additional tests as well. I think we actually have to test both install_texlive.sh and install_jupyter.sh in the cuda scenario (I think that combination is already covered in the r-ver case). I'll try to add that test now.
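Locally, the cuda-side check I have in mind looks roughly like this (the image tag and the stdin-Dockerfile shorthand are just for illustration; the real CI case would go through the test matrix instead):

# build the updated scripts on top of the published cuda image and run the two
# installs that were failing on the cuda side
docker build -t rocker-cuda-texlive-test -f - . <<'EOF'
FROM rocker/cuda:latest
COPY scripts /rocker_scripts
RUN /rocker_scripts/install_texlive.sh
RUN /rocker_scripts/install_jupyter.sh
EOF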

eitsupi commented 5 months ago

and is not running the install_R_source.sh script again. Thus it appears to me the test case is grabbing the version of r-ver on Docker Hub that does not have the build deps purged. We can see in the failing test that it pulls from the hub, copies the new scripts, and runs only install_verse.sh, right?

Correct. As you know, install_R_source.sh is quite time-consuming to execute, so regression tests are rarely triggered. The test is here, but it is not triggered in this PR.

https://github.com/rocker-org/rocker-versioned2/blob/1828862223cd0a5357f2523c0dc85777c64fb9b7/.github/workflows/r-build-test.yml#L56-L91

cboettig commented 5 months ago

Looks like the above test failures are network issues (HTTP 500 errors) during the manual texlive install.

Given how temperamental the network at https://mirror.ctan.org/ can be, and how large the cuda images are to begin with, I'm not sad to be getting texlive from the much more reliable ubuntu repos. The changes in this PR should make the relevant scripts more robust, so that they work either with the hand-selected packages via tlmgr or with the more standard (but larger) official ubuntu apt binaries.

For the reason we discussed above, these tests are still not 100% correct when running on the CI, because they are not running on the r-ver and cuda images that would be built here.

I think a more complete check, which I'm using locally, is the above Dockerfile that runs these scripts on top of an r-ver rebuilt from scratch, plus the same thing for the cuda side. I've tested both of those builds locally.

We can tickle those checks again on the CI as well, but as a hotfix to correct the previous broken builds I think this will now succeed, modulo the stability of CTAN.org servers....

cboettig commented 5 months ago

@eitsupi ok, tests are green, including (I think) the additional test you mentioned. I think this is ready for review; thanks for all the help!

cboettig commented 5 months ago

@eitsupi anything more on this? I don't mean to rush you, but I do want to finish off this hotfix for the broken builds created by #743; also, current workflows won't run on some of the published images until this is merged.

eitsupi commented 5 months ago

Thanks, but is this a hotfix? The broken images have already been fixed, and I don't think we should rush a build after merging this (it is a waste of resources to rebuild an almost identical image only one day later).

cboettig commented 5 months ago

I have several workflows that are broken until this is fixed. I am glad that we are respectful of the resources that Microsoft is providing for free, but I am also spending quite a bit of time working around the broken images at the moment, and this probably impacts other users as well.

eddelbuettel commented 5 months ago

respectful of the resources that Microsoft is providing for free

It's complicated, but I think "the community as a whole" is also overdoing it on builds, CI runs, and whatnot. (This comment is not about Rocker, or any one stack, but more generic. E.g. I have a beef with "kids these days" being led to cookie-cutter development that runs each commit over a matrix of five or more builds for really no apparent reason. My hill to die on, I guess.) Then again, we are a rounding error compared to, say, bitcoin mining and LLM training.

cboettig commented 5 months ago

@eddelbuettel yeah, I totally hear you on that. I think your point is that CI should be used thoughtfully, not mindlessly; otherwise developers' and contributors' time is not well spent. I don't think anyone is worried that one of the "magnificent 7" most profitable companies needs our help to ensure it remains financially viable. There is of course a real carbon footprint to mindless CI (and needless use of chatbots etc.), despite the aggressive carbon targets...

But I digress; I'm just hoping to get tensorflow and the other bits of the python+GPU experience working smoothly on our image during this brief moment when I have a few cycles to invest, before other things take over all my time again.

eddelbuettel commented 5 months ago

As an (index-fund-owning, hence indirect) MSFT shareholder, I am not too worried about their profitability. Free resources tend to be a cost of user acquisition that gets amortised in the long run through eventual fees.

It's the carbon footprint from endlessly running jobs just because we can...

But yep, thanks for making it easy to deploy GPUs. I appreciate it too, per my recent bug report.