omnibenchmark / omni-docker

A place to organize Dockerfiles for base images

Create reproducible (renkulab-free) Dockerfiles #1

Open imallona opened 6 months ago

markrobinsonuzh commented 6 months ago

Current status:

REPOSITORY                    TAG                                IMAGE ID       CREATED        SIZE
markrobinsonuzh/omni-docker   slim-py-3.11--pandas               6586c3af26c2   16 hours ago   376MB
markrobinsonuzh/omni-docker   alpine-20231219--py-3.11--pandas   af216daf254e   18 hours ago   737MB
markrobinsonuzh/omni-docker   alpine-20231219--py-3.11           f02dfaa57e09   18 hours ago   338MB
markrobinsonuzh/omni-docker   alpine-20231219--R-4.3.2--bioc     ba5a183664d5   22 hours ago   268MB
markrobinsonuzh/omni-docker   alpine-20231219--R-4.3.2           04bea6df4d9d   13 days ago    251MB
alpine                        20231219                           9198849dd7f6   2 weeks ago    7.38MB
python                        3.11-slim                          dd150e5400f1   4 weeks ago    131MB

.. the current setup starts from alpine:20231219 for R tools and python:3.11-slim for Python tools (alpine:20231219 was also tried as a base for Python, but resulted in a much larger container: 737MB vs. 376MB with pandas). These are all pushed to docker.io/markrobinsonuzh.

Thoughts and feedback are most welcome, including but not limited to:

imallona commented 6 months ago

thanks! Not an expert but please find some suggestions below

Should we add some tests/logs that run something to verify that the builds produce working containers?
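A minimal sketch of what such a check could look like, assuming Docker is available on the build host and the images have no entrypoint that interferes with the command. The image tags are taken from the listing above; the check commands and the helper names (`SMOKE_CHECKS`, `docker_run_command`, `run_smoke_checks`) are hypothetical, not something that exists in this repository.

```python
import subprocess

# Hypothetical smoke checks: one short command per image that should exit 0
# if the container is usable. The commands are illustrative assumptions.
SMOKE_CHECKS = {
    "markrobinsonuzh/omni-docker:alpine-20231219--R-4.3.2": [
        "Rscript", "-e", "stopifnot(capabilities('png'))",
    ],
    "markrobinsonuzh/omni-docker:slim-py-3.11--pandas": [
        "python", "-c", "import pandas",
    ],
}


def docker_run_command(image, command):
    """Build the argv for running `command` in a throwaway container."""
    return ["docker", "run", "--rm", image, *command]


def run_smoke_checks(checks=SMOKE_CHECKS, runner=subprocess.run):
    """Run every check and return a list of (image, stderr) for failures."""
    failures = []
    for image, command in checks.items():
        result = runner(
            docker_run_command(image, command),
            capture_output=True,
            text=True,
        )
        if result.returncode != 0:
            failures.append((image, result.stderr.strip()))
    return failures
```

Calling `run_smoke_checks()` after a build and failing the build when the returned list is non-empty would also give a log of what broke. The `runner` argument is only there so the logic can be exercised without Docker.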

I've tried your base R image: libxml is missing and X11 is not fully operational:

> capabilities()
       jpeg         png        tiff       tcltk         X11        aqua 
       TRUE        TRUE        TRUE        TRUE       FALSE       FALSE 
   http/ftp     sockets      libxml        fifo      cledit       iconv 
       TRUE        TRUE       FALSE        TRUE        TRUE        TRUE 
        NLS       Rprof     profmem       cairo         ICU long.double 
      FALSE        TRUE        TRUE        TRUE        TRUE        TRUE 
    libcurl 
       TRUE 
Warning message:
In doTryCatch(return(expr), name, parentenv, handler) :
  unable to load shared object '/usr/lib/R/modules//R_X11.so':
  Error loading shared library libpangocairo-1.0.so.0: No such file or directory (needed by /usr/lib/R/modules//R_X11.so)
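Output like the above could be checked automatically rather than read by eye. The following is a rough sketch that parses the raw console output of `capabilities()` (alternating name and value rows) into a dictionary; the function names and the list of required capabilities are assumptions for illustration, and the parser does not handle the `#5 450.8` prefixes that appear in a build log.

```python
SAMPLE = """> capabilities()
       jpeg         png        tiff       tcltk         X11        aqua
       TRUE        TRUE        TRUE        TRUE       FALSE       FALSE
   http/ftp     sockets      libxml        fifo      cledit       iconv
       TRUE        TRUE       FALSE        TRUE        TRUE        TRUE
        NLS       Rprof     profmem       cairo         ICU long.double
      FALSE        TRUE        TRUE        TRUE        TRUE        TRUE
    libcurl
       TRUE
Warning message:
"""


def parse_capabilities(text):
    """Map capability name -> bool from printed capabilities() output."""
    caps = {}
    previous = []
    for line in text.splitlines():
        tokens = line.split()
        is_value_row = bool(tokens) and all(t in ("TRUE", "FALSE") for t in tokens)
        # A value row belongs to the name row printed directly above it.
        if is_value_row and len(tokens) == len(previous):
            for name, value in zip(previous, tokens):
                caps[name] = value == "TRUE"
        previous = tokens
    return caps


def missing_capabilities(caps, required=("png", "cairo", "X11")):
    """Return the required capabilities that are absent or FALSE."""
    return [name for name in required if not caps.get(name, False)]
```

With the sample above, `missing_capabilities(parse_capabilities(SAMPLE))` flags only `X11`. Which capabilities are actually required for non-interactive use is a decision for the project, not something this sketch settles.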

What other specifications should we include (e.g., versioning of libraries?)

I think we should at least list and document the versions of BLAS/LAPACK/zlib, or pin them somehow.

Better approach to versioning R software? (right now, just calls to BiocManager::install)

As for R packages, we either install them with a defined version, or we don't but list and document the installed versions (and perhaps also store the tarballs in local storage as a backup plan to facilitate reproducible builds, i.e. if files disappear from CRAN etc). It might be unconventional, but wget-ing the versioned package tarballs and R CMD INSTALLing them sounds appealing to me; it's not pretty, but KISS.
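To make the wget plus R CMD INSTALL idea concrete, here is a small sketch that turns a table of pinned versions into shell lines. It relies on the CRAN layout where the current release lives under `src/contrib/` and older releases under `src/contrib/Archive/<package>/`. The pinned package and version are placeholders, the helper names and the `/tmp/pkgs` directory are hypothetical, and dependency ordering is left to whoever writes the pin list.

```python
CRAN = "https://cran.r-project.org/src/contrib"

# Placeholder pins for illustration only, not a recommendation.
PINS = {"jsonlite": "1.8.8"}


def tarball_name(package, version):
    """CRAN source tarballs are named <package>_<version>.tar.gz."""
    return f"{package}_{version}.tar.gz"


def candidate_urls(package, version):
    """Current-release URL first, then the Archive URL used once superseded."""
    name = tarball_name(package, version)
    return [f"{CRAN}/{name}", f"{CRAN}/Archive/{package}/{name}"]


def install_commands(pins, dest="/tmp/pkgs"):
    """Return shell lines that fetch and install each pinned tarball.

    Packages are emitted in the order given, so dependencies must be
    listed before the packages that need them.
    """
    lines = []
    for package, version in pins.items():
        current, archive = candidate_urls(package, version)
        lines.append(f"wget -q -P {dest} {current} || wget -q -P {dest} {archive}")
        lines.append(f"R CMD INSTALL {dest}/{tarball_name(package, version)}")
    return lines
```

The generated lines could go into a `RUN` step of the Dockerfile, and the downloaded tarballs could be copied to backup storage at the same point. Bioconductor and GitHub packages would need their own URL scheme, which this sketch does not cover.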

Are there other places in the container to save space?

They look pretty compact to me. I'm double checking the python installs (I think it's running workarounds for issues fixed long ago, like the C locale, or pip versioning schemas), and the R compilation/capabilities.

What spectrum of "base" containers should we provide to users (R/BioC combinations; Python versions)?

Those sound good. I'd pin Python releases to the full x.y.z version, e.g. 3.12.1 instead of 3.12.

Other thoughts

markrobinsonuzh commented 5 months ago

Spent a bit of time on the X11 capability, but it doesn't seem to want to fire, despite being seemingly ready for it in the R compile:

(base) mark@MLS-M-MARO alpine-R-base % grep "[Xx]11" build.log 
[snip]
#5 1.552 (31/194) Installing libx11 (1.8.7-r0)
#5 2.336 (50/194) Installing libx11-dev (1.8.7-r0)
#5 31.04 checking for X11/Intrinsic.h... yes
#5 31.13 using X11 ... yes
#5 31.19 checking for X11/Xmu/Atoms.h... yes
#5 35.57 config.status: creating src/modules/X11/Makefile
#5 35.94   Interfaces supported:        X11, tcltk
#5 44.32 making X11.d from X11.c
#5 44.36 gcc -I. -I../../src/include -I../../src/include  -I/usr/local/include -DHAVE_CONFIG_H   -fopenmp -fpic  -g -O2 -fstack-protector-strong -D_DEFAULT_SOURCE -D__USE_MISC  -c X11.c -o X11.o
#5 45.01 ar -cr libunix.a Rembedded.o dynload.o system.o sys-unix.o sys-std.o X11.o
#5 75.52 make[3]: Entering directory '/tmp/R-4.3.2/src/modules/X11'
#5 75.52 making devX11.d from devX11.c
#5 75.67 make[4]: Entering directory '/tmp/R-4.3.2/src/modules/X11'
#5 75.67 gcc -I/usr/include/libpng16 -I/usr/include/webp -I. -I../../../src/include -I../../../src/include  -I/usr/include/pango-1.0 -I/usr/include/harfbuzz -I/usr/include/glib-2.0 -I/usr/lib/glib-2.0/include -I/usr/include/cairo -I/usr/include/freetype2 -I/usr/include/libpng16 -pthread -I/usr/include/libmount -I/usr/include/blkid -I/usr/include/fribidi -I/usr/include/pixman-1 -I../../../src/library/grDevices/src/cairo -I/usr/local/include -DHAVE_CONFIG_H   -fopenmp -fpic  -g -O2 -fstack-protector-strong -D_DEFAULT_SOURCE -D__USE_MISC  -c devX11.c -o devX11.o
#5 78.32 gcc -shared -L"../../../lib" -L/usr/local/lib -o R_de.so dataentry.o -lSM -lICE -lpangocairo-1.0 -lpango-1.0 -lgobject-2.0 -lglib-2.0 -lintl -lharfbuzz -L/lib -lz -lpng16 -lcairo -lX11 -lXext -lX11 -lXt -lXmu  -lR -lm 
#5 78.96 gcc -shared -L"../../../lib" -L/usr/local/lib -o R_X11.so devX11.o rotated.o rbitmap.o -ltiff -ljpeg -lpng16 -lSM -lICE -lpangocairo-1.0 -lpango-1.0 -lgobject-2.0 -lglib-2.0 -lintl -lharfbuzz -L/lib -lz -lpng16 -lcairo -lX11 -lXext -lX11 -lXt -lXmu  -lR -lm 
#5 79.00 make[4]: Leaving directory '/tmp/R-4.3.2/src/modules/X11'
#5 79.01 make[4]: Entering directory '/tmp/R-4.3.2/src/modules/X11'
#5 79.02 make[4]: Leaving directory '/tmp/R-4.3.2/src/modules/X11'
#5 79.02 make[3]: Leaving directory '/tmp/R-4.3.2/src/modules/X11'
#5 204.6 gcc -I. -I../../../../../src/include -I../../../../../src/include -I/usr/include/pango-1.0 -I/usr/include/harfbuzz -I/usr/include/glib-2.0 -I/usr/lib/glib-2.0/include -I/usr/include/cairo -I/usr/include/freetype2 -I/usr/include/libpng16 -pthread -I/usr/include/libmount -I/usr/include/blkid -I/usr/include/fribidi -I/usr/include/pixman-1 -I../../../../../src/modules/X11 -I/usr/local/include -DHAVE_CONFIG_H   -fopenmp -fpic  -g -O2 -fstack-protector-strong -D_DEFAULT_SOURCE -D__USE_MISC  -c cairoBM.c -o cairoBM.o
#5 205.9 gcc -shared -L"../../../../../lib" -L/usr/local/lib -o cairo.so cairoBM.o ../../../../../src/modules/X11/rbitmap.o -ltiff -ljpeg -lpng16 -lpangocairo-1.0 -lpango-1.0 -lgobject-2.0 -lglib-2.0 -lintl -lharfbuzz -L/lib -lz -lcairo -lpng16 -L"../../../../../lib" -lR -lm 
#5 280.4 gcc -shared -L../../../../lib -L/usr/local/lib -o tcltk.so init.o tcltk.o tcltk_unix.o -L/usr/lib -ltcl8.6 -L/usr/lib -ltk8.6 -lX11 -lm -L../../../../lib -lR
#5 443.5 make[3]: Entering directory '/tmp/R-4.3.2/src/modules/X11'
#5 443.5 make[3]: Leaving directory '/tmp/R-4.3.2/src/modules/X11'
#5 450.8        jpeg         png        tiff       tcltk         X11        aqua 

There is now no error/warning about the shared object (the capabilities() output is now written to the build log):

#5 450.8 > capabilities()
#5 450.8        jpeg         png        tiff       tcltk         X11        aqua 
#5 450.8        TRUE        TRUE        TRUE        TRUE       FALSE       FALSE 
#5 450.8    http/ftp     sockets      libxml        fifo      cledit       iconv 
#5 450.8        TRUE        TRUE       FALSE        TRUE       FALSE        TRUE 
#5 450.8         NLS       Rprof     profmem       cairo         ICU long.double 
#5 450.8       FALSE        TRUE        TRUE        TRUE        TRUE        TRUE 
#5 450.8     libcurl 
#5 450.8        TRUE 

Also, libxml2 is installed, but R does not report the libxml capability:

(base) mark@MLS-M-MARO alpine-R-base % grep "xml" build.log
[snip]
#5 2.713 (66/194) Installing libxml2 (2.12.3-r0)
#5 2.767 (67/194) Installing libxml2-utils (2.12.3-r0)
#5 2.784 (68/194) Installing docbook-xml (4.5-r8)
#5 2.817 Executing docbook-xml-4.5-r8.post-install
#5 450.8    http/ftp     sockets      libxml        fifo      cledit       iconv 

.. but I think this one is ok because in the docs it says:

  libxml: is there support for integrating ‘libxml’ with the R event
          loop?  ‘TRUE’ as from R 3.3.0, ‘FALSE’ as from R 4.2.0.

Note that the changes now make the image >100MB bigger:

(base) mark@MLS-M-MARO ~ % docker image ls
REPOSITORY                    TAG                                IMAGE ID       CREATED        SIZE
markrobinsonuzh/omni-docker   alpine-20231219--R-4.3.2           aa6668a7d9e7   13 hours ago   369MB
markrobinsonuzh/omni-docker   slim-py-3.11--pandas               6586c3af26c2   6 days ago     376MB
markrobinsonuzh/omni-docker   alpine-20231219--py-3.11--pandas   af216daf254e   6 days ago     737MB
markrobinsonuzh/omni-docker   alpine-20231219--py-3.11           f02dfaa57e09   6 days ago     338MB
markrobinsonuzh/omni-docker   alpine-20231219--R-4.3.2--bioc     ba5a183664d5   6 days ago     268MB
alpine                        20231219                           9198849dd7f6   3 weeks ago    7.38MB
python                        3.11-slim                          dd150e5400f1   5 weeks ago    131MB

.. and I don't know if it is worth it, because these containers are mostly for running software, not for interactive sessions (earlier, some of the build dependencies were removed after compiling R).

markrobinsonuzh commented 5 months ago

As for R packages, we either install them with a defined version, or we don't but list and document the installed versions (and perhaps also store the tarballs in local storage as a backup plan to facilitate reproducible builds, i.e. if files disappear from CRAN etc). It might be unconventional, but wget-ing the versioned package tarballs and R CMD INSTALLing them sounds appealing to me; it's not pretty, but KISS.

I'm not sure how this would work, generally. Some packages are from CRAN, some from BioC (some from GitHub!), and it doesn't seem easy to track their origin, other than by brute-force querying of the mirrors (maybe that's what you mean). It would also need to be done after the fact, because running BiocManager::install("foo") would retrieve foo, but also an arbitrary number of other packages, from either CRAN or BioC.
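One after-the-fact option that avoids querying mirrors is to read the DESCRIPTION file of each installed package, since installation usually leaves origin hints there. The sketch below is a heuristic under stated assumptions: it treats `Repository: CRAN` as CRAN, a `biocViews` field as Bioconductor, and `RemoteType: github` as GitHub. These fields are not guaranteed to be present or unambiguous, and the function names are hypothetical.

```python
def parse_description(text):
    """Parse DESCRIPTION (DCF) text into a dict, joining continuation lines."""
    fields = {}
    key = None
    for line in text.splitlines():
        if line[:1] in (" ", "\t") and key is not None:
            # Indented lines continue the previous field.
            fields[key] += " " + line.strip()
        elif ":" in line:
            key, _, value = line.partition(":")
            key = key.strip()
            fields[key] = value.strip()
    return fields


def classify_origin(fields):
    """Guess where a package came from; returns a label, not a guarantee."""
    if fields.get("RemoteType") == "github":
        return "GitHub"
    if fields.get("Repository") == "CRAN":
        return "CRAN"
    if "biocViews" in fields:
        return "Bioconductor"
    return "unknown"
```

Run over every `DESCRIPTION` under the library paths inside the built image, this would give a package, version, and origin table that could be stored next to the image as documentation, regardless of how many extra packages `BiocManager::install` pulled in.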

markrobinsonuzh commented 5 months ago

Current status is this:

(base) mark@MLS-M-MARO alpine-R-base-bioc % docker image ls                
REPOSITORY                    TAG                              IMAGE ID       CREATED          SIZE
markrobinsonuzh/omni-docker   alpine-20231219--R-4.3.2--bioc   fa1650c18027   46 seconds ago   940MB
markrobinsonuzh/omni-docker   alpine-20231219--R-4.3.2         ef6629b7a09f   23 minutes ago   774MB
markrobinsonuzh/omni-docker   slim-py-3.11.7--pandas           6ca929b5a102   26 minutes ago   1.21GB
markrobinsonuzh/omni-docker   slim-py-3.11.7                   32feb879fcd4   41 minutes ago   908MB
alpine                        20231219                         9198849dd7f6   3 weeks ago      7.38MB
python                        3.11-slim                        dd150e5400f1   5 weeks ago      131MB

.. so the containers are getting rather big now, because omnibenchmark and renku have many dependencies. We can still look for places to shave off some size.
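To keep an eye on image growth, the SIZE column of `docker image ls` can be converted to numbers and compared between builds. A minimal sketch, assuming Docker's decimal units (1GB = 1000MB); the sizes are the three values reported in this thread for the alpine-20231219--R-4.3.2 tag, and the helper names are hypothetical.

```python
# Conversion factors to MB, assuming decimal units as printed by Docker.
UNITS = {"kB": 1e-3, "MB": 1.0, "GB": 1e3}

# Sizes reported for alpine-20231219--R-4.3.2 in the three listings above.
R_BASE_SIZES = ["251MB", "369MB", "774MB"]


def size_in_mb(text):
    """Convert a size string such as '7.38MB' or '1.21GB' to megabytes."""
    for unit, factor in UNITS.items():
        if text.endswith(unit):
            return float(text[: -len(unit)]) * factor
    raise ValueError(f"unrecognised size: {text!r}")


def growth_steps(sizes):
    """Return the size increase in MB between consecutive listings."""
    values = [size_in_mb(s) for s in sizes]
    return [later - earlier for earlier, later in zip(values, values[1:])]
```

For the values above, `growth_steps(R_BASE_SIZES)` gives an increase of 118MB followed by a further 405MB, which could serve as a simple regression check if a size budget is agreed on.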

Otherwise, the following have now been addressed:

Things still to do, I think:

Would be happy to discuss.