Open imallona opened 6 months ago
Thanks! I'm not an expert, but please find some suggestions below.
Should we add some tests/logs to verify that the builds produce working containers?
I've tried your base R image: libxml is missing and X11 is not fully operational.
> capabilities()
       jpeg         png        tiff       tcltk         X11        aqua
       TRUE        TRUE        TRUE        TRUE       FALSE       FALSE
   http/ftp     sockets      libxml        fifo      cledit       iconv
       TRUE        TRUE       FALSE        TRUE        TRUE        TRUE
        NLS       Rprof     profmem       cairo         ICU long.double
      FALSE        TRUE        TRUE        TRUE        TRUE        TRUE
    libcurl
       TRUE
Warning message:
In doTryCatch(return(expr), name, parentenv, handler) :
unable to load shared object '/usr/lib/R/modules//R_X11.so':
Error loading shared library libpangocairo-1.0.so.0: No such file or directory (needed by /usr/lib/R/modules//R_X11.so)
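A missing runtime library like the libpangocairo-1.0.so.0 reported above could be flagged by a link check at the end of the build. The following is only a sketch: it assumes `ldd` exists in the image and that unresolved libraries show up either in the musl form seen in the warning above or in the glibc form `=> not found`. The function name `missing_libs` is made up for illustration.

```sh
# missing_libs: read `ldd` output on stdin and print each unresolved library once.
# Assumption: only the two message formats described below are recognised.
missing_libs() {
    awk '
        # musl form: "Error loading shared library libfoo.so.0: No such file ..."
        /Error loading shared library/ {
            for (i = 1; i < NF; i++)
                if ($i == "library") { lib = $(i + 1); sub(/:$/, "", lib); print lib }
        }
        # glibc form: "libfoo.so.0 => not found"
        / => not found/ { print $1 }
    ' | sort -u
}
```

Something like `ldd /usr/lib/R/modules/R_X11.so 2>&1 | missing_libs` could then be run as the last build step, failing the build if it prints anything.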
What other specifications should we include (e.g., versioning of libraries)?
I think we should at least list and document the BLAS/LAPACK/zlib versions; beyond that, we could either just document them or pin them somehow.
Better approach to versioning R software? (right now, just calls to BiocManager::install)
As for R packages, we either install them defining the version, or we don't but list and document the installed versions (and perhaps also store the tarballs to a local storage as a backup plan to facilitate reproducible builds, i.e. if files disappear from CRAN etc). It might be unconventional, but wgeting the versioned packages tarballs and R CMD INSTALLing them sounds appealing to me, it's not pretty but KISS.
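A rough sketch of the wget plus R CMD INSTALL idea, for CRAN packages only. It assumes the usual CRAN layout, where the current release sits under `src/contrib/` and older releases under `src/contrib/Archive/<pkg>/`. The manifest format, the `CRAN` variable and both function names are hypothetical, and Bioconductor or GitHub packages are not covered.

```sh
# Mirror to fetch from; override to point at a local backup of the tarballs.
CRAN="${CRAN:-https://cran.r-project.org}"

# cran_tarball_urls PKG VERSION: print the two places a versioned source
# tarball can live on a CRAN mirror (current release first, then the archive).
cran_tarball_urls() {
    printf '%s/src/contrib/%s_%s.tar.gz\n' "$CRAN" "$1" "$2"
    printf '%s/src/contrib/Archive/%s/%s_%s.tar.gz\n' "$CRAN" "$1" "$1" "$2"
}

# install_pinned MANIFEST [DIR]: MANIFEST holds one "package version" pair per
# line, in dependency order, because R CMD INSTALL does not resolve dependencies.
install_pinned() {
    dir="${2:-./tarballs}"
    mkdir -p "$dir" || return 1
    while read -r pkg ver; do
        [ -n "$pkg" ] || continue
        tarball="$dir/${pkg}_${ver}.tar.gz"
        if [ ! -s "$tarball" ]; then
            # Try each candidate URL until one download succeeds.
            for url in $(cran_tarball_urls "$pkg" "$ver"); do
                wget -q -O "$tarball" "$url" </dev/null && break
                rm -f "$tarball"
            done
        fi
        [ -s "$tarball" ] || { echo "could not fetch $pkg $ver" >&2; return 1; }
        # stdin is redirected so R cannot swallow the rest of the manifest.
        R CMD INSTALL "$tarball" </dev/null || return 1
    done < "$1"
}
```

Keeping the downloaded directory around (or copying it to local storage) would double as the backup mentioned above, since tarballs already present are not fetched again.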
Are there other places in the container to save space?
They look pretty compact to me. I'm double checking the python installs (I think it's running workarounds for issues fixed long ago, like the C locale, or pip versioning schemas), and the R compilation/capabilities.
What spectrum of "base" containers should we provide to users (R/BioC combinations; Python versions)?
Those sound good. I'd pin Python releases to the full x.y.z version, e.g. 3.12.1 instead of 3.12.
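If the tags are generated by a script, the x.y.z rule could be enforced there. A small sketch, with hypothetical function names and a tag scheme modelled on tags such as `slim-py-3.11.7`:

```sh
# is_full_version VERSION: succeed only for a three-part numeric version (x.y.z).
is_full_version() {
    printf '%s\n' "$1" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+$'
}

# python_tag BASE VERSION: print a tag such as "slim-py-3.11.7", refusing
# versions that are not fully specified.
python_tag() {
    is_full_version "$2" || { echo "not an x.y.z version: $2" >&2; return 1; }
    printf '%s-py-%s\n' "$1" "$2"
}
```

With this, `python_tag slim 3.12` fails instead of silently producing a tag that floats with patch releases.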
Other thoughts
Spent a bit of time on the X11 capability, but it doesn't seem to want to fire, despite being seemingly ready for it in the R compile:
(base) mark@MLS-M-MARO alpine-R-base % grep "[Xx]11" build.log
[snip]
#5 1.552 (31/194) Installing libx11 (1.8.7-r0)
#5 2.336 (50/194) Installing libx11-dev (1.8.7-r0)
#5 31.04 checking for X11/Intrinsic.h... yes
#5 31.13 using X11 ... yes
#5 31.19 checking for X11/Xmu/Atoms.h... yes
#5 35.57 config.status: creating src/modules/X11/Makefile
#5 35.94 Interfaces supported: X11, tcltk
#5 44.32 making X11.d from X11.c
#5 44.36 gcc -I. -I../../src/include -I../../src/include -I/usr/local/include -DHAVE_CONFIG_H -fopenmp -fpic -g -O2 -fstack-protector-strong -D_DEFAULT_SOURCE -D__USE_MISC -c X11.c -o X11.o
#5 45.01 ar -cr libunix.a Rembedded.o dynload.o system.o sys-unix.o sys-std.o X11.o
#5 75.52 make[3]: Entering directory '/tmp/R-4.3.2/src/modules/X11'
#5 75.52 making devX11.d from devX11.c
#5 75.67 make[4]: Entering directory '/tmp/R-4.3.2/src/modules/X11'
#5 75.67 gcc -I/usr/include/libpng16 -I/usr/include/webp -I. -I../../../src/include -I../../../src/include -I/usr/include/pango-1.0 -I/usr/include/harfbuzz -I/usr/include/glib-2.0 -I/usr/lib/glib-2.0/include -I/usr/include/cairo -I/usr/include/freetype2 -I/usr/include/libpng16 -pthread -I/usr/include/libmount -I/usr/include/blkid -I/usr/include/fribidi -I/usr/include/pixman-1 -I../../../src/library/grDevices/src/cairo -I/usr/local/include -DHAVE_CONFIG_H -fopenmp -fpic -g -O2 -fstack-protector-strong -D_DEFAULT_SOURCE -D__USE_MISC -c devX11.c -o devX11.o
#5 78.32 gcc -shared -L"../../../lib" -L/usr/local/lib -o R_de.so dataentry.o -lSM -lICE -lpangocairo-1.0 -lpango-1.0 -lgobject-2.0 -lglib-2.0 -lintl -lharfbuzz -L/lib -lz -lpng16 -lcairo -lX11 -lXext -lX11 -lXt -lXmu -lR -lm
#5 78.96 gcc -shared -L"../../../lib" -L/usr/local/lib -o R_X11.so devX11.o rotated.o rbitmap.o -ltiff -ljpeg -lpng16 -lSM -lICE -lpangocairo-1.0 -lpango-1.0 -lgobject-2.0 -lglib-2.0 -lintl -lharfbuzz -L/lib -lz -lpng16 -lcairo -lX11 -lXext -lX11 -lXt -lXmu -lR -lm
#5 79.00 make[4]: Leaving directory '/tmp/R-4.3.2/src/modules/X11'
#5 79.01 make[4]: Entering directory '/tmp/R-4.3.2/src/modules/X11'
#5 79.02 make[4]: Leaving directory '/tmp/R-4.3.2/src/modules/X11'
#5 79.02 make[3]: Leaving directory '/tmp/R-4.3.2/src/modules/X11'
#5 204.6 gcc -I. -I../../../../../src/include -I../../../../../src/include -I/usr/include/pango-1.0 -I/usr/include/harfbuzz -I/usr/include/glib-2.0 -I/usr/lib/glib-2.0/include -I/usr/include/cairo -I/usr/include/freetype2 -I/usr/include/libpng16 -pthread -I/usr/include/libmount -I/usr/include/blkid -I/usr/include/fribidi -I/usr/include/pixman-1 -I../../../../../src/modules/X11 -I/usr/local/include -DHAVE_CONFIG_H -fopenmp -fpic -g -O2 -fstack-protector-strong -D_DEFAULT_SOURCE -D__USE_MISC -c cairoBM.c -o cairoBM.o
#5 205.9 gcc -shared -L"../../../../../lib" -L/usr/local/lib -o cairo.so cairoBM.o ../../../../../src/modules/X11/rbitmap.o -ltiff -ljpeg -lpng16 -lpangocairo-1.0 -lpango-1.0 -lgobject-2.0 -lglib-2.0 -lintl -lharfbuzz -L/lib -lz -lcairo -lpng16 -L"../../../../../lib" -lR -lm
#5 280.4 gcc -shared -L../../../../lib -L/usr/local/lib -o tcltk.so init.o tcltk.o tcltk_unix.o -L/usr/lib -ltcl8.6 -L/usr/lib -ltk8.6 -lX11 -lm -L../../../../lib -lR
#5 443.5 make[3]: Entering directory '/tmp/R-4.3.2/src/modules/X11'
#5 443.5 make[3]: Leaving directory '/tmp/R-4.3.2/src/modules/X11'
#5 450.8 jpeg png tiff tcltk X11 aqua
There is now no error/warning about the shared object (I now print the capabilities() output to the build log):
#5 450.8 > capabilities()
#5 450.8        jpeg         png        tiff       tcltk         X11        aqua
#5 450.8        TRUE        TRUE        TRUE        TRUE       FALSE       FALSE
#5 450.8    http/ftp     sockets      libxml        fifo      cledit       iconv
#5 450.8        TRUE        TRUE       FALSE        TRUE       FALSE        TRUE
#5 450.8         NLS       Rprof     profmem       cairo         ICU long.double
#5 450.8       FALSE        TRUE        TRUE        TRUE        TRUE        TRUE
#5 450.8     libcurl
#5 450.8        TRUE
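Since the capabilities() output is now in the build log, it could also serve as a simple build test. A sketch of a checker, assuming the two-line name/value layout shown above; `cap_enabled` is a made-up name, and which capabilities to require is a separate decision:

```sh
# cap_enabled NAME: read capabilities() output on stdin (raw, or with the
# "#5 450.8 " style prefix of a docker build log) and succeed if NAME is TRUE.
cap_enabled() {
    awk -v want="$1" '
        { sub(/^#[0-9]+ [0-9.]+ /, "") }      # drop a build-log prefix, if any
        # A line holding only TRUE/FALSE tokens gives the values for the
        # names on the line before it.
        /^[ \t]*((TRUE|FALSE)[ \t]*)+$/ {
            for (i = 1; i <= NF; i++)
                if (names[i] == want) { found = 1; ok = ($i == "TRUE") }
            next
        }
        { split($0, names) }                  # otherwise remember the names
        END { exit !(found && ok) }
    '
}
```

For example, `Rscript -e 'capabilities()' | cap_enabled cairo || exit 1` in a build step would stop the build when cairo support is missing, and unknown names are treated as failures.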
Also, libxml2 is installed, but R does not report the libxml capability:
(base) mark@MLS-M-MARO alpine-R-base % grep "xml" build.log
[snip]
#5 2.713 (66/194) Installing libxml2 (2.12.3-r0)
#5 2.767 (67/194) Installing libxml2-utils (2.12.3-r0)
#5 2.784 (68/194) Installing docbook-xml (4.5-r8)
#5 2.817 Executing docbook-xml-4.5-r8.post-install
#5 450.8 http/ftp sockets libxml fifo cledit iconv
.. but I think this one is ok because in the docs it says:
libxml: is there support for integrating ‘libxml’ with the R event
loop? ‘TRUE’ as from R 3.3.0, ‘FALSE’ as from R 4.2.0.
Note that the changes now make the image >100MB bigger:
(base) mark@MLS-M-MARO ~ % docker image ls
REPOSITORY                    TAG                                IMAGE ID       CREATED         SIZE
markrobinsonuzh/omni-docker   alpine-20231219--R-4.3.2           aa6668a7d9e7   13 hours ago    369MB
markrobinsonuzh/omni-docker   slim-py-3.11--pandas               6586c3af26c2   6 days ago      376MB
markrobinsonuzh/omni-docker   alpine-20231219--py-3.11--pandas   af216daf254e   6 days ago      737MB
markrobinsonuzh/omni-docker   alpine-20231219--py-3.11           f02dfaa57e09   6 days ago      338MB
markrobinsonuzh/omni-docker   alpine-20231219--R-4.3.2--bioc     ba5a183664d5   6 days ago      268MB
alpine                        20231219                           9198849dd7f6   3 weeks ago     7.38MB
python                        3.11-slim                          dd150e5400f1   5 weeks ago     131MB
.. and I don't know if it is worth it (earlier, some of the build dependencies were removed after compiling R), because these containers are mostly for running software, not interactive sessions.
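One possible middle ground is to keep only the runtime libraries and still drop the `-dev` packages and compilers after the build, using an apk virtual package. This is a sketch under assumptions: an Alpine base with `apk`, a hypothetical helper `with_build_deps`, and an `APK` variable that exists only so the commands can be dry-run.

```sh
# Command used to manage packages; set APK="echo apk" for a dry run.
APK="${APK:-apk}"

# with_build_deps "PKG ..." COMMAND [ARGS...]: install the build-only packages
# as a virtual group, run the build command, then remove the group again.
with_build_deps() {
    deps="$1"; shift
    # $deps is left unquoted on purpose so it splits into package names.
    $APK add --no-cache --virtual .build-deps $deps || return 1
    status=0
    "$@" || status=$?
    $APK del .build-deps || return 1
    return "$status"
}
```

The runtime libraries (for instance pango and cairo, names given only as an illustration) would be installed separately with a plain `apk add --no-cache`, so that they survive the `apk del`. All of this would need to happen within a single `RUN` step, otherwise the removed files still occupy space in an earlier layer.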
As for R packages, we either install them defining the version, or we don't but list and document the installed versions (and perhaps also store the tarballs to a local storage as a backup plan to facilitate reproducible builds, i.e. if files disappear from CRAN etc). It might be unconventional, but wgeting the versioned packages tarballs and R CMD INSTALLing them sounds appealing to me, it's not pretty but KISS.
I'm not sure how this would work, generally. Some packages are from CRAN, some from BioC (some from GitHub!), and it doesn't seem easy to track their origin, other than by brute-force querying of the mirrors (maybe that's what you mean). It would also need to be done after the fact, because running BiocManager::install("foo") would retrieve foo, but also an arbitrary number of other packages, from either CRAN or BioC.
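One after-the-fact option that avoids querying mirrors is to read the DESCRIPTION file of each installed package. The sketch below is a heuristic, not a guarantee: it assumes CRAN builds carry `Repository: CRAN`, Bioconductor packages carry a `biocViews:` field, and GitHub installs made through remotes/devtools carry `RemoteType: github`. The function names are hypothetical.

```sh
# pkg_field FILE FIELD: print the first value of FIELD in a DESCRIPTION file.
pkg_field() {
    sed -n "s/^$2:[[:space:]]*//p" "$1" | head -n 1
}

# pkg_origin FILE: guess where an installed package came from (heuristic).
pkg_origin() {
    if grep -q '^RemoteType:[[:space:]]*github' "$1"; then echo github
    elif grep -q '^Repository:[[:space:]]*CRAN' "$1"; then echo cran
    elif grep -q '^biocViews:' "$1"; then echo bioconductor
    else echo unknown          # e.g. base packages shipped with R itself
    fi
}

# list_origins LIBDIR: print "package<TAB>version<TAB>origin" for each package
# installed in the R library directory LIBDIR.
list_origins() {
    for d in "$1"/*/DESCRIPTION; do
        [ -f "$d" ] || continue
        printf '%s\t%s\t%s\n' "$(pkg_field "$d" Package)" \
            "$(pkg_field "$d" Version)" "$(pkg_origin "$d")"
    done
}
```

Run at the end of the build against the library path, the resulting table could be stored with the image as the record of what was actually installed, including the dependencies pulled in implicitly.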
Current status is this:
(base) mark@MLS-M-MARO alpine-R-base-bioc % docker image ls
REPOSITORY                    TAG                              IMAGE ID       CREATED          SIZE
markrobinsonuzh/omni-docker   alpine-20231219--R-4.3.2--bioc   fa1650c18027   46 seconds ago   940MB
markrobinsonuzh/omni-docker   alpine-20231219--R-4.3.2         ef6629b7a09f   23 minutes ago   774MB
markrobinsonuzh/omni-docker   slim-py-3.11.7--pandas           6ca929b5a102   26 minutes ago   1.21GB
markrobinsonuzh/omni-docker   slim-py-3.11.7                   32feb879fcd4   41 minutes ago   908MB
alpine                        20231219                         9198849dd7f6   3 weeks ago      7.38MB
python                        3.11-slim                        dd150e5400f1   5 weeks ago      131MB
.. so, the containers are getting a little bit big now, because omnibenchmark and renku have many dependencies. We can still look for places where we can shave something off.
Otherwise, the following have now been addressed:
git and git-annex installed
renku is installed on all images
omb
Things still to do, I think:
Would be happy to discuss.
Current status:
.. currently involves a starting point of alpine:20231219 for R tools and python:3.11-slim for Python tools (an alpine:20231219 base for Python was tried, but resulted in a much larger container). These are all pushed to docker.io/markrobinsonuzh.
Thoughts and feedback are most welcome, including but not limited to:
Better approach to versioning R software? (right now, just calls to BiocManager::install)