Closed Toub closed 5 years ago
I don't know much about docker, but if you submit one, I'll take a look and think about it. (Whether I want to take on the maintenance burden depends on how often it is likely to need to change.)
Great!
There is a discussion with @jagregory about moving his Dockerfile here.
Then we can help you maintain the Dockerfile if you are not familiar with Docker.
Another possible place to make this semi-official is pandoc-extras: unofficial org for managing Pandoc extras. I already created pandoc-portable and pandoc-nightly there. In particular I think a docker image would actually fit into pandoc-portable (since both of them are great for CI).
I think it might be better not to put it in jgm/pandoc's releases, because "normal" users rarely use Docker. Keeping it separate would also make a new pandoc release less burdensome for @jgm.
(continued from last post) On the other hand, an official Dockerfile might also serve as a summary of the dependencies pandoc needs. For example, the dependencies on LaTeX packages are currently mentioned in the Manual, but it would be nice if one could look into the Dockerfile and see a list of the required packages (and commands).
By the way, I glanced at the 2 Dockerfiles above, and they seem to use the LaTeX packages from apt-get. A better approach might be to use tlmgr directly (most up to date, and smaller, since tlmgr installs exactly the packages pandoc needs and no more).
But such a minimal Dockerfile might result in some users complaining that a package they need (e.g. something as simple as a different font) is not there.
> On the other hand, an official docker file might also serve as a summary of the dependencies needed in pandoc.
@ickc that's a good argument, and it would also offer another distribution channel for end users. I don't know how often LaTeX is required; an additional Dockerfile that extends the official one could be made available in pandoc-extras to offer more features on an opt-in basis.
So, were you going to post a docker container here for us to start playing with?
I tried the one linked above; it seemed very basic and gave an error with a conversion to PDF (didn't have all the needed latex packages, I think).
What about starting with this one, as it is almost ready (just waiting for last version PR to be merged): https://github.com/jagregory/pandoc-docker.
Then, when someone has some time, it can be improved and split into 2 Dockerfiles (minimal + extended)?
Until @jagregory merges the last PR, you can try this fork: https://github.com/toubiweb/pandoc-docker
It includes pandoc 1.19.2.1 as well as the lmodern package.
@jgm, if the conversion still fails, please post the error here.
My repo is now updated, and contains a compatible license. If you want to use it as a basis for an official image, you're more than welcome to. Please let me know if I should deprecate my image.
Just to revisit this issue: this should probably not be included in the pandoc repo. The reason is that there are many different ways of distributing a package: distro-specific ones, conda, AppImage (from another thread), Docker here, etc. The pandoc repo should focus on covering the simplest case, where an end user can install the latest version of pandoc through either the installer or the portable archive. E.g. I heard that some people also maintain a recipe for a package manager to install pandoc, but it isn’t added here. Keeping such things out of the repo would lessen the burden on the core dev of releasing a new version of pandoc, and would also leave more freedom in choosing which other package managers to support for installing pandoc.
There’s still a way to make it official, however. Recently we’ve been able to secure the pandoc namespace as a GitHub Org. So a repo could be opened there, providing alternative builds like Docker and AppImage, and maintained by someone else.
What do you think?
While I'm voting neither for nor against having an official Docker repo here, I just wanted to point out that pandoc has so many extra pieces (LaTeX for tex and PDF support; node, python, ruby etc. for filters...) that there could be many different combinations depending on user needs.
Of course there could be a kitchen-sink edition with everything inside, but I tend to steer away from multi-GB images.
Looking at @jagregory 's image you can see that simply adding latex makes for quite a big image. And I believe that including latex is pretty basic if you need it, but that's the reason I made my own docker images (inspired by his) without it, for use cases that do not involve latex directly.
Maybe we could have some "official" combinations of pandoc + pieces, publishing multiple Docker images tailored to different needs, à la maven for instance.
Regarding the size of LaTeX: note that I’m not referring to any particular solution but LaTeX in general: the proper way of installing LaTeX is to start with the minimal installation and then use tlmgr to install necessary dependencies (like those mentioned in pandoc's Manual).
One problem with this is that the end user might need some other packages. Some of them can be as simple as fonts, and they might be triggered by some setup in the YAML metadata (i.e. not only when the end user explicitly adds \usepackage in header-includes). So a minimal pandoc Docker setup can at best be provided as a template for people to modify to their needs.
By the way, many distros' own package managers provide an oversized LaTeX distribution without tlmgr.
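As a rough illustration of the tlmgr approach described above, the following Dockerfile sketch installs TeX Live's scheme-basic and then adds a few packages by name. The base image, the /opt/texbin path and the package selection are assumptions made for this example; the selection is illustrative and is not the list given in pandoc's Manual.

```dockerfile
# Sketch only: base image, paths and package list are assumptions.
FROM debian:stable-slim

# Tools that the TeX Live installer and tlmgr rely on
RUN apt-get update && apt-get install -y --no-install-recommends \
        ca-certificates fontconfig perl wget xz-utils && \
    rm -rf /var/lib/apt/lists/*

# Unattended install of scheme-basic, without documentation or source files.
# The install directory contains the release year, so the last step links the
# binary directory to a fixed path.
RUN mkdir /tmp/install-tl && \
    wget -qO- https://mirror.ctan.org/systems/texlive/tlnet/install-tl-unx.tar.gz | \
        tar -xz -C /tmp/install-tl --strip-components=1 && \
    printf '%s\n' \
        'selected_scheme scheme-basic' \
        'tlpdbopt_install_docfiles 0' \
        'tlpdbopt_install_srcfiles 0' \
        > /tmp/install-tl/texlive.profile && \
    /tmp/install-tl/install-tl -profile /tmp/install-tl/texlive.profile && \
    rm -rf /tmp/install-tl && \
    ln -s /usr/local/texlive/*/bin/* /opt/texbin
ENV PATH="/opt/texbin:${PATH}"

# Add only what is needed (illustrative selection)
RUN tlmgr install xetex biblatex booktabs fancyvrb listings xcolor
```

Someone who needs an extra font or package could extend such an image with one more RUN tlmgr install line, which is the "template to modify" idea mentioned above.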
Thanks for the explanation! I'd love to have a "smallish" docker image with latex support. It would be great to see where your approach would take us.
I think the simple Docker image provided by @jagregory (even if its size is bloated) is a good way to start.
I'm just looking at pandoc for converting Markdown to PDF via LaTeX. But I would never install LaTeX on my Windows 10 machine; I've tried that before, and maintaining a working LaTeX installation is just a pain in the long run.
However, with Docker for Windows I only need one line:
docker run jagregory/pandoc
And I know I have a working installation with LaTeX support. Nothing can beat a simple installation and a 100% clean uninstallation.
P.S. Maintaining an official Docker image that is simple yet exhaustive also serves as a kind of recipe for how to install pandoc in full. It could also be a good way to test that the installation process works, by checking that the image builds and runs normally.
Maybe even maintaining an image as simple as this:
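FROM ubuntu:rolling
LABEL maintainer="Pandoc"
# wget fetches the release; ca-certificates lets it verify the HTTPS download
RUN apt-get update && apt-get install -y -o Acquire::Retries=10 --no-install-recommends \
        ca-certificates wget && \
    rm -rf /var/lib/apt/lists/*
# Download the prebuilt pandoc .deb from the GitHub releases page
RUN mkdir -p /installation/ && \
    wget https://github.com/jgm/pandoc/releases/download/2.1.1/pandoc-2.1.1-1-amd64.deb \
        -O /installation/pandoc.deb
# Install the package, then remove the downloaded file
RUN dpkg -i /installation/pandoc.deb && rm -rf /installation/
WORKDIR /source
ENTRYPOINT ["/usr/bin/pandoc"]
CMD ["--help"]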
That would make a difference: pushing and pulling that image probably takes a minute or so. And it would allow us to inherit from that image and add extra stuff like LaTeX etc.
I chose ubuntu:rolling because in my experiments I could not get e.g. inkscape to work inside that Haskell image. Secondly, the Haskell approach is too hard-core: it takes a lot of time to build and wastes a lot of resources. Using a prebuilt binary is of course much faster.
I quickly found out that @jagregory's image needs a lot of work if one wants a full LaTeX stack with epstopdf, SVG-to-PDF conversion (inkscape), etc. So I decided to start from scratch, and the above is a good starting point for any pandoc image.
I ended up using pandoc to convert to a .tex file and then using latexmk to run the full LaTeX build. It's much faster this way, though I haven't yet figured out all the packages I need.
Jari Pennanen notifications@github.com writes:
> Maybe even maintaining as simple image as this:
I don't know a lot about Docker. Can you say more about what would be involved in maintaining an image?
Do we just need to host this Dockerfile somewhere, or is there more to it?
@jgm Yes, that is the most important part.
But it would also be good to publish the Docker image to the public Docker repository.
Basically, it is about running 2 commands each time the Dockerfile is updated, to build and publish the image, so every user can use it without rebuilding.
You don't even have to run a command; just pushing to the repo is enough. Docker Hub builds it automatically.
Btw, I found this very nice image with full CTAN installed: https://github.com/sumandoc/TeXLive-2017 (the image url https://hub.docker.com/r/sumdoc/texlive-2017/)
If one needs latex, that is the greatest image. It automatically builds every two days, and it has full CTAN in it.
I just made my own image based on that one. Installing pandoc is the least of the problems; the LaTeX part is the difficult one. I now have this image:
FROM sumdoc/texlive-2017
LABEL maintainer="Ciantic"
# Basic and useful stuff: build tools, inkscape, pip, pandoc and pandoc-fignos
RUN apt-get update && apt-get install -y -o Acquire::Retries=10 --no-install-recommends \
        build-essential make inkscape wget ca-certificates python-pip python-setuptools python-dev && \
    python -m pip install --upgrade pip && \
    python -m pip install --upgrade setuptools && \
    mkdir -p /installation/ && \
    wget https://github.com/jgm/pandoc/releases/download/2.1.1/pandoc-2.1.1-1-amd64.deb \
        -O /installation/pandoc.deb && \
    dpkg -i /installation/pandoc.deb && \
    rm -rf /installation/ && \
    python -m pip install wheel && \
    python -m pip install pandoc-fignos
WORKDIR /source
# No default entrypoint or command; pass the command to docker run
ENTRYPOINT []
CMD []
It takes 3 GB of course :) but LaTeX is what it is.
Hey - looking for this too, seems good timing.
@jgm Not terribly versed either, but presumably you would sign up at https://hub.docker.com and create a repository. Official accounts usually create an Organisation. I would imagine there's a new walkthrough on the Docker Hub side.
One possible alternative to TeXLive would be https://tectonic-typesetting.github.io/en-US/.
It is 1 year old, uses xetex (so no luatex), and auto-downloads required packages, which could reduce the image size.
But I don’t know how practical it is to use with Docker. It might download new packages every time you run it. I think it would be very useful for CI, though, since you have to download everything per instance anyway.
@jgm The effort in maintenance for an official Docker image is marginally more than just making sure that things build correctly. It's just a file to describe an environment.
I've got a Dockerfile working but it's on Debian Jessie and takes a dog's age to build. I'd love an Alpine image that didn't have to build latex packages. Not sure how possible that is. Perhaps we could share a few Dockerfile examples to get the conversation about what's ideal rolling?
I use a Dockerfile to build the static binary for the releases. It's in the linux/ subdirectory.
I'm not finished testing these images, but what are people's thoughts on these images as a starting point?
The "base" image is a 130MB download, which expands to ~400MB on disk. The "extras" image (xetex + biblatex really) is a 180MB download and expands to ~500MB on disk. Frankly, the difference is small enough that I think reducing it to one image might be a better decision. Then it could just be pandoc-tex:2.5, and when 2.6 comes out, re-build for pandoc-tex:2.6, i.e. have the Docker tags follow pandoc versions.
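One way to have the tags follow pandoc versions is a build argument, sketched below. The download URL pattern is copied from the 2.1.1 link quoted earlier in this thread, and it is an assumption that other releases follow the same naming. The pandoc-tex tag is the name suggested in the comment above; the TeX packages themselves are left out to keep the sketch short.

```dockerfile
# Sketch only: the base image and the file-name pattern are assumptions.
FROM ubuntu:rolling

# Pandoc release to install; override with --build-arg pandoc_version=...
ARG pandoc_version=2.5

RUN apt-get update && apt-get install -y --no-install-recommends \
        ca-certificates wget && \
    rm -rf /var/lib/apt/lists/* && \
    wget -O /tmp/pandoc.deb \
        "https://github.com/jgm/pandoc/releases/download/${pandoc_version}/pandoc-${pandoc_version}-1-amd64.deb" && \
    dpkg -i /tmp/pandoc.deb && \
    rm /tmp/pandoc.deb

ENTRYPOINT ["/usr/bin/pandoc"]

# Build one image per release so that the tag matches the pandoc version:
#   docker build --build-arg pandoc_version=2.5 -t pandoc-tex:2.5 .
#   docker build --build-arg pandoc_version=2.6 -t pandoc-tex:2.6 .
```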
I really don't know much about Docker, like what my image should be doing at the end with WORKDIR or VOLUME and stuff. But I thought I would share the prototype while I continue testing. It would be interesting if there were a way to run (a portion of) the pandoc test suite without building with ghc. Or maybe just run the lua-filters test suite, or that of some other tertiary repo?
This looks quite nice @svenevs. I've been dabbling with Docker the last few days, with the goal of providing a good pandoc base image. It requires some more work, but is usable as-is. See tarleb/dockerized-pandoc. The image is currently available as tarleb/alpine-pandoc on Docker Hub; I plan to upload a better version as pandoc/base within the next few days.
I also took the liberty of creating pandoc/dockerfiles and intend to move the files there. PRs welcome!
I also took a stab at making a docker image. A few months ago I finally made one that included TinyTex. See https://github.com/agusmba/pandocomatic-docker if you're interested
Woah hey yeah let's do it! Those base images are really nice, especially the whole lua / c module support thing (mine definitely doesn't support that). Here is my proposition for the layout of the base images.
1. FROM pandoc/alpine-base[:tag]
   - No :tag => pandoc/alpine-base:latest, which in turn is the latest stable release, so at this point in time pandoc/alpine-base:latest => pandoc/alpine-base:2.5.
   - :master and :dev base images.
2. FROM pandoc/debian-buster-slim-base[:tag]
   - :latest, :master, and :dev.
AKA this is basically just a rename of @tarleb's tarleb/dockerized-pandoc. This is just a suggestion, but it opens up the way for people to create additional base distributions for people who want them, but I think alpine and debian are great / should suffice for most people (?).
From there, I get a little confused, but I think this may still be possible. The idea would be to create two additional images on top of all of these. I'm still looking up this argument stuff, but I think they could be done with a single Dockerfile per distribution per tex scheme. So the pitch (for alpine, but would do same for debian, and others in future):
1. pandoc/alpine-tex-basic[:tag] with the code that just installs the latex stuff (including xetex and biblatex, since in reality the size differences aren't that big in comparison to just installing latex in general). The :tags would be the same as alpine-base (this is the part I am researching how to do).
2. pandoc/alpine-tinytex[:tag] a la @agusmba; it seems like the R community would greatly appreciate these images. But I'm not very familiar with it.
The idea behind (1) is to create a minimal tex install where users can then tlmgr install ... whatever they need, but which already has all of the packages that pandoc might use. In theory different schemes, e.g. alpine-tex-full, could also be done, but I don't think it's advisable. Having just the pandoc required packages and asking users to install their own additional dependencies seems like a "docker" way to go.
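On the open question of reusing the base image tags from a single Dockerfile: Docker allows an ARG to be declared before FROM and used in the FROM line, so a sketch along the following lines may be enough. The image names are the ones proposed above and are hypothetical here, and the Alpine package selection is an assumption (biblatex is not covered by it).

```dockerfile
# Sketch only: pandoc/alpine-base is the image name proposed above, and the
# Alpine package selection is an assumption.

# An ARG declared before FROM may be used in the FROM line
ARG base_tag=latest
FROM pandoc/alpine-base:${base_tag}

RUN apk add --no-cache texlive texlive-xetex

# The same file then yields one TeX image per base tag:
#   docker build --build-arg base_tag=2.5    -t pandoc/alpine-tex-basic:2.5 .
#   docker build --build-arg base_tag=master -t pandoc/alpine-tex-basic:master .
```

This keeps one file per distribution per tex scheme, although every tag still has to be built and published separately.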
The main question becomes: how maintainable is this? With the scheme proposed above, we get {alpine,debian} x {2.5,master,dev} = 6 base images, which would increase to 8 when 2.6 is released. Then you have another 6 images that build on each base image for tex-basic, and another 6 for tinytex. Meaning it starts out with 18 images, and 24 once 2.6 is released... this seems kind of crazy.
In short: as time goes on, I'm not sure if this strategy scales very well. Especially if a pandoc/ubuntu-base was added (or any other).
These are my uninformed (docker n00b) thoughts. I very strongly feel that a base pandoc + minimal latex docker image should be officially maintained (and am happy to help maintain it!). I just don't know how to build out the layering scheme...
Why don't you go ahead and create a pull request over at pandoc/dockerfiles? It doesn't have to be perfect, we can improve things later on.
It's great to see the community coming up with these docker images. I'm closing this now, since pull requests and specific issues with the images should go to:
Docker makes it easy to run any software anywhere without a complex install.
There are already several projects trying to run pandoc, but they are not up to date:
What about maintaining an official Dockerfile here?