jupyterhub / repo2docker

Turn repositories into Jupyter-enabled Docker images
https://repo2docker.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Should repo2docker builds come with dependencies to export to PDF? #1089

Open willingc opened 6 years ago

willingc commented 6 years ago

When troubleshooting https://github.com/jupyterhub/jupyterhub/issues/1572, I tried to download a notebook as a PDF. I received the following error:

[Screenshot of the error message, 2017-12-11]

Since pandoc is used by nbconvert, shouldn't it be installed in the default image?

yuvipanda commented 6 years ago

I think part of the problem is that installing pandoc increases the size of the base image significantly (by more than a gig or so, I think?), which also slows down builds and launches by quite a bit. I think right now you need an 'apt.txt' with 'pandoc' in it to have it installed. I'd personally like to keep it that way for now, since IMO slowing down the experience for everyone is not worth saving an extra step for folks who want to use pandoc.

We could possibly try to grep logs we have to see how many people have tried to download a notebook as PDF on binder, to help quantify this decision?
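A minimal sketch of what that log check could look like. It assumes the classic notebook serves "Download as" requests under `/nbconvert/<format>/<path>` and that each log line contains the request path; the sample log lines and the `count_export_requests` helper are hypothetical and only illustrate the idea.

```python
import re
from collections import Counter

# Assumption: "Download as" requests hit /nbconvert/<format>/<path>,
# and the request path appears somewhere in each log line.
NBCONVERT_REQUEST = re.compile(r"/nbconvert/([A-Za-z0-9_-]+)/")


def count_export_requests(log_lines):
    """Count nbconvert download requests per export format."""
    counts = Counter()
    for line in log_lines:
        match = NBCONVERT_REQUEST.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical log lines, for illustration only.
sample_log = [
    "500 GET /user/abc/nbconvert/pdf/index.ipynb?download=true (10.0.0.1) 812.00ms",
    "200 GET /user/abc/nbconvert/html/index.ipynb?download=true (10.0.0.1) 95.10ms",
    "200 GET /user/abc/api/contents/index.ipynb (10.0.0.1) 12.30ms",
    "500 GET /user/def/nbconvert/pdf/demo.ipynb?download=true (10.0.0.2) 640.00ms",
]
export_counts = count_export_requests(sample_log)
print(export_counts)
```

Comparing the `pdf` count against the total number of launches would give a rough idea of how many sessions actually hit the error.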

choldgraf commented 6 years ago

Is this use-case common enough that we should document it in binder-examples?

willingc commented 6 years ago

Hmm... the difficulty is that anything that relies on nbconvert in the classic notebook UI (primarily downloads) will run across this error. I think documenting solves one problem: getting it to work so +1 to that. The bigger problem is that the classic notebook's UI for download will not work for PDF or other formats relying on nbconvert. At minimum, we should add this to some sort of "known issues" doc.

yuvipanda commented 6 years ago

+1 on adding it to binder-examples as a minimum start. Should we start a FAQ for a 'known issues' type document?

willingc commented 6 years ago

I think that for mybinder.org-deploy a 'known issues' type of thing (even if just an issue) would be helpful.

choldgraf commented 6 years ago

I also just noticed that pandoc is installed by default with a conda environment (I think, anyway)

willingc commented 6 years ago

Thanks for the detective work @choldgraf. I added an environment.yml to my test repo (willingc/ThinkDSP) that matches the contents of the requirements.txt. Interestingly, the error is a bit different since it's referencing xelatex. Thoughts?

[Screenshot of the xelatex error, 2017-12-12]

willingc commented 6 years ago

As a workaround, print preview does work within the notebook with conda and the preview can be saved as a PDF.

betatim commented 6 years ago

Is it feasible for us to disable menu items? Then at least we prevent people from getting a 500 and maybe they come looking for docs as to why the "save as PDF" menu item is missing?

:+1: on not making the image bigger if we are correct with our assumption that not very many people try to "export as ..."

choldgraf commented 6 years ago

Could we update this issue with actionable next steps? To me they seem to be:

  1. Add an example to binder-examples showing how to get nbconvert working
  2. Document this behavior in the docs either way
  3. Look into disabling this button per @betatim's suggestion (this one feels more long-term)

willingc commented 5 years ago

Bump. This has come up again at a GW workshop.

betatim commented 5 years ago

I think the next steps are:

I don't think we should add a full Latex distribution to our default image because it increases the size too much.

minrk commented 5 years ago

> report this as a bug in Jupyter notebook

This would be an nbconvert issue, I think, since that's where it decides what outputs are available or not. However, removing a menu item might be more confusing than the current informative error message. I suspect we will instead get users asking "Why did the download as PDF button disappear?" with no info to go on, rather than a specific error message telling them exactly what's missing, which is what they get right now.

> contribute to the notebook so that "download as notebook"

I don't think "download as notebook" uses nbconvert.

> create a binder-example that shows which dependencies need to be installed for nbconvert to successfully use pandoc which uses Latex to convert a notebook to PDF

👍 . Any conda-installed notebook should have pandoc as a dependency (which means all images now), which is used for all formats other than html. It is only PDF at this point that requires the extra layer of latex that might not be present.
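To see which of these pieces is missing in a given image, a small check along these lines could help. It is only a sketch: it assumes PDF export needs the `pandoc` and `xelatex` executables on the PATH (the two tools named in the errors in this thread), and `missing_pdf_export_tools` is a hypothetical helper, not part of nbconvert.

```python
import shutil

# Assumption: PDF export needs these two executables on the PATH.
PDF_EXPORT_TOOLS = ("pandoc", "xelatex")


def missing_pdf_export_tools(which=shutil.which):
    """Return the names of required tools that cannot be found.

    `which` is injectable so the check can be exercised without
    depending on what is installed on the current machine.
    """
    return [tool for tool in PDF_EXPORT_TOOLS if which(tool) is None]


missing = missing_pdf_export_tools()
if missing:
    print("PDF export will likely fail; missing:", ", ".join(missing))
else:
    print("pandoc and xelatex are both available")
```

Run in a terminal inside a Binder session, this would show whether only the latex layer is absent.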

> I don't think we should add a full Latex distribution to our default image

👍

betatim commented 5 years ago

> > contribute to the notebook so that "download as notebook"
>
> I don't think "download as notebook" uses nbconvert.

Then I don't understand why I get a new tab with an error message as well as the notebook when I click download as notebook :-/

Otherwise 👍 to your comments.

minrk commented 5 years ago

> Then I don't understand why I get a new tab with an error message as well as the notebook when I click download as notebook :-/

Neither do I, but that's definitely a bug somewhere :). When do you get this error and what is the error that you see?

choldgraf commented 5 years ago

I agree that the end result of this should be "if a user attempts and fails at 'download as pdf', we catch it and give them a link to instructions for how to enable this"

minrk commented 5 years ago

@choldgraf right now, the behavior is an error message with a URL pointing to instructions for installing tex from the nbconvert docs. Is that requirement not satisfied, then?

betatim commented 5 years ago

Launching https://mybinder.org/v2/gh/binder-examples/requirements/master, opening the index.ipynb, File -> Download as -> ipynb I can't reproduce the error message anymore. I now get a download dialogue and two new empty tabs being opened :-/

[Screenshot, 2018-12-22]

This is with Firefox 65.0b4.

choldgraf commented 5 years ago

I think that repo might not be the greatest to test this out with, since it was last built in July. I just pushed a tiny commit to re-trigger a build, and I now get @minrk's error! @willingc @betatim is this now your experience on that repo?

betatim commented 5 years ago

When I click "download as notebook" I still get the behaviour I described in https://github.com/jupyterhub/binderhub/issues/341#issuecomment-449556807. This is a different problem from what happens if you click "download as PDF".

choldgraf commented 5 years ago

A quick question: @yuvipanda mentioned that pandoc adds like 1 GB to the base image... I'm wondering where that's coming from. I was looking into the pandoc binaries, and they're somewhere around 10-50 MB, nothing close to 1 GB. Does it have extra dependencies somewhere?

Specifically, I wonder if pandoc downloads a distribution of Latex (which would certainly add some cruft to the base image). If that's the thing that's causing the big images, what if we tried using weasyprint instead of latex for the PDF creation? https://pandoc.org/MANUAL.html#creating-a-pdf

Maybe this would require a change in nbconvert, but it might be a bit simpler now that pandoc supports .ipynb formats
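As a rough sketch of the weasyprint idea, the pandoc invocation could be assembled like this. It assumes a pandoc version that can read `.ipynb` input and that `weasyprint` is installed; `pandoc_pdf_command` is a hypothetical helper and this is not how nbconvert currently produces PDFs.

```python
def pandoc_pdf_command(notebook, output, pdf_engine="weasyprint"):
    """Build a pandoc command line that converts a notebook to PDF.

    With --pdf-engine=weasyprint, pandoc goes through HTML rather than
    LaTeX, so no TeX distribution is needed.
    """
    return ["pandoc", notebook, "-o", output, f"--pdf-engine={pdf_engine}"]


command = pandoc_pdf_command("index.ipynb", "index.pdf")
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` in an environment where both tools are present.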

willingc commented 5 years ago

I'm not sure which image you are using now. Perhaps this Dockerfile would be smaller: https://github.com/pandoc/dockerfiles

Reference to recent issue activity on pandoc repo

manics commented 3 years ago

Is there something we still need to do here?

choldgraf commented 3 years ago

I think it depends on whether we think that PDF export via Latex / Pandoc should be in the default environment of Binder. Trying to export a notebook as PDF from a Binder session just now led to this error:

[Image: error shown when exporting to PDF]

But it also seems reasonable to tell users that if they want people to export via latex, they need to explicitly install it in the environment. I think the problem here is that the "export as PDF" option is available in default Binder, even though it doesn't work.

manics commented 3 years ago

I'll move this to repo2docker

consideRatio commented 2 years ago

JupyterLab doesn't provide an export to PDF menu item, but the classic notebook interface does and still errors as described above.

My take is that we shouldn't add support for this functionality by default.

I suggest that if we don't propose an action point in a month or two, we close this issue in the next issue triage round.