jupyter / nbconvert

Jupyter Notebook Conversion
https://nbconvert.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

HTML/slides export without CDNs #754

Open takluyver opened 6 years ago

takluyver commented 6 years ago

Our default HTML export currently references cdnjs for require, jquery and mathjax, and unpkg for widget js (if widgets are used). As of #732, exported slides also pull in reveal.js from cdnjs.

Relying on CDNs is convenient, but it means that the exported files won't fully work without internet access, and it may raise privacy concerns (your browser will send a referer header to the CDNs, and even with https it's plausible that someone watching your network connection could estimate when you're opening a notebook).

We should ensure there's an easy way to either refer to local files for these resources (and to get the right files in the right places), or to inline them into the exported HTML (making a big but standalone file), or both. This doesn't need to be the default, but there should be an easy way to get it.

(Of course, we probably can't do anything about HTML output pulling in resources from CDNs, but I think it would still be useful to make this possible for the resources our own templates use.)
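
As a rough illustration of the "inline them into the exported HTML" option, here is a minimal sketch, not nbconvert code. The helper name `inline_scripts` and the example URL are hypothetical, and it assumes the simple case where a script tag carries only a `src` attribute:

```python
import re

# Matches <script src="URL"></script> with no other attributes (an assumption;
# real templates may add type=, integrity=, and so on).
SCRIPT_TAG = re.compile(r'<script\s+src="([^"]+)"\s*>\s*</script>')


def inline_scripts(html, sources):
    """Replace script tags whose URL is in `sources` with inline copies.

    `sources` maps a URL to the JavaScript text previously fetched from it.
    Tags whose URL is not in the mapping are left untouched.
    """
    def replace(match):
        url = match.group(1)
        if url not in sources:
            return match.group(0)
        # Escape closing tags inside the script so they cannot end the element early.
        body = sources[url].replace("</script", "<\\/script")
        return "<script>\n" + body + "\n</script>"

    return SCRIPT_TAG.sub(replace, html)


page = '<head><script src="https://cdn.example.invalid/require.min.js"></script></head>'
print(inline_scripts(page, {"https://cdn.example.invalid/require.min.js": "var r = 1;"}))
```

The result is one larger file with no remaining reference to the mapped URL, which is the trade-off described above: big but standalone.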

mpacer commented 6 years ago

I think we'll need to make mathjax more easily configurable too.

Otherwise I think all of the other libraries can be pointed to a local copy today (using traitlets in the SlidesExporter).

takluyver commented 6 years ago

All the ones in the slide template can, but the ones in the regular HTML template are hardcoded pointing to CDNs at the moment: https://github.com/jupyter/nbconvert/blob/626c82afa03f9ca28ac65d34f430fb26449eeb06/nbconvert/templates/html/full.tpl#L15-L19

My focus here is that it should be easier to use this option, though. At present, you have to download the necessary files and override several separate URLs in traitlets. I'd like to be able to do something like:

jupyter nbconvert --to html --standalone ...
# or
jupyter nbconvert --to standalone-html ...
# or
jupyter nbconvert --to html ...
bundle-resources foo.html

holdenweb commented 6 years ago

I would observe that the Jupyter distribution appears to contain all these components, and indeed they are locally served to my browser by the Jupyter notebook server. Would it be possible to extract these components to include with the distributed slides? I'm assuming that anyone wanting to convert notebooks will have them available.

I'm therefore assuming that I am missing something.

takluyver commented 6 years ago

Hi Steve :-). You're mostly right, although it's possible to have nbconvert installed without notebook (the package which includes the Javascript libraries). There's nothing technically all that tricky about this, it's "just a matter of implementation", plus a few decisions (e.g. do we go for single-file standalone, or put resources in adjacent files, or in a subdirectory).

holdenweb commented 6 years ago

Thanks. I'm sure the team is well capable of making the right decisions.

damianavila commented 6 years ago

jupyter nbconvert --to html --standalone

I like the idea.

do we go for single-file standalone, or put resources in adjacent files, or in a subdirectory).

Could we perhaps support both? I can see pros and cons for each option.

takluyver commented 6 years ago

Both probably makes sense. I guess that it will be easier to adapt the code for js/css in separate files rather than bundled inline.
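
For the separate-files variant, the core step is pointing each CDN URL at a local copy. A minimal sketch follows, under the assumption that the caller already knows which URLs the template uses. `localize_urls`, the `resources` directory name and the example URL are all hypothetical:

```python
import posixpath
from urllib.parse import urlsplit


def localize_urls(html, urls, subdir="resources"):
    """Rewrite each URL in `urls` to a relative path under `subdir`.

    Returns the new HTML plus a URL-to-local-path mapping, so the caller can
    place the downloaded files where the page now expects them. Two URLs that
    end in the same filename would collide; this sketch does not handle that.
    """
    mapping = {}
    for url in urls:
        filename = posixpath.basename(urlsplit(url).path)
        local = posixpath.join(subdir, filename)
        mapping[url] = local
        html = html.replace(url, local)
    return html, mapping


page = '<script src="https://cdn.example.invalid/libs/jquery/jquery.min.js"></script>'
new_page, files = localize_urls(page, ["https://cdn.example.invalid/libs/jquery/jquery.min.js"])
print(new_page)
```

Only the URLs change here, which is why this route may need fewer template changes than inlining.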

holdenweb commented 6 years ago

Might I suggest three options:

--to slides --3rdparty-embedded
--to slides --3rdparty-subdirectory
--to slides --3rdparty-server

though I'm sure someone can think of a better string than 3rdparty. I did think about three different --slides-* destinations, but decided it was better to factor out the slide type.

mpacer commented 6 years ago

@holdenweb

What would --3rdparty-server be as you're thinking about it? Would that be equivalent to --post serve?

mpacer commented 6 years ago

I agree this should be a separate command line flag, but I'd prefer it to be a single flag that we assign a value to, not separate flags.

I think we should only be thinking about the embedded and subdirectory versions, and I'd prefer to describe this in terms of the use case (e.g., --standalone or --offline) rather than describing what would happen (as that kind of a description tends to confuse novice users that don't need to understand exactly what is going on in order to use the feature).

damianavila commented 6 years ago

What would --3rdparty-server be as you're thinking about it? Would that be equivalent to --post serve?

I think he is thinking about getting the needed js and css files from the notebook package.

mpacer commented 6 years ago

I don't think we should offer that functionality unless we explicitly surface them from the notebook package or via a server endpoint (in which case it would require a running server that you point to).

In part I'm against this because there might be some kind of a version mismatch between the notebook js packages and the ones expected by nbconvert. If someone has an old version of the notebook server (or an old version of nbconvert) this is likely to result in really difficult-to-debug issues.

So if we were going to do that, I'd want us to be really careful. I don't think that should be a necessary piece for us to get this general functionality.

damianavila commented 6 years ago

In part I'm against this because there might be some kind of a version mismatch between the notebook js packages and the ones expected by nbconvert. If someone has an old version of the notebook server (or an old version of nbconvert) this is likely to result in really difficult-to-debug issues.

I agree... I would stay with these two options:

--to slides --embedded
--to slides --subdirectory

mpacer commented 6 years ago

I think it should be one flag with two options, not two flags. I don't know what the flag should be; it could be offline:

--to slides --offline embed
--to slides --offline subdirectory

or it could be standalone:

--to slides --standalone embed
--to slides --standalone subdirectory

This will make it easier for people to recognise that the other way of doing it exists in the first place. Also it allows us to easily extend it later if we did want to add other ways of doing this (e.g., with the existing notebook server).
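
To make the "one flag with a value" shape concrete, here is a small sketch using argparse. nbconvert builds its command line on traitlets rather than argparse, and neither `--standalone` nor its values exist in nbconvert; both are the hypothetical names proposed in this thread:

```python
import argparse


def build_parser():
    """Build a toy parser with a single value-taking --standalone flag."""
    parser = argparse.ArgumentParser(prog="nbconvert-sketch")
    parser.add_argument("--to", default="html", help="output format")
    parser.add_argument(
        "--standalone",
        choices=["embed", "subdirectory"],  # further values could be appended later
        default=None,  # None keeps today's CDN-based behaviour
        help="bundle third-party resources instead of linking to CDNs",
    )
    return parser


args = build_parser().parse_args(["--to", "slides", "--standalone", "embed"])
print(args.to, args.standalone)
```

Because the accepted values are listed in one place, the help text shows users that the alternative exists, and extending the feature later means adding one entry to `choices`.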

holdenweb commented 6 years ago

In the light of the above, I confess it makes no sense to use files from a local Jupyter server when you could just as easily serve the notebook as slides by uploading the notebook. So yes, only two slides variations. I further agree that it is not sensible to name options according to software architecture, since this is meaningless to most users.

@mpacer, do I understand you to propose --offline and --standalone as alternate names for the same option?

In which case would the alternatives

--to slides --usage standalone
--to slides --usage internet

be sufficiently user-friendly? The documentation could point out that standalone was always safe but large, while internet was smaller but wouldn't work without Internet connectivity.

Do we think that would explain sufficiently well without dragging the guts of the software to the users' attention?

mpacer commented 6 years ago

I was suggesting either --offline or --standalone as the flag name (I'm not sure which one is preferable), and it would accept two values embed or subdirectory.

The default for slides should remain unchanged (existing code shouldn't break). All of the new functionality would fall inside this new CLI flag: the embed value would create one big file, and the subdirectory value would create a subdirectory structure.

And I'm now realising we should have a 3rd value zip, which would use the subdirectory structure but would zip up the file so that it could be transported easily as a single file that would unpack into the subdirectory structure.

@holdenweb Does that make more sense? That way we wouldn't need to drag users through the guts of anything — all they need to do is decide "do I want to use this offline?" and if they say yes then they need to decide "do I want one big html file, a subdirectory, or a zip file that'll unpack into the subdirectory".
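
The zip value could be a thin layer over the subdirectory output. A minimal sketch with the standard library, assuming the export has already been written to a directory. `zip_export` and the paths in the usage comment are hypothetical:

```python
import zipfile
from pathlib import Path


def zip_export(export_dir, zip_path):
    """Pack an exported directory into a zip that unpacks to the same layout.

    `zip_path` should live outside `export_dir`, otherwise the archive would
    try to include itself.
    """
    export_dir = Path(export_dir).resolve()
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(export_dir.rglob("*")):
            if path.is_file():
                # Store names relative to the parent so that unpacking
                # recreates the export directory itself.
                archive.write(path, path.relative_to(export_dir.parent).as_posix())
    return zip_path


# Usage, assuming a directory "talk/" holding the slides and their resources:
# zip_export("talk", "talk.zip")
```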

mpacer commented 6 years ago

In the light of the above, I confess it makes no sense to use files from a local Jupyter server when you could just as easily serve the notebook as slides by uploading the notebook.

Actually I think it could make perfect sense to do that. Suppose you had a file in one location and a server running elsewhere: you might not want to move that file just to be able to serve it as slides.

My concern is that we might have bugs from mismatched versions if we don't think about how to do that really carefully.

For example, I wouldn't want to reach into the notebook package on the file system and try to find the files that way.

If the notebook package were to expose them explicitly via a python API, I'd be more amenable… but I'm not sure that I want the notebook package including that on the python side.

If it would only work with a live server, I think it'd make more sense for the notebook to explicitly create a REST endpoint for providing these resources that could be hit by nbconvert. That way we could make sure to pass the relevant version info through with the response.

In either case those are changes to notebook that would need to occur before we could rely on them in nbconvert and I am probably not going to lead the charge on that one.
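
To show what passing version info through with the response could buy, here is a sketch of the client-side check. No such endpoint exists in notebook; the JSON shape, the names in `EXPECTED` and the version strings are placeholders invented for illustration:

```python
import json

# Assumption: the versions nbconvert's templates were written against.
# These values are placeholders, not the real pinned versions.
EXPECTED = {"require.js": "1.0.0", "jquery": "2.0.0"}


def check_manifest(payload, expected):
    """Compare a server's resource manifest against the expected versions.

    `payload` is a JSON string of the hypothetical form
    {"resources": [{"name": ..., "version": ...}, ...]}.
    Returns a list of human-readable problems; an empty list means compatible.
    """
    manifest = json.loads(payload)
    served = {item["name"]: item["version"] for item in manifest["resources"]}
    problems = []
    for name, version in expected.items():
        if name not in served:
            problems.append(f"{name}: not provided by server")
        elif served[name] != version:
            problems.append(f"{name}: expected {version}, server has {served[name]}")
    return problems


response = json.dumps({"resources": [{"name": "require.js", "version": "1.0.0"},
                                     {"name": "jquery", "version": "1.9.0"}]})
for problem in check_manifest(response, EXPECTED):
    print(problem)
```

Reporting a mismatch up front like this is one way to avoid the hard-to-debug failures from mismatched versions mentioned earlier.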

holdenweb commented 6 years ago

I agree there's a significant amount of engineering to be done to provide a pleasant user experience in creation and consumption of slide notebooks, and serious design is needed. We can at least hope these discussions will be of use to the eventual implementer, should one come along.