quarto-dev / quarto-cli

Open-source scientific and technical publishing system built on Pandoc.
https://quarto.org

quarto preview with .ipynb - issue parsing with Javascript/CSS? (Holoviews/Bokeh) #857

Closed jules32 closed 2 years ago

jules32 commented 2 years ago

Hi!

We've been having issues with .ipynb files that use Holoviews/Bokeh: these often take a long time to quarto preview in our JupyterHub, and also in our GitHub Action. We've encountered this before and likely chatted about it, but wanted to revisit.

Here are a few example notebooks where we are seeing this (included in _quarto.yml#L52-L58):

A few more details about it in this earthdata-cloud-cookbook issue, where @betolink says:

I think this may be because these notebooks use Holoviews/Bokeh. HVPlot injects a lot of Javascript and CSS into the notebook (in the order of megabytes) and perhaps Quarto is having issues parsing all of this.

Thanks in advance for any help!

jjallaire commented 2 years ago

I haven't been able to repro anything terribly unexpected here. I've downloaded these notebooks and included this _quarto.yml file:

project:
  type: website

website: 
  navbar: 
    title: "Test Notebooks"
    left: 
      - text: NB 1
        href: GESDISC_MERRA2_tavg1_2d_flx_Nx__Kerchunk.ipynb
      - text: NB 2
        href: LPDAAC_ECOSTRESS_LSTE__Kerchunk.ipynb
      - text: NB 3
        href: PODAAC_ECCO_SSH__Kerchunk.ipynb

On my laptop it takes about 9 seconds to render the site. Depending on how fast your server machine is, though, this could blow up quite a bit (5x slower?). In terms of preview, assuming the site is fully rendered, it comes up in a second or two, and each page takes about 3 seconds to render when clicked on in the browser (of course, this could be considerably slower if your server is slower).

So it does seem like on a slower server machine .ipynb files of this size could be slow to preview (but definitely wouldn't be slow to serve to end users). These are ~ 10mb notebooks so this might be close to as good as it gets, but I'll do some more digging to see what's accounting for the time and whether we can do anything better.

jjallaire commented 2 years ago

Okay, in my benchmarking it only takes ~50ms to read the 10mb ipynb so the size itself is not a problem. My guess is that there is something related to regular expressions running over huge chunks of html -- hopefully this is something we can find a clean workaround for and make this go a lot faster. More soon.
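For anyone who wants to check the same thing on their own server, here is a minimal sketch of timing the raw read and parse of a notebook. It assumes plain Python and the standard `json` module, so it measures only file reading and JSON parsing, not anything Quarto does; the function name `time_notebook_read` is made up for illustration.

```python
import json
import time


def time_notebook_read(path):
    """Return (seconds, cell_count) for reading and parsing one .ipynb file."""
    start = time.perf_counter()
    with open(path, "r", encoding="utf-8") as f:
        notebook = json.load(f)  # an .ipynb file is a JSON document
    elapsed = time.perf_counter() - start
    return elapsed, len(notebook.get("cells", []))
```

If this number is small compared to the total render time, the file size alone is unlikely to be the bottleneck on that machine.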

jjallaire commented 2 years ago

I have made a couple of changes that will help some aspects of preview performance for large notebooks: https://github.com/quarto-dev/quarto-cli/commit/97f52c79f5a6e6861a4d45f4a1667b3a9a7013a9

The fact that ~ 10mb and larger notebooks take a while to render (~ 3 seconds on my laptop) isn't something I can see easy ways to improve. This is mostly because many passes are made over the content (by pandoc and by quarto), and that volume just ends up taking more time. So rendering and initially previewing these notebooks will be slow.

In quarto preview we attempt to avoid re-rendering when we can. This was formerly done by checking the content hash of the input and output but I noticed that for large notebooks just building the hash could take 1.5 seconds! For notebooks we now use file modification times instead. Very slightly less robust but much faster.
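To illustrate the trade-off described above, here is a rough Python sketch of the two ways to detect a change. This is not Quarto's code; the function names are hypothetical, and SHA-256 is an assumption, since the comment doesn't say which hash was used.

```python
import hashlib
import os


def content_hash(path):
    """Hash the whole file; every byte is read, so cost grows with file size."""
    digest = hashlib.sha256()  # assumption: the actual hash function is not stated
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_render(input_path, output_path):
    """Return True when the output is missing or older than the input.

    Only file metadata is consulted, so the cost does not depend on file size.
    """
    if not os.path.exists(output_path):
        return True
    return os.path.getmtime(input_path) > os.path.getmtime(output_path)
```

The modification-time check is slightly less robust because a file can be rewritten without its content changing (causing an unnecessary render), or restored with an older timestamp (causing a missed one).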

We also track which notebooks we've already rendered -- we weren't however tracking notebooks rendered on startup (resulting in the potential for two renders, one during the initial pass and one when serving the preview). We now track the initial render.
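The tracking idea can be sketched like this in Python. Again, this is only an illustration under assumptions, not the actual implementation: `PreviewSession`, `startup`, and `serve` are hypothetical names, and the recorded "version" of each notebook is assumed to be its modification time.

```python
class PreviewSession:
    """Render each input at most once per version (here: modification time)."""

    def __init__(self, render):
        self._render = render   # callable that renders one input path
        self._rendered = {}     # path -> mtime recorded at the last render

    def _render_if_needed(self, path, mtime):
        if self._rendered.get(path) == mtime:
            return False        # already rendered at this version, skip
        self._render(path)
        self._rendered[path] = mtime
        return True

    def startup(self, inputs):
        """Initial pass: render every input and record it as rendered."""
        for path, mtime in inputs.items():
            self._render_if_needed(path, mtime)

    def serve(self, path, mtime):
        """Called when the browser requests a page; returns True if it rendered."""
        return self._render_if_needed(path, mtime)
```

Because `startup` goes through the same bookkeeping as `serve`, a page rendered in the initial pass is not rendered a second time when it is first requested.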

Net of this is that you will always have to render each .ipynb at least one time before seeing a preview of it, and these renders can in fact be quite slow depending on the size of the ipynb and the speed of the machine. However, once rendered (and assuming the underlying ipynb doesn't change) the speed of preview should be nearly instant. For example, if you do this to start the preview:

quarto preview --render all

Then once the web browser opens up you'll get extremely fast renders of each page (this definitely wasn't the case prior to the changes I made).

jjallaire commented 2 years ago

Just to clarify, I'm not suggesting that you preview with --render all (that was just for illustration). You can continue to preview exactly as you do now and .ipynb files should get rendered exactly once for each time they are changed.

jules32 commented 2 years ago

Hi @jjallaire, this is super helpful, thank you for exploring this and for the improvements and workflow. We will try it out in JupyterHub and share any updates :)

jules32 commented 2 years ago

Following up to say this is much faster for us now, thank you so much! Closing this issue :)