Open statquant opened 1 month ago
For HTML widgets, I need to know more about your use case and understand the bottleneck before I can tell if the problem is solvable. Without knowing the details, I think at least the widgets can be lazy-rendered, i.e., don't render them until they are scrolled into view. If the JS rendering is not the bottleneck, this won't help much. If you have hundreds of widgets on the same page, I guess lazy-rendering won't help much regarding the memory/CPU usage in the end (it only helps in the beginning).
> will you be showing examples or how to reproduce something like the default `rmarkdown` output, I am thinking something with a collapsible TOC as well?
Yes. For a collapsible TOC, you need the following CSS/JS:
```` markdown
---
title: "Test TOC"
output:
  litedown::html_format:
    meta:
      css: ["@default", "@article"]
      js: ["@sidenotes", "@toc-highlight"]
    options:
      toc: true
      number_sections: true
knit: litedown:::knit
---

```{r, results='asis'}
cat(paste(
  strrep('#', sample(0:3, 100, TRUE, c(.6, .1, .1, .2))), 'test',
  collapse = '\n\n'
))
```
````
> I never could make use of caching satisfyingly.
Caching _is_ hard. I completely rewrote the caching system for **litedown** in `xfun::cache_exec()` (it is not tied to **litedown** but can be used in other places). It supports both `rds` and `qs`, and you can bring your own read/write methods if the built-in methods are not satisfactory (e.g., not fast enough). Loading the cache is always lazy, i.e., cache files won't be read unless the cached objects are actually used. If a large cached object is not used in any uncached chunk, it will not be read from the cache at all, which can save you substantial time. Caching in **litedown** still needs substantial work on documentation. Users have to understand how caching works to take full advantage of it; otherwise caching does not necessarily make the build faster.
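For readers unfamiliar with it, a minimal use of `xfun::cache_exec()` might look like the sketch below. The `path` and `id` argument names are my assumption from memory of the function's interface; consult `?xfun::cache_exec` for the authoritative signature.

```r
# Hypothetical sketch of xfun::cache_exec(): the expression is evaluated once,
# the result is cached on disk, and subsequent calls load the cache lazily
# instead of re-running the expression.
res = xfun::cache_exec({
  Sys.sleep(2)   # stand-in for an expensive computation
  rnorm(1e6)
}, path = 'cache/', id = 'sim')
```

On a second run with an unchanged expression, the computation is skipped and only the cache file is (lazily) read.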
> the only right thing is to not do anything at all when the code has not moved, very much like what jupyter is doing
I think it is simple enough to implement that. The question is whether it's the right thing to do. [Jupyter is notorious for the hidden-state problem](https://yihui.org/en/2018/09/notebook-war/#1-hidden-state-and-out-of-order-execution). I'd like to avoid that.
With the proper use of caching, I think the preview should be fast enough. Most documents should take no more than one second to build. Performance is a priority of **litedown**, and I'll try my best to make it faster as I learn more from practical use cases.
Thanks for your feedback!
Hello, thanks for getting back to me.
> you need the following css/js
Thanks! Can I suggest that sometime in the future you show an example that would produce mostly the same styled output one would get by default in **rmarkdown**?
> For HTML widgets, I need to know more about your use case and understand the bottleneck before I can tell if the problem is solvable. Without knowing the details, I think at least the widgets can be lazy-rendered, i.e., don't render them until they are scrolled into view. If the JS rendering is not the bottleneck, this won't help much. If you have hundreds of widgets on the same page, I guess lazy-rendering won't help much regarding the memory/CPU usage in the end (it only helps in the beginning).
We routinely produce HTML reports through **rmarkdown** with 100 plotly graphs. None are "big" (as in 100,000 points), but they all need to be rendered (as in displayable in the browser) before the page is usable (can be looked at and scrolled). I've never seen a way to render lazily (for instance, hafen/lazyrmd does not work). As I said, we found a hacky workaround: we save each graph as HTML and show a static image in the report that links to each HTML file. Obviously, not needing a workaround at all would be amazing.
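For reference, the snapshot-plus-link workaround can be sketched roughly like this (a hypothetical sketch, not our production code; the file names are made up):

```r
library(plotly)
p = plot_ly(x = rnorm(100), y = rnorm(100), type = 'scatter', mode = 'markers')

# save the interactive widget to its own standalone HTML file
htmlwidgets::saveWidget(p, 'widget-01.html', selfcontained = TRUE)

# take a static snapshot of it (webshot2 drives a headless Chrome)
webshot2::webshot('widget-01.html', 'widget-01.png')

# in the report (a chunk with results = 'asis'), emit the static image
# linked to the live widget file
cat('[![](widget-01.png)](widget-01.html)\n')
```

The report page then only carries lightweight images; the heavy widget JS is loaded only when a reader clicks through.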
> I think it is simple enough to implement that. The question is whether it's the right thing to do. Jupyter is notorious for the hidden-state problem. I'd like to avoid that.
Unfortunately for me, I stand firmly on the other side: in practice, I do not think this hidden-state problem is much of a problem. As you say, one just has to re-execute in a clean session, which is what everybody does. Adding friction to the research process (like having to wait for a document to re-render from scratch, even with caching) is very detrimental, and the more data-heavy the research, the worse it gets. Schematically speaking, if I could re-render a 70-chunk Rmd document by evaluating only the one chunk that changed, and see the output update instantly, that would change my work life (and that of a few of my colleagues).
> With the proper use of caching, I think the preview should be fast enough. Most documents should take no more than one second to build. Performance is a priority of **litedown**, and I'll try my best to make it faster as I learn more from practical use cases.
Pragmatically, that's orders of magnitude away from what I see. Let me describe schematically what happens in practice:

- In Jupyter, I only execute the last chunk when it is ready, and work incrementally.
- In R, the equivalent workflow would require rendering several times, and that's too slow even though:
  - I always render within the current environment;
  - I created a hook that simply does not execute a chunk if a given list of objects already exists in the environment (typically I would test for `data` in the data chunk);
  - I always use caching (and I cache in `qs` or `fst` format according to the class of the object).

I really think the problem is that loading the cache becomes slow at some point, and unless we can keep the output of a chunk that has not changed, this cannot be solved.
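The object-existence hook described above could be approximated with **knitr**'s option hooks; here is a simplified sketch (`skip_if_exists` is a made-up custom chunk option, not a built-in knitr option, and this is not my actual hook):

```r
knitr::opts_hooks$set(eval = function(options) {
  # skip evaluation when every object named in the chunk's `skip_if_exists`
  # option already exists in the global environment
  objs = options$skip_if_exists
  if (!is.null(objs) &&
      all(vapply(objs, function(o) exists(o, envir = globalenv()), logical(1))))
    options$eval = FALSE
  options
})
```

A chunk declared with `skip_if_exists = 'data'` would then be skipped whenever `data` is already present in the session.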
I've been extremely busy recently and probably won't free up until August, so let me give you some quick answers first.
I've spent a large amount of time on the problem of state vs. performance when designing **litedown**. It's a tricky problem, but I think the solutions I have come up with so far should work well enough. In short, I have provided two approaches:

1. With `litedown::reactor(cache = ':memory:')` in the first code chunk, your objects will be cached in memory, which can save you substantial time because it no longer needs to read/write files. The price to pay is larger memory consumption. This approach is similar to your hook, but should be more intelligent (it can decide whether a chunk should be re-computed by testing whether its dependencies have changed).
2. Use `litedown::reactor(cache = TRUE)` in the first chunk to simply cache all code chunks. This should achieve the goal you mentioned: only execute the code chunk that has been changed. However, the caching is also intelligent in the sense that if a chunk's dependencies have changed, the chunk's cache will be invalidated. This avoids the aforementioned Jupyter problem.

I haven't tested these approaches thoroughly myself, so it's possible that they are buggy somewhere. It would be great if you could help test them. Of course, the in-memory caching only works when you keep previewing a document in the same R session, which is what `litedown::roam()` does.
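To make the two options concrete, the first chunk of a document would contain one of the following calls (the calls are taken directly from this thread; the comments are my reading of the behavior described above):

```r
# option 1: in-memory caching, for previewing repeatedly in the same
# R session (e.g., via litedown::roam())
litedown::reactor(cache = ':memory:')

# option 2: cache all chunks on disk; a chunk is re-run only when its code
# or its dependencies change
# litedown::reactor(cache = TRUE)
```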
Re: the HTML widgets issue, it will be great if you can provide a reproducible example. I will take a closer look later.
Hello, I've been very interested to see that you're giving `rmarkdown` another look. I've read the docs and I have some questions; please let me know if this is not the place to ask them. You say:

At work we use exclusively widgets, but because we never managed to find a way to load them lazily in the browser (so loading the document was unusably slow), we started to do the following: we use `webshot` (or `webshot2`, but it's buggy) to create snapshots, and we link the actual HTML widget file to the snapshot picture. This has been a game-changer for us; can I assume this will work as well? Happy to hear if you have a better way to do this. BTW, we actually defer saving the widget to HTML to another thread, which speeds up the rendering a lot; I do not know if you expect to do something similar, but this might be food for thought. You say:

Having a pleasing output is pretty important, and too raw an output might deter people. Will you be showing examples or how to reproduce something like the default `rmarkdown` output? I am thinking of something with a collapsible TOC as well. You say:

This is what perplexed me the most: I never could make use of caching satisfyingly. As long as you work with medium-size data (say a few million rows and a few columns), chunks that load/update/save the data will always take too long to load. I went as far as swapping the `rds` format for `qs` to improve cache loading (a binary format with multi-threaded reading), but that's still too slow. For the exact same reason, I never could use the preview efficiently, the total re-rendering being too slow. My view is that the only right thing is to do nothing at all when the code has not changed, very much like what Jupyter does. I believe most users will `knit` the document in the current session anyway, so the data will be there. I personally solved this by creating a hook that won't evaluate a chunk if the hash of its code has not changed, but without that hook I'd be lost. Do you have a nice way to work "à la Jupyter" (as far as caching/knitting is concerned)?

Many thanks for your work; I am using it every day. To me, being able to use R code in `eval` is still a killer feature that no other tool provides.
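The hash-based hook mentioned above could be sketched like this (a hypothetical reconstruction, not my actual hook; it uses **knitr**'s option hooks and the **digest** package):

```r
.chunk_hashes = new.env()
knitr::opts_hooks$set(eval = function(options) {
  h = digest::digest(options$code)              # hash the chunk's source code
  if (identical(.chunk_hashes[[options$label]], h))
    options$eval = FALSE                        # unchanged since last knit: skip
  .chunk_hashes[[options$label]] = h
  options
})
```

Note that this only helps when knitting repeatedly in the same R session, since `.chunk_hashes` (and the objects the skipped chunks would have created) live in that session.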