Open rillig opened 5 years ago
Yes, but compared to the fast formatter formatR, you get the following styled right:
x %>% \nc()
call({{ x }})
call(!!x) and call(!!! x)
.+ (https://github.com/yihui/formatR/issues/100).
We also have features such as:
We are working on various features such as:
Often there is the trade-off between:
We chose an approach that is quite elegant, flexible and customizable (read more here). As of v1.3, we can cache at the level of top-level expressions, which should mitigate the problem, at least for repeated styling.
Now the ad block is over :-)
Indeed you are right, styler is not fast. Although it used to be twice as slow (#78), it's also bothering me. To identify more bottlenecks, I think we'd need to:
::. No idea what the speed implications are, but we can spare the .onLoad() call (compare #685).
$ https://github.com/tidyverse/tibble/issues/780#issuecomment-643766314.
as.character(transformers) since we have transformer name and version (#679).
parse_transform_serialize_r_block() or parse_transform_serialize_r(): only pass a subset of all transformers to apply_transformers(), namely remove those that we know won't ever be applied (e.g. force_assignment_op() need not be applied to all nests if we know there is not a single instance of EQ_ASSIGN). This will give some speed-up for transformers that are expensive to run but hardly change any nest. Note that we should use a function to subset the transformers (e.g. subset_tranformers()), and that function should be packed in the style guide and not be hardcoded in styler (https://github.com/r-lib/styler/pull/711).
We need to distinguish:
I can't say much more about that right now, unfortunately. If you want to contribute to any of the above or you have other suggestions, please let me know.
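To make the transformer-subsetting idea above more concrete, here is a minimal sketch in base R. It is not styler's implementation: subset_transformers_sketch(), the triggers list and the placeholder transformers are all hypothetical names made up for illustration. The only assumption taken from the discussion is that a transformer such as force_assignment_op() can be skipped when the code contains no EQ_ASSIGN token.

```r
# Hypothetical helper, not part of styler's API.
# Keeps only those transformers whose trigger tokens occur in `code`.
subset_transformers_sketch <- function(transformers, triggers, code) {
  parsed <- parse(text = code, keep.source = TRUE)
  tokens <- unique(utils::getParseData(parsed)$token)
  keep <- vapply(names(transformers), function(name) {
    trigger <- triggers[[name]]
    # Transformers without a declared trigger are always kept.
    is.null(trigger) || any(trigger %in% tokens)
  }, logical(1))
  transformers[keep]
}

# Placeholder transformers (identity functions) standing in for real ones.
transformers <- list(
  force_assignment_op = identity,
  some_other_rule = identity
)
# Assumed mapping from transformer name to the token that makes it relevant.
triggers <- list(force_assignment_op = "EQ_ASSIGN")

names(subset_transformers_sketch(transformers, triggers, "x <- 1"))
names(subset_transformers_sketch(transformers, triggers, "x = 1"))
```

With "x <- 1" there is no EQ_ASSIGN token, so the placeholder for force_assignment_op() is dropped; with "x = 1" both placeholders are kept. Keeping the trigger mapping outside the helper mirrors the point that such knowledge belongs in the style guide rather than being hardcoded.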
~I think we should also establish benchmarking with CI services to understand the speed implications of new features more holistically.~
We use {touchstone} to benchmark every PR: https://github.com/lorenzwalthert/touchstone
Anyone interested in doing speed profiling with the new proffer package?
Edit: We released v1.3.2 on CRAN, which contains #578 and some fixes required later.
@lorenzwalthert
Thanks for the improvement, but the latest version of styler is still quite slow...
$ R --version
R version 3.6.3 (2020-02-29) -- "Holding the Windsock"
$ R -e 'packageVersion("styler")'
[1] ‘1.3.2’
$ wc -l test.R
1243
$ time cat test.R | R --slave --no-restore --no-save -e "con <- file(\"stdin\");styler::style_text(readLines(con));close(con)" > /dev/null
cat test.R 0.00s user 0.00s system 75% cpu 0.001 total
R --slave --no-restore --no-save -e > /dev/null 16.71s user 0.08s system 99% cpu 16.817 total
This is very similar to what this issue describes: styling a single file still takes ~17s.
Yes. This is why this issue is still open. Caching only improves speed on repeated styling.
I don't know much about the principles of styler, but I wonder whether asynchronous programming, which is widely used in formatters for other languages, could help.
Can you elaborate a bit on that?
I suppose most of the functions in styler don't need to run sequentially, so wrapping those functions with the promises package in R might bring some improvement, in the same way it improved shiny apps. This may not fit styler; I may be wrong...
We can parallelize on different levels. The outermost level is over files, as in #617. Indeed, we could also parallelize over expressions at some stage, but this was not a priority because I worked on caching.
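As a rough illustration of parallelizing over files, here is a minimal sketch using the parallel package that ships with R. It is not what #617 implements: style_files_parallel() is a hypothetical wrapper, and the only styler call it assumes is styler::style_file(), which styles one file in place.

```r
# Hypothetical wrapper, not part of styler.
# `style_fun` is applied to each file path; it defaults to
# styler::style_file(), which rewrites the file in place.
# mclapply() relies on forking, which is unavailable on Windows,
# where `cores` has to be 1.
style_files_parallel <- function(files,
                                 style_fun = styler::style_file,
                                 cores = 2L) {
  parallel::mclapply(files, style_fun, mc.cores = cores)
}

# Example call (commented out because it would modify files on disk):
# r_files <- list.files("R", pattern = "\\.R$", full.names = TRUE)
# style_files_parallel(r_files)
```

Each worker pays its own start-up cost, so this can only be expected to pay off when there are several files to style; it does nothing for a single large file.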
Instead of tweaking the implementation details, I wonder whether the algorithms that are used here are appropriate. Using parallelism will not reduce the overall CPU time, choosing a different algorithm may.
Is it possible to use a simpler formatting algorithm for the easy cases and resort to the current algorithm in the really complicated cases? In other programming languages, formatting can be done in O(n).
What do you mean exactly by "the algorithm"? How the core of styler works is probably best documented in this vignette. I think it would be a lot of work to change the core of styler, because that would be a fundamental change, and for me the advantages of the approach taken are significant (described above). But if you have suggestions, I am open to discussing them. I don't even know if styler is O(n) (depending on how you measure n here, I think it might be). I guess we could do a little empirical investigation to figure that out...
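One way such an empirical investigation could look is sketched below. time_styling() is a hypothetical helper, the synthetic input is an assumption (real code has nesting that this does not exercise), and n is measured in lines. The only styler functions assumed are styler::style_text() and styler::cache_deactivate().

```r
# Hypothetical helper: times `style_fun` on synthetic inputs of growing size.
time_styling <- function(sizes, style_fun = styler::style_text) {
  elapsed <- vapply(sizes, function(n) {
    # Synthetic input: n one-line assignments with sloppy spacing.
    code <- sprintf("x%d<-c(%d,2 ,3)", seq_len(n), seq_len(n))
    system.time(style_fun(code))[["elapsed"]]
  }, numeric(1))
  data.frame(lines = sizes, elapsed = elapsed)
}

# Example usage (commented out because it needs styler installed).
# Caching is switched off so that repeated expressions are not skipped.
# styler::cache_deactivate()
# timings <- time_styling(c(100, 200, 400, 800))
# plot(timings$lines, timings$elapsed, type = "b")
```

If the elapsed time roughly doubles whenever the number of lines doubles, that would point towards linear behaviour for this kind of input; a steeper curve would suggest otherwise.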
I don't see why parallelism would not reduce overall CPU time in any case; #370 certainly does in many cases, because the additional start-up time is made up for by the parallelization. Maybe it does not help when styling a single file, I agree.
Formatting a real-life example file takes 17 seconds on my computer. That is a lot, because the file has just 1215 lines. Styling this file should take at most 1 second.