r-lib / styler

Non-invasive pretty printing of R code
https://styler.r-lib.org
Other
725 stars 72 forks source link

Styler is slow #558

Open rillig opened 5 years ago

rillig commented 5 years ago

Formatting a real-life example file takes 17 seconds on my computer. That's really much because the files is just 1215 lines. Styling this file should take at most 1 second.

lorenzwalthert commented 5 years ago

Yes, but you get the follwing styled right compared to the fast formatter formatR:

We also have features such as:

We are working on various features such as:

Often there is the trade-off between:

We choose an approach that is quite elegant, flexible and customizable (read more here). As of v1.3. , we can cache on top-level expression level which should mitigate the problem, at least for repeated styling.

Now the ad block is over :-)


Indeed you are right, styler is not fast. Although it used to be twice as slow (#78), it's also bothering me. To identify more bottlenecks, I think we'd need to:

We need to distinguish:

I can't say much more about that right now, unfortunately. If you want to contribute to any of the above or you have other suggestions, please let me know.

~I think we should also establish benchmarking with CI services to understand the speed implications of new features more holistically.~

We use {touchstone} to benchmark every PR: https://github.com/lorenzwalthert/touchstone

lorenzwalthert commented 4 years ago

Anyone interested in doing speed profiling with the new proffr package?

lorenzwalthert commented 4 years ago

578 is now ready for testing (and looking for feedback). Styling your real world file form above took my notebook 40s (with some other tasks going on) in the first round. Then, I made 14 changes to the file (in like 10 different top-level expressions, you have about 190 top-level expressions, including 32 comments) and restyled. It took 10s. For all untouched top-level expressions, the cache was used. Assuming you run styler and change only a few top level expressions yields significant speed boosts. I agree 10s is still a lot but it's 4x faster than before. 🎉

Edit: We released v1.3.2 on CRAN that contains #578 and some fixes required later.

y9c commented 4 years ago

@lorenzwalthert

Thanks for the improvement, but the latest version of styler is still quite slow...

$ R --version
R version 3.6.3 (2020-02-29) -- "Holding the Windsock"

$ R -e 'packageVersion("styler")'
[1] ‘1.3.2’

$ wc -l test.R
1243

$ time cat test.R | R --slave --no-restore --no-save -e "con <- file(\"stdin\");styler::style_text(readLines(con));close(con)" > /dev/null
cat test.R  0.00s user 0.00s system 75% cpu 0.001 total
R --slave --no-restore --no-save -e  > /dev/null  16.71s user 0.08s system 99% cpu 16.817 total

It is very similar with this issue, I still take ~17s for a single file.

lorenzwalthert commented 4 years ago

Yes. This is why this issue is still open. Caching only improves speed on repeated styling.

y9c commented 4 years ago

I don't know much about the principles of styler, but I wonder if adding asynchronous programming can help, which is widely used in the formatter of other languages.

lorenzwalthert commented 4 years ago

Can you elaborate a bit on that?

y9c commented 4 years ago

I suppose most of the functions in styler don't need to run sequentially, thus wrapping those functions by promises package in R might bring some improvement. Just in the same way as the improved shiny app. This may not fit the styler, I may be wrong...

lorenzwalthert commented 4 years ago

We can parallelize on different levels. The most outer way is over files, as in #617. Indeed we could also parallelize over expressions at some stage but this was not a priority because I worked on caching.

rillig commented 4 years ago

Instead of tweaking the implementation details, I wonder whether the algorithms that are used here are appropriate. Using parallelism will not reduce the overall CPU time, choosing a different algorithm may.

Is it possible to use a simpler formatting algorithm for the easy cases and resort to the current algorithm in the really complicated cases? In other programming languages, formatting can be done in O(n).

lorenzwalthert commented 4 years ago

What do you mean exactly by "the algorithm"? How the core of styler works is probably best documented in this vignette. I think it would be a lot of work to change the core of styler, because that would be a fundamental change and for me, the advantages of the approach taken are significant (described above). But if you have suggestions, I am open to discuss them. I don't even know if styler is O(n) (depending on how you measure n here I think it might be), I guess we could do a little empirical investigation to figure that out...

I don't see why parallelism would not reduce overall CPU time in any case, #370 certainly in many cases because additional start-up time is made up for with parallelization. Maybe it does not help when styling a single file, I agree.