rstudio / packrat

Packrat is a dependency management system for R
http://rstudio.github.io/packrat/

snapshot dependency resolution behavior causes slowness #532

Closed dpastoor closed 2 years ago

dpastoor commented 5 years ago

@trestletech as we discussed briefly at RStudio Conf, here is an example with flame graphs and timing data for a snapshot of a pretty vanilla data science project with shiny (e.g. tidyverse + shiny and a couple of helper packages).

You can see that the snapshot call takes almost 2 minutes:

image

Ironically, within the callHook, 49 of its 50 seconds are likewise spent in another snapshotImpl call.

image

So 97 of the 99 seconds are spent just doing dependency reconciliation.

If we drill further into snapshotImpl, we can also see that the cause is a recursive call for getting package dependencies. I believe that, as the recursion unwinds, it re-solves the dependency graph and re-reads the lockfile many times.

image

This is also very evident from how spiky the flame graph is:

image

@slopp this is also why we struggle with programmatic deployment from rsconnect and friends using the traditional bundling: the call at https://github.com/rstudio/rsconnect/blob/master/R/bundle.R#L780 can take a very long time.

I would think that either memoization, or a separate environment used as a hash table of sorts to check against before solving for a particular package, would be very helpful.
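To make the idea concrete, here is a minimal, language-agnostic sketch in Python. It is not packrat's code: the `DEPS` graph, the package edges in it, and the `resolve_naive` / `resolve_memo` helpers are all hypothetical and exist only to show how a cache keyed by package name stops a shared dependency from being re-solved once per path that reaches it.

```python
# Hypothetical dependency graph; the edges are illustrative only and do not
# reflect the real dependencies of these packages.
DEPS = {
    "app": ["shiny", "dplyr", "ggplot2"],
    "shiny": ["htmltools", "rlang"],
    "dplyr": ["rlang", "tibble"],
    "ggplot2": ["rlang", "tibble"],
    "tibble": ["rlang"],
    "htmltools": ["rlang"],
    "rlang": [],
}


def resolve_naive(pkg, calls):
    """Resolve pkg and its transitive deps, re-solving every shared dep."""
    calls[pkg] = calls.get(pkg, 0) + 1
    resolved = {pkg}
    for dep in DEPS[pkg]:
        resolved |= resolve_naive(dep, calls)
    return resolved


def resolve_memo(pkg, calls, cache):
    """Same traversal, but consult a cache before solving a package."""
    if pkg in cache:
        return cache[pkg]
    calls[pkg] = calls.get(pkg, 0) + 1
    resolved = {pkg}
    for dep in DEPS[pkg]:
        resolved |= resolve_memo(dep, calls, cache)
    cache[pkg] = frozenset(resolved)
    return cache[pkg]


naive_calls = {}
naive_result = resolve_naive("app", naive_calls)

memo_calls = {}
memo_result = resolve_memo("app", memo_calls, cache={})

# Compare how often each package was actually solved under each strategy.
for name in sorted(DEPS):
    print(name, naive_calls[name], memo_calls[name])
```

Both strategies return the same set of packages; the difference is only in how many times each one is solved. In a deep graph with many shared dependencies the naive count grows with the number of paths rather than the number of packages, which would be consistent with the spiky profile above.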

IMO, this should not take more than a second or two at most, which I lightly confirmed with an extremely hacky script that replicated the general gist of what happens during the getPackageRecords call. I'm happy to help get further along with this, but figured it might be worth having a dialog about whether it's worth it or whether it makes sense to channel these optimization efforts elsewhere.

dpastoor commented 5 years ago

After digging around, this looks like it aligns with the findings at https://github.com/rstudio/packrat/issues/347

aronatkins commented 3 years ago

@dpastoor - the changes in https://github.com/rstudio/packrat/pull/615 dramatically reduced the cost of getPackageRecords. Those changes were included in the packrat 0.6.0 release.

Are you able to re-test the performance of dependency resolution using the current CRAN release?

dpastoor commented 3 years ago

Hi Aron,

If it would be helpful, I can certainly retest. However, I can confirm that deploying apps with rsconnect, and the packrat invocations underneath, have been much, much faster recently, to the point that I forgot I had even filed this issue long ago :-)

aronatkins commented 2 years ago

@dpastoor I'll go ahead and close this issue given your feedback. No need to retest.