Closed dpastoor closed 2 years ago
In digging around this looks like it aligned with the findings at https://github.com/rstudio/packrat/issues/347
@dpastoor - the changes in https://github.com/rstudio/packrat/pull/615 dramatically reduced the cost of getPackageRecords
. Those changes were included in the packrat 0.6.0 release.
Are you able to re-test the performance of dependency resolution using the current CRAN release?
Hi Aron,
If it would be helpful, I can certainly retest, however I can confirm that the use of rsconnect to deploy apps, and the packrat invocations underneath have been much much faster in recent history, to the point that I forgot about even filing this long ago :-)
@dpastoor I'll go ahead and close this issue given your feedback. No need to retest.
@trestletech as we spoke about briefly at Rstudio Conf, here is an example of looking at the flamegraphs and data associated with a snapshot for a pretty vanilla data science project with shiny (eg tidyverse + shiny and a couple helper packages)
You can see that the snapshot call takes almost 2 minutes:
Ironically, the callHook, 49 of the 50 seconds is likewise stuck on another snapshotImpl call.
So nearly 97 of 99 seconds is taken just doing dependency reconciliation.
If we drill further into the snapshotImpl method, we can also see that the cause is a recursive call for getting package dependencies, which I believe during unraveling is re-solving the dependency graph and reading the lockfile many times.
This is also very evident from looking at how spiky the flame graph is
@slopp this is also why we struggle with using programmatic deployment from rsconnect and friends using the traditional bundling, as in https://github.com/rstudio/rsconnect/blob/master/R/bundle.R#L780 can take so long.
I would think by using either memoization or a separate environment as a hash table of sorts to check against before solving for a particular package would be very helpful.
IMO, this should not take more than a second or two max, which I lightly confirmed with an extremely hacky script that replicated the general gist of what is going on during the getPackageRecords call. I'm happy to help get further along with this, but figured it might be worth having a dialog around whether its worth it or if it makes sense to channel these optimization efforts elsewhere.