metrumresearchgroup / review

helpful tools for organizing and performing quality control (QC) tasks
https://metrumresearchgroup.github.io/review/

renderQCsummary can take a very long time #79

Closed kylebaron closed 3 weeks ago

kylebaron commented 2 months ago

This was a repository where we just did sse. It's currently taking quite a long time to generate the summary. I don't know if it's actually going to take 30 min to finish, but it's looking that way. It's possible this is slow because we need to fetch the files from S3 (warmup effect), but I think it could simply be the large number of files.

[Screenshot 2024-04-01 at 08 42 14]

andersone1 commented 2 months ago

@kylebaron which version of review are you using?

kylebaron commented 2 months ago

@andersone1

> packageVersion("review")
[1] ‘3.7.0’
kylebaron commented 2 months ago

There were 816 pages in the rendered report.

andersone1 commented 2 months ago

@kylebaron Wow, OK.

3.7.0 is the newest version, and it runs faster than the prior versions.

Would it have helped if you could point to a specific directory (e.g. /script) rather than the whole project?

kylebaron commented 2 months ago

@andersone1 - yeah, either that way or some directories to ignore.

andersone1 commented 2 months ago

Hey @kylebaron - if you have time, can you run the underlying (also exported) function on the project:

dirSummary()

Edit: I'd like to see roughly what percentage of the time comes from dirSummary() and what percentage comes from rendering the PDF. dirSummary() returns a list, which would be useful to save as an R object for this test.
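
One way to run that test is sketched below in base R. The helper name `time_and_save` is hypothetical (it is not part of review), and a stand-in function is timed here so the snippet runs anywhere. In the project, one would presumably pass `review::dirSummary` instead.

```r
# Hypothetical helper (not part of review): time a function call and
# save whatever it returns so the result can be inspected later.
time_and_save <- function(fn, path, ...) {
  # system.time() evaluates the expression in this function's frame,
  # so `result` is available after the call
  elapsed <- system.time(result <- fn(...))[["elapsed"]]
  saveRDS(result, path)
  invisible(list(result = result, elapsed_sec = elapsed))
}

# Stand-in for the real call; assumption: in the project this would be
# time_and_save(review::dirSummary, "dirSummary.rds")
out_path <- tempfile(fileext = ".rds")
out <- time_and_save(function() list(files = letters[1:3]), out_path)

out$elapsed_sec    # seconds spent inside the timed call
readRDS(out_path)  # the saved list, restored from disk
```

Comparing `elapsed_sec` with the wall-clock time of the full report gives a rough split between summarizing and PDF rendering.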

kylebaron commented 2 months ago

@andersone1 - it's dirSummary(), not knitting the pdf.

andersone1 commented 2 months ago

@kylebaron The functions dirSummary and renderQCSummary will have a .dirs_exclude option in the upcoming release.

We will also print each directory and its number of files, so the user can see which directories they may want to exclude, e.g.:

[image]

kylebaron commented 2 months ago

@andersone1 - did you ever look at svn list -v on a directory to get the last revision?

andersone1 commented 2 months ago

@kylebaron

I have not tried svn list -v, but I will look into that (we currently use svn info).

Also, FYI, we have just released a new version of the package (3.8.1), with two updates to renderQCsummary:

1. The files that will be added to the report are printed to the console (so the user has an idea how many there are), which looks like this:

[image]

2. The user can now specify directories to exclude.

For example, in the above case, the user could exclude script/e-appendix from the report with renderQCsummary(.dirs_exclude = c("script/e-appendix"))

This version of review is available here: https://s3.amazonaws.com/mpn.metworx.dev/releases/review/3.8.1
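
To illustrate the two ideas above (per-directory file counts and directory exclusion), here is a minimal base R sketch. It is not the package's implementation: `exclude_dirs` is a hypothetical helper, and the file paths are made up for the example.

```r
# Made-up file list; in practice this could come from
# list.files(recursive = TRUE) run at the project root.
files <- c(
  "script/model.R",
  "script/e-appendix/a.R",
  "script/e-appendix/b.R",
  "data/derived.csv"
)

# Hypothetical helper: drop files that sit in or under an excluded directory.
exclude_dirs <- function(files, dirs_exclude) {
  if (length(dirs_exclude) == 0) return(files)
  # Normalize "dir/" to "dir", then match on the "dir/" prefix so that
  # "script/e-appendix" does not also match "script/e-appendix-old"
  prefixes <- paste0(sub("/+$", "", dirs_exclude), "/")
  in_excluded <- vapply(
    files,
    function(f) any(startsWith(f, prefixes)),
    logical(1),
    USE.NAMES = FALSE
  )
  files[!in_excluded]
}

# Number of files per directory, to see which directories dominate
table(dirname(files))

kept <- exclude_dirs(files, c("script/e-appendix"))
```

Matching on the prefix plus a trailing slash is a deliberate choice here: it excludes everything under the directory without catching sibling directories that merely share the name's first characters.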

andersone1 commented 2 months ago

@michaelmcd18

Something we can look into in the future.

This command review:::svnCommand("list", .flags = "--depth infinity") could potentially replace this loop:

# One svn info call per file; errors are captured instead of stopping the loop
for (i in seq_len(n_iter)) {
  log.i <- tryCatch(
    svnInfo(relevant_files_df$file[i]),
    error = identity
  )
}
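
The appeal of a single listing call is that one svn invocation returns the last-changed revision for every file, instead of one invocation per file. The sketch below parses text in the layout printed by `svn list -v` (revision, author, size, date, name). The sample entries are made up, `parse_svn_list` is a hypothetical helper rather than anything in review, and it assumes file names contain no spaces.

```r
# Made-up sample modelled on `svn list -v` output; directories have no size.
listing <- c(
  "     20 harry          1058 Jul 28 02:07 README",
  "     23 sally               Feb 04 01:40 script/",
  "     21 sally          2210 Jan 15 11:02 script/model.R"
)

# Hypothetical parser: first field is the revision, second the author,
# last the path. Assumes paths contain no whitespace.
parse_svn_list <- function(lines) {
  fields <- strsplit(trimws(lines), "\\s+")
  data.frame(
    file = vapply(fields, function(x) x[length(x)], character(1)),
    last_rev = as.integer(vapply(fields, `[`, character(1), 1)),
    author = vapply(fields, `[`, character(1), 2),
    stringsAsFactors = FALSE
  )
}

revs <- parse_svn_list(listing)

# Keep files only (directory entries end with "/")
revs_files <- revs[!endsWith(revs$file, "/"), ]
```

For paths that may contain spaces, the XML output of `svn list --xml` would be a more robust source than the column layout parsed here.
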
andersone1 commented 3 weeks ago

@kylebaron

svn list implemented here (runs much faster now).