matthewhirschey / ddh.org

datadrivenhypothesis.org is a resource to query 100+ GB of raw biological science data to develop data-driven hypotheses
3 stars 7 forks source link

simplify report generation code #104

Closed johnbradley closed 4 years ago

johnbradley commented 4 years ago

The current report generation logic repeats each Rmd filename multiple times and duplicates the logic to move the markdown to a temp directory. Streamline this logic.

For example in render_report_to_file: https://github.com/matthewhirschey/ddh/blob/25566ee751e4f6f78db6ecc5ea5d0c80037a8c8d/code/fun_reports.R#L112-L121 Then within render_complete_report() https://github.com/matthewhirschey/ddh/blob/25566ee751e4f6f78db6ecc5ea5d0c80037a8c8d/code/fun_reports.R#L77

Change the code so we only reference the Rmd file once. Have one location that makes a temp directory/copies the Rmd and renders it.

Background

When the website runs in OpenShift the /code directory is read only. This is good from a security perspective but causes problems for rmarkdown::render(). This rmarkdown::render() creates intermediate files in the same directory as the .Rmd file it is rendering. Even if we made /code writable I am concerned that two users generating PDFs at the same time would get their wires crossed. So to use rmarkdown::render() we need to copy the .Rmd files to a temp directory first.

matthewhirschey commented 4 years ago

This all makes sense, except why this only throws an error now (and not before).

Do you think we should make an explicit tmp dir or the like (perhaps report) for this intermediate file?

johnbradley commented 4 years ago

This all makes sense, except why this only throws an error now (and not before).

The error started throwing due to this change: https://github.com/matthewhirschey/ddh/commit/168032e925a5e1ecb6f91e12e4cdf2ca5caf5a33#diff-85955cebb1c904f9c10cced83f6d26c7L46

Previously we were just calling rmarkdown::render("report_depmap_app.Rmd", output_file = file) After the above change it switched to rmarkdown::render(here::here("code", "report_depmap_app.Rmd"), output_file = file)

Do you think we should make an explicit tmp dir or the like (perhaps report) for this intermediate file?

I think rmarkdown::render needs to be run in a unique directory name to avoid potential collisions. For example if two users try to run the report at the same time they both might try to write to the file named report_app.knit.md and end up with mixed results. IMO: This is lazy programming in rmarkdown it shouldn't need us to do all this work. We could create unique directories under tmp or report if having a known location to check would be helpful.

matthewhirschey commented 4 years ago

I see. It seems like a good solution could be to make a directory called report. We can then make this directory writable. Then contained with in this directory, we could make subdirectories that have unique names, to prevent the possible problems that you outlined. Having the report dir seems like it’d be easier to handle permissions.