pharmaR / riskassessment

Risk Assessment Demo App: https://rinpharma.shinyapps.io/riskassessment
https://pharmar.github.io/riskassessment/
Other
98 stars 25 forks source link

Use {archive} for package upload #747

Closed Jeff-Thompson12 closed 3 months ago

Jeff-Thompson12 commented 5 months ago

Uses the {archive} package instead of unpacking the package tarball during the upload process.

codecov[bot] commented 3 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 79.23%. Comparing base (6bdf82d) to head (ce0f93f).

Additional details and impacted files ```diff @@ Coverage Diff @@ ## dev #747 +/- ## ========================================== + Coverage 79.20% 79.23% +0.02% ========================================== Files 33 33 Lines 5112 5114 +2 ========================================== + Hits 4049 4052 +3 + Misses 1063 1062 -1 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

Jeff-Thompson12 commented 3 months ago

Hi @Jeff-Thompson12 - this PR appears to be doing what it set out to do. However, can I ask what issue this solves?

I did a speed test on dev uploading a package using upload_pkg_lst() and compared it with this branch. Uploading Tplyr on dev took 10 seconds while Tplyr on this branch took 16 seconds. But then I did another test uploading 5 pkgs: on dev, it took 62 seconds while this branch took 46 seconds. So I'm not quite sure what to think of those results, lol. These are the packages I used the second time:

upload_pkg_lst(c("Tidyr", "IDEAFilter", "tidyCDISC", "ggplot2", "xportr"))

This PR really only serves to remove the process of untarring the tarballs and eliminate the usage of the source/ directory. While this should result in a time save that is not the ultimate goal. We also have the added benefit of using less resources in the deployed application.

I did notice that the code reads and writes the tar file twice in this PR. Once during get_desc_pkg_info() and then again in insert_riskmetric_to_db(). Would it be worthwhile to only perform this action once? Or is the computational effort negligible?

I think in general our upload process could benefit from a re-write. This is not the only place we have duplication. In fact, we run riskmetric::pkg_ref() twice during the upload process and it has a much larger consequence as far as time is concerned. I think reading the DESCRIPTION file using archive is pretty negligible on the computational effort.

Jeff-Thompson12 commented 3 months ago

I made a change so that we directly read from the archive connection to create the description object. I think this is a cleaner execution of the problem.