palaeoverse / rphylopic

Get Silhouettes of Organisms from PhyloPic
https://rphylopic.palaeoverse.org
GNU General Public License v3.0
90 stars 9 forks source link

Should we temporarily cache svg files? #49

Closed willgearty closed 1 year ago

willgearty commented 1 year ago

Right now we store a single "temp.svg" file in the temporary directory. This is a part of the svg conversion process with rsvg. However, the ggplot production process often involves several iterations of tweaking plots before saving them, so perhaps it would make sense to store all of the svg files as \<uuid>.svg in the temporary directory. Then, when we go to download a phylopic, we first check whether it is in the temporary directory first, instead of downloading and converting all over again. This should generally result in notable speed ups, especially when lots of the same phylopics are being used over and over. The temporary folder is cleared whenever R is closed, so this shouldn't result in any permanent storage usage.

willgearty commented 1 year ago

Upon further investigation, it appears that readPicture() takes about 1 to 2 orders of magnitude longer than the downloading via GET() and the processing via rsvg_svg(). For the largest files (e.g., https://images.phylopic.org/images/347bc4ab-c1e0-4313-9ea1-939c2ad71232/vector.svg), it looks like the readPicture() process can take up to 20 or more seconds, while the downloading (~15 milliseconds) and processing (~90 milliseconds) take essentially no time at all. Therefore, caching more than just one processed .svg file probably won't save too much time for the average user. However, for users that are using lots of phylopics that are always the same, it could save some time (but the major bottleneck will remain the readPicture() step).

LewisAJones commented 1 year ago

Thanks for looking into this more @willgearty and providing some benchmarking. It's good to have some data to keep in mind. I think based on the numbers you've quoted, it would be best to leave it as it is currently implemented. I guess one thing to also consider is that users don't necessarily need to call the API each time... we have functions that allow you to bring the image into the R environment to avoid having to redownload each time... Maybe our efforts would be best spent on showing users to get around this issue?

willgearty commented 1 year ago

Agreed, more examples or a whole vignette would probably be better.