matildabrown / rWCVP

Generating Summaries, Reports and Plots from the World Checklist of Vascular Plants
https://matildabrown.github.io/rWCVP/
GNU General Public License v3.0

Rendering large checklist blows up memory #25

Closed by barnabywalker 1 year ago

barnabywalker commented 2 years ago

I tried to render a checklist for all of Brazil using:

checklist <- generate_checklist(
  area = brazil,
  native = TRUE,
  introduced = FALSE,
  extinct = FALSE,
  location_doubtful = FALSE,
  synonyms = FALSE,
  render.report = TRUE,
  report.dir = here::here()
)

and got this error:

Error: cannot allocate vector of size 12.3 Gb

Generating the checklist without rendering works fine, so I think the problem is somewhere in making the report. If this problem cannot be overcome, maybe we should put a warning in the documentation.

matildabrown commented 2 years ago

Hmm. I suspect the problem is that gt tables are pretty inefficient. Perhaps we could add a warning if the number of species exceeds a certain threshold, or an ask argument? We could also suggest in the docs that if you need a checklist that big, you should split it up by family.
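To make the "split it up by family" idea concrete, here is a rough sketch in base R. It assumes the checklist is a data frame with a `family` column; `render_by_family` and its `render_fun` argument are hypothetical names for illustration, not part of rWCVP.

```r
# Hypothetical helper, not part of rWCVP: apply a rendering function to one
# family at a time, so no single table has to hold the whole checklist.
render_by_family <- function(checklist, render_fun, family_col = "family") {
  stopifnot(is.data.frame(checklist), family_col %in% names(checklist))
  # One data frame per family; drop = TRUE skips unused factor levels.
  pieces <- split(checklist, checklist[[family_col]], drop = TRUE)
  # Returns a list named by family, one result per piece.
  lapply(pieces, render_fun)
}

# Toy data standing in for a real checklist.
toy <- data.frame(
  family = c("Fabaceae", "Fabaceae", "Poaceae"),
  taxon_name = c("Inga edulis", "Mimosa pudica", "Zea mays")
)

# `nrow` stands in for whatever function actually renders a report.
rows_per_family <- render_by_family(toy, nrow)
```

Whether this avoids the memory blow-up depends on where the allocation happens during rendering, so treat it as a workaround to try rather than a confirmed fix.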

matildabrown commented 2 years ago

Assuming that brazil is get_wgsrpd3_codes("Brazil")?

matildabrown commented 2 years ago

Brazil has >30k species, so maybe a good warning threshold is 10k (or 5k?), with suggestions to split up by family or exclude synonyms - what do you think?
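A minimal sketch of what such a threshold check could look like, in base R. The function name `warn_if_large`, the default of 10k, and the use of a `plant_name_id` column to count taxa are all assumptions for illustration. Counting unique IDs counts every row's taxon, so synonyms are included if they are in the table.

```r
# Hypothetical helper, not part of rWCVP: warn before rendering a checklist
# that has more taxa than a chosen threshold.
warn_if_large <- function(checklist, max_taxa = 10000,
                          id_col = "plant_name_id") {
  stopifnot(is.data.frame(checklist), id_col %in% names(checklist))
  n_taxa <- length(unique(checklist[[id_col]]))
  if (n_taxa > max_taxa) {
    warning(
      sprintf(
        paste(
          "Checklist has %d taxa (threshold %d); rendering may be slow",
          "or run out of memory. Consider splitting by family or",
          "excluding synonyms."
        ),
        n_taxa, as.integer(max_taxa)
      ),
      call. = FALSE
    )
  }
  # Return the count invisibly so callers can reuse it.
  invisible(n_taxa)
}

# Toy data with 12 taxa and a deliberately low threshold to trigger the warning.
toy <- data.frame(plant_name_id = 1:12)
msg <- tryCatch(
  warn_if_large(toy, max_taxa = 10),
  warning = function(w) conditionMessage(w)
)
```

A warning keeps the call non-interactive; an ask-style prompt would be friendlier at the console but would need a guard such as `interactive()` so scripts do not hang.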

matildabrown commented 1 year ago

Update - just ran Greece (c.4k species, but >100k including synonyms) - worked but was sloooow to render. Noting for future work on this bug.