stateofindiasbirds / soib_2023

SoIB version 2 code
MIT License

Optimise `expandbyspecies()` runtime #16

Closed by rikudoukarthik 4 months ago

rikudoukarthik commented 4 months ago

Much of the current code is dplyr-based and is therefore not optimal in terms of runtime. The `expandbyspecies()` call within `singlespeciesrun()`, which is iterated for each species and mask, contributes considerably to the runtime: it takes around 50% of the time that each `glmer()` call takes.

The bulk of this time is taken up by the `group_by() %>% slice() %>% ungroup()` step. Alternatives were explored using both Julia and data.table/dtplyr; the latter performed best and yields significant runtime savings. See this for specific details and benchmark results.
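For readers unfamiliar with the pattern, here is a minimal sketch of the kind of rewrite involved. It is not the actual `expandbyspecies()` or `expand_dt()` code: the toy data, the `group.id` grouping column, and the use of `slice(1)` (keep the first row per group) are all assumptions made for illustration.

```r
library(dplyr)
library(data.table)

# Hypothetical toy data: several observation rows per checklist (group.id).
obs <- data.frame(
  group.id = c("a", "a", "b", "c", "c", "c"),
  year     = c(2020, 2020, 2021, 2021, 2021, 2021),
  species  = c("sp1", "sp2", "sp1", "sp2", "sp3", "sp1")
)

# dplyr pattern named in the issue (assumed here to be slice(1)):
# keep the first row of each group.
first_dplyr <- obs %>%
  group_by(group.id) %>%
  slice(1) %>%
  ungroup()

# One possible data.table equivalent: unique() with `by` keeps the first
# row encountered for each value of group.id.
obs_dt   <- as.data.table(obs)
first_dt <- unique(obs_dt, by = "group.id")

# Both results hold one row per checklist.
print(first_dplyr)
print(first_dt)
```

Note that `group_by()` returns groups in sorted order, whereas `unique()` on a data.table preserves the original row order, so the two results only line up row for row when the input is already ordered by the grouping column, as in this toy data.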

The current `expandbyspecies()` is to be replaced by the new data.table/dtplyr-based function.

rikudoukarthik commented 4 months ago

Replaced with `expand_dt()`, but haven't removed the previous function entirely. Useful to keep for now in case troubleshooting is required. The previous function is to be retired after the current annual update, in time for the next.