pacific-hake / hake-assessment

:zap: :fish: Build the assessment document using LaTeX and knitr
MIT License

Make extra-mcmc loading code much faster #972

Closed cgrandin closed 1 year ago

cgrandin commented 1 year ago

It is insanely slow when dealing with the 16,000 files from extra-mcmc. For 2023 it took almost 24 hours to load the base model (without retrospectives)!

Currently the files are read into a list of file contents, which I think becomes super slow to parse when using map() to extract each appropriate output. I'm not sure how else to do this though; the files have to be parsed somehow and bound together.
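A minimal sketch of the pattern described above (illustrative only, not the actual package code; the directory, file names, and contents are made up, and base `lapply()` stands in for `map()`): read every report file into a list, parse each one, then bind the results in a single call.

```r
# Create a few tiny files in a temp directory to stand in for the
# thousands of extra-mcmc report files
dir <- tempfile("extra_mcmc_")
dir.create(dir)
for (i in 1:3) {
  writeLines(c("a b", paste(i, i * 2)),
             file.path(dir, paste0("rep", i, ".txt")))
}

files <- list.files(dir, full.names = TRUE)
contents <- lapply(files, readLines)  # list of raw file contents
tables <- lapply(contents, function(x) read.table(text = x, header = TRUE))
out <- do.call(rbind, tables)         # bind all parsed tables at once
```

Note that binding once at the end (`do.call(rbind, ...)`) is already much cheaper than binding row by row inside a loop.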

I need to use the code profiler and rewrite this code from scratch.

cgrandin commented 1 year ago

Done in f69e4f03c67459caa477a4d5f8dc3c153dd53d64

It's now hundreds of times faster. The issue was one line of code that was rbind-ing every single row onto these massive data frames, some of which have over 1,000,000 rows. That operation takes a long time because the whole data frame has to have its memory reallocated on every rbind() call.
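A minimal illustration of the problem (made-up data and sizes, not the assessment's real tables): growing a data frame with rbind() reallocates the entire object on every call, so filling n rows costs roughly O(n²), while writing into a preallocated matrix is O(n).

```r
n <- 1000
new_row <- data.frame(a = 1.5, b = 2.5, c = 3.5)

grow_with_rbind <- function() {
  df <- NULL
  for (i in seq_len(n)) {
    df <- rbind(df, new_row)  # whole frame copied every iteration
  }
  df
}

fill_preallocated <- function() {
  m <- matrix(NA_real_, nrow = n, ncol = 3)  # memory allocated once
  for (i in seq_len(n)) {
    m[i, ] <- c(1.5, 2.5, 3.5)               # in-place row write
  }
  as.data.frame(m)
}
```

Wrapping each call in `system.time()` makes the gap visible even at this modest size; at a million rows the rbind() version becomes unusable.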

I replaced this method by creating empty matrices of the correct dimensions ahead of time, looping through the data frames to insert rows one at a time, encoding all the non-numerical values to something numerical (mostly 8888 and 9999 codes), and finally converting the whole matrix back to a data frame and decoding those encoded numerical values back to what they were. All this fun stuff happens here: https://github.com/pacific-hake/hake-assessment/blob/fd470c09404b6545ac5337ab7500fbe9e81561eb/R/extract-rep-table.R#L13
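A hedged sketch of that preallocate-encode-decode approach (the real implementation lives in R/extract-rep-table.R; the 8888 and 9999 codes come from the comment above, but the sentinel names, example data, and "---" token here are assumptions for illustration):

```r
ENC_NA   <- 9999  # stands in for NA / missing entries
ENC_CHAR <- 8888  # stands in for a known non-numeric token, e.g. "---"

# Two toy input data frames with mixed numeric and non-numeric values
dfs <- list(
  data.frame(x = c(1, NA), y = c("2", "---")),
  data.frame(x = c(3, 4),  y = c("5", "6"))
)

n_rows <- sum(vapply(dfs, nrow, integer(1)))
m <- matrix(NA_real_, nrow = n_rows, ncol = 2)  # preallocate once

i <- 1
for (d in dfs) {
  for (r in seq_len(nrow(d))) {
    x <- ifelse(is.na(d$x[r]), ENC_NA, d$x[r])
    y <- ifelse(d$y[r] == "---", ENC_CHAR,
                suppressWarnings(as.numeric(d$y[r])))
    m[i, ] <- c(x, y)  # insert one fully-numeric row in place
    i <- i + 1
  }
}

out <- as.data.frame(m)
names(out) <- c("x", "y")
# Decode the sentinel codes back to their original meaning
out$x[out$x == ENC_NA] <- NA
out$y <- ifelse(out$y == ENC_CHAR, "---", as.character(out$y))
```

The encoding step is what allows everything to live in a plain numeric matrix, which is what makes the in-place row writes cheap.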

It now takes about 5 seconds to make regular RDS files, and about 4 minutes to make an RDS file with extra-mcmc (without forecasts or retrospectives).