suiji / Arborist

Scalable decision tree training and inference.

Memory spike on write #38

Closed. suiji closed this issue 5 years ago.

suiji commented 6 years ago

Following up on a topic raised in a closed thread, a fleeting 2x spike in memory footprint has been observed following training but preceding validation. Such spikes can lead to swapping and inordinately long training times, for example with large data sets or wide forests. The presumed cause is a series of copies from the Core's STL-style vectors into the front end's R-style vectors. If this is in fact the cause, then a solution should be achievable simply by dispatching training into blocks of several trees at a time and performing the copies once per block.
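
As a rough illustration of the block-wise proposal, the sketch below uses hypothetical names (`TreeBlock`, `trainBlock()`, `appendToFrontEnd()`) rather than the actual Core API: each iteration trains a small block of trees and copies it into the front end's buffer before the next block is trained, so only one block's worth of core-side output is ever duplicated.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical sketch: TreeBlock, trainBlock(), and appendToFrontEnd() are
// illustrative stand-ins, not the Arborist API.
struct TreeBlock {
  std::vector<double> nodes;  // core-side (STL) encoding of this block's trees
};

// Stub: trains 'blockSize' trees and returns their core-side encoding.
TreeBlock trainBlock(std::size_t blockSize) {
  return TreeBlock{std::vector<double>(blockSize * 100, 0.0)};
}

// Stub: copies one block's output into the front end's R-style buffer.
void appendToFrontEnd(const TreeBlock& block, std::vector<double>& rBuffer) {
  rBuffer.insert(rBuffer.end(), block.nodes.begin(), block.nodes.end());
}

// Trains in blocks so that only one block's worth of core-side output is
// duplicated at any moment, rather than the entire forest.
void trainForest(std::size_t nTree, std::size_t blockSize,
                 std::vector<double>& rBuffer) {
  for (std::size_t start = 0; start < nTree; start += blockSize) {
    TreeBlock block = trainBlock(std::min(blockSize, nTree - start));
    appendToFrontEnd(block, rBuffer);  // copy performed once per block
  }  // 'block' is freed each iteration, so the 2x spike shrinks to ~one block
}

int main() {
  std::vector<double> rBuffer;
  trainForest(500, 20, rBuffer);  // e.g., 500 trees trained in blocks of 20
}
```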

The proposed Combine() method will remain on the TODO list, but separate training and subsequent combination of forests will probably not conserve memory: forest summaries comprise the bulk of the memory footprint when training even a modest number of trees.

suiji commented 5 years ago

The Core now trains blocks of trees, which are then consumed by the front end. This approach seems to whittle down the memory spike considerably: in particular, R-style vectors are now updated in place rather than written wholesale. Occasional spikes remain, however, because the new scheme relies on guessing a conservative size for each R vector; when the guess proves too small, reallocation (i.e., high-footprint copying) becomes necessary.
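
For concreteness, here is a minimal sketch of the conservative-sizing behavior described above; `RVector` and its `append()` method are hypothetical stand-ins for the front end's R-style buffers, not Rcpp or the Arborist front end. The buffer is sized by an up-front guess and updated in place; only when the guess proves too small does it reallocate, which is the residual high-footprint copy.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an R-style, front-end-owned buffer.
class RVector {
  std::vector<double> buf;   // pretend this is R-allocated storage
  std::size_t used = 0;

public:
  explicit RVector(std::size_t guess) : buf(guess) {}  // conservative up-front guess

  // Updates the vector in place, one block at a time.  Only when the guess
  // proves too small does the buffer reallocate, briefly copying everything
  // written so far: the occasional residual spike noted above.
  void append(const double* src, std::size_t n) {
    if (used + n > buf.size()) {
      buf.resize(2 * (used + n));      // reallocation: high-footprint copy
    }
    std::copy(src, src + n, buf.begin() + used);
    used += n;
  }

  std::size_t size() const { return used; }
};

int main() {
  RVector nodes(1 << 20);      // guess: ~1M doubles for the whole forest
  double block[4096] = {};
  for (int i = 0; i < 300; ++i) {
    nodes.append(block, 4096); // in-place update; reallocates only if the guess was low
  }
}
```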