Chunked history evaluation

(thx to @i-Zaak for the idea)

Perhaps a blocking strategy, like the one used for matrix multiplication, would be advantageous here: when reading from the history buffer, we load the history for N nodes (with N chosen so that the chunk just fits in the L3 cache), increment the coupling vector by the contributions of those nodes, and then continue with the next chunk of N nodes.
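
To make the idea concrete, here is a minimal sketch in plain Python. It is not libtvb code: the names `coupling_naive`, `coupling_chunked`, `weights`, `delays`, `hist` and `chunk` are all hypothetical, and it assumes a circular history buffer indexed as `hist[step][node]` with integer delays in steps. The chunked version walks the source nodes in blocks of `chunk` and adds each block's contribution to the coupling vector before moving on, so only the history of those nodes needs to be resident at once.

```python
def coupling_naive(weights, delays, hist, t):
    """Reference version: for each target node i, sum over all source nodes j."""
    horizon = len(hist)          # number of time steps kept in the circular buffer
    n = len(weights)             # number of nodes
    out = [0.0] * n
    for i in range(n):
        for j in range(n):
            # delayed state of source node j as seen by target node i
            out[i] += weights[i][j] * hist[(t - delays[i][j]) % horizon][j]
    return out


def coupling_chunked(weights, delays, hist, t, chunk):
    """Blocked version: process source nodes in chunks of size `chunk` (>= 1)."""
    horizon = len(hist)
    n = len(weights)
    out = [0.0] * n
    for j0 in range(0, n, chunk):            # start of the current chunk
        j1 = min(j0 + chunk, n)              # end of the chunk (exclusive)
        for i in range(n):
            acc = 0.0
            for j in range(j0, j1):          # only touches history of nodes j0..j1-1
                acc += weights[i][j] * hist[(t - delays[i][j]) % horizon][j]
            out[i] += acc                    # increment coupling by this chunk's share
    return out
```

Both functions compute the same sums, only in a different order, so results can differ by floating-point rounding. Whether the blocking pays off would need measuring; in a compiled implementation `chunk` would be tuned so that `chunk * horizon` history values fit in the targeted cache level.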

maedoc / libtvb

Chunked history evaluation #94