Optimize _find_optimal_split function.

What has been done: Commits aca31b1ea3f76964 and a2f504f9ccfbab8cd9 make the inner loop (over observations) substantially faster. In the first commit I replaced most np.sum() and np.mean() calls with running sums that are updated incrementally as the split point moves. In the second commit I replaced the pd.DataFrame data storage with a faster np.array, and now only the end result is converted to a pd.DataFrame.
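A minimal sketch of both ideas follows. It makes simplifying assumptions: a single feature and a stand-in split loss (the within-child sum of squared deviations of the outcome). The real `_find_optimal_split` uses the causal-forest criterion, which this PR does not show, and `find_split_running_sums`, `min_leaf` and the midpoint threshold are hypothetical. The point is the bookkeeping. Moving one observation from the right child to the left child updates the running sums in O(1) instead of calling np.sum()/np.mean() on both slices. Results go into a preallocated np.array that is converted to a pd.DataFrame once, at the end.

```python
import numpy as np
import pandas as pd


def find_split_running_sums(x, y, min_leaf=1):
    """Scan all split points of one feature using running sums.

    Hypothetical stand-in for _find_optimal_split: the loss is the
    within-child sum of squared deviations, computed from sums only.
    """
    order = np.argsort(x, kind="stable")
    x, y = x[order], y[order]
    n = len(y)

    # Left child starts empty; the right child starts with every observation.
    left_sum, left_sq, n_left = 0.0, 0.0, 0
    right_sum, right_sq = y.sum(), (y ** 2).sum()

    # Preallocated np.array instead of growing a DataFrame row by row.
    results = np.full((n - 1, 2), np.nan)

    for i in range(n - 1):
        # Move observation i from the right child to the left child: O(1) update.
        left_sum += y[i]
        left_sq += y[i] ** 2
        right_sum -= y[i]
        right_sq -= y[i] ** 2
        n_left += 1
        n_right = n - n_left

        # Only split between distinct feature values and respect min_leaf.
        if x[i] == x[i + 1] or n_left < min_leaf or n_right < min_leaf:
            continue

        # Per child: sum of squared deviations = sum(y^2) - (sum y)^2 / n.
        loss = (left_sq - left_sum ** 2 / n_left) + (right_sq - right_sum ** 2 / n_right)
        results[i] = (0.5 * (x[i] + x[i + 1]), loss)

    # Convert to a DataFrame only once, after the loop.
    df = pd.DataFrame(results, columns=["threshold", "loss"]).dropna()
    if df.empty:
        return None, None, df
    best = df.loc[df["loss"].idxmin()]
    return best["threshold"], best["loss"], df
```

For example, `find_split_running_sums(np.array([1., 2., 3., 4.]), np.array([0., 0., 10., 10.]))` picks the threshold 2.5 with a loss of 0. One caveat of the sum-of-squares form: it can lose precision through cancellation when outcomes are large relative to their spread.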

What still needs to be done:

The code needs to be checked for correctness against the old implementation, and unit tests have to be written.
To make the code even faster, it has to be profiled with numba disabled, because this shows which function calls make _find_optimal_split slow. Current profiling shows that _find_optimal_split is still the only major bottleneck.
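One way to check the new code against the old implementation is a randomized equivalence test. It runs both versions on many random datasets and asserts that they choose the same split with the same loss. The sketch assumes both versions can be imported side by side and return a `(threshold, loss)` pair. `old_split` and `new_split` are hypothetical toy stand-ins (one uses np.mean on every slice, the other uses cumulative sums), to be replaced by the old and new `_find_optimal_split`.

```python
import numpy as np


def old_split(x, y):
    """Hypothetical reference: recompute child means with np.mean on every slice."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    best = (np.nan, np.inf)
    for i in range(1, len(ys)):
        left, right = ys[:i], ys[i:]
        loss = ((left - np.mean(left)) ** 2).sum() + ((right - np.mean(right)) ** 2).sum()
        if loss < best[1]:
            best = (0.5 * (xs[i - 1] + xs[i]), loss)
    return best


def new_split(x, y):
    """Hypothetical fast version: all split losses from cumulative sums at once."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    n = len(ys)
    n_left = np.arange(1, n)
    n_right = n - n_left
    # Left-child sums for every split point; right-child sums are the remainder.
    csum, csq = np.cumsum(ys)[:-1], np.cumsum(ys ** 2)[:-1]
    rsum, rsq = ys.sum() - csum, (ys ** 2).sum() - csq
    loss = (csq - csum ** 2 / n_left) + (rsq - rsum ** 2 / n_right)
    i = int(np.argmin(loss))
    return 0.5 * (xs[i] + xs[i + 1]), loss[i]


def check_equivalence(old_fn, new_fn, n_trials=200, n_obs=30, seed=0):
    """Assert that two split finders agree on random continuous data."""
    rng = np.random.default_rng(seed)
    for _ in range(n_trials):
        x, y = rng.normal(size=n_obs), rng.normal(size=n_obs)
        t_old, l_old = old_fn(x, y)
        t_new, l_new = new_fn(x, y)
        np.testing.assert_allclose(l_new, l_old, rtol=1e-8, atol=1e-10)
        assert t_new == t_old
    return True
```

Continuous random data avoids ties in the feature. It is also worth adding a few hand-made cases with tied feature values and very small leaves, because that is where an incremental rewrite most easily diverges from the original.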
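A minimal profiling sketch follows. It assumes numba's `NUMBA_DISABLE_JIT` environment variable is set before numba is imported. With it set, jit-decorated functions run as plain Python, so their inner calls show up individually in cProfile output instead of as one opaque compiled call. `find_split_naive` and `_child_loss` are hypothetical stand-ins for `_find_optimal_split` and its helpers.

```python
import cProfile
import io
import os
import pstats

# Must be set before numba is imported anywhere in this process;
# with it, @njit/@jit-decorated functions run as ordinary Python.
os.environ["NUMBA_DISABLE_JIT"] = "1"

import numpy as np  # noqa: E402  (imported after the env var on purpose)


def _child_loss(y):
    """Hypothetical helper: within-child sum of squared deviations."""
    return ((y - np.mean(y)) ** 2).sum()


def find_split_naive(x, y):
    """Hypothetical slow split search, standing in for _find_optimal_split."""
    ys = y[np.argsort(x)]
    losses = [_child_loss(ys[:i]) + _child_loss(ys[i:]) for i in range(1, len(ys))]
    return int(np.argmin(losses)) + 1


def profile_call(func, *args, top=10):
    """Run func under cProfile and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()
    stream = io.StringIO()
    pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(top)
    return stream.getvalue()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x, y = rng.normal(size=2_000), rng.normal(size=2_000)
    print(profile_call(find_split_naive, x, y))
```

The same can be done from the command line with `NUMBA_DISABLE_JIT=1 python -m cProfile -s cumtime your_script.py`, where `your_script.py` is a placeholder for whatever script calls `_find_optimal_split`.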

timmens / causal-forest #6