Refactor to use dask.array.blockwise

I've started the refactoring to use dask.array.blockwise, building on what @rabernat did in #49.

So far all I've done is:

[x] Rewrite to at least call dask.array.blockwise, even if it doesn't yet work
[x] Write some tests for the new _bincount_kernel function (the one that will be called by dask.array.blockwise)
[x] Reorganise the existing histogram tests slightly

I have not done:

[ ] Write the actual underlying _bincount_kernel function needed to pass the new tests
[ ] Finish writing the other new tests
[ ] Checks and error handling
[ ] Make sure all the keyword args work (density, weights)
[ ] Dask handling of bins
[ ] Expand tests to check both numpy and dask paths
[ ] Any chunked numpy kernel acceleration
[ ] Any numba/cython/bottleneck kernel acceleration
[ ] Remove all now-redundant code

I'm not totally sure I understand the proposed algorithm correctly. In the numpy code path, bincounts.sum(axis) will only ever sum over length-1 axes, is that correct?
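To make the question concrete, here is a minimal numpy-only sketch of how I read the algorithm. It assumes the per-block kernel keeps every reduced axis as a length-1 axis and appends a trailing bin axis; `bincount_keepdims` is a hypothetical stand-in, not the actual `_bincount_kernel`.

```python
import numpy as np


def bincount_keepdims(data, bins, reduce_axes):
    """Hypothetical per-block kernel: histogram `data` over `reduce_axes`,
    keeping each reduced axis as length 1 and appending a bin axis."""
    reduce_axes = tuple(ax % data.ndim for ax in reduce_axes)
    keep_axes = tuple(ax for ax in range(data.ndim) if ax not in reduce_axes)
    keep_shape = tuple(data.shape[ax] for ax in keep_axes)
    nbins = len(bins) - 1

    # Move the kept axes to the front and flatten to (n_keep, n_reduce).
    n_keep = int(np.prod(keep_shape, dtype=int))
    flat = np.transpose(data, keep_axes + reduce_axes).reshape(n_keep, -1)

    # Bin index of every element; the last bin is closed on the right,
    # as in numpy.histogram.
    idx = np.searchsorted(bins, flat, side="right") - 1
    idx[flat == bins[-1]] = nbins - 1
    valid = (idx >= 0) & (idx < nbins)

    # Offset each row into its own block of bins so one bincount does it all.
    rows = np.arange(n_keep)[:, None]
    linear = (rows * nbins + idx)[valid]
    counts = np.bincount(linear, minlength=n_keep * nbins)

    # Reinsert the reduced axes as length-1 axes, bins last.
    out_shape = [1 if ax in reduce_axes else data.shape[ax]
                 for ax in range(data.ndim)] + [nbins]
    return counts.reshape(out_shape)


data = np.arange(24, dtype=float).reshape(2, 3, 4)
bins = np.linspace(0, 24, 5)  # edges 0, 6, 12, 18, 24

# "numpy path": one block, so the reduced axes have length 1 before the sum.
block = bincount_keepdims(data, bins, reduce_axes=(1, 2))
hist = block.sum(axis=(1, 2))

# "chunked path": two blocks along axis 2, concatenated as blockwise would,
# so the sum over axis 2 now combines per-chunk counts.
pieces = [bincount_keepdims(part, bins, reduce_axes=(1, 2))
          for part in np.split(data, 2, axis=2)]
stacked = np.concatenate(pieces, axis=2)
hist_chunked = stacked.sum(axis=(1, 2))
```

If this reading is right, the sum is a no-op reshape in the single-block case and only does real work when there is more than one chunk along a reduced axis.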

I'm also wondering how much of the _bincount_kernel code can or should be copied directly from numpy.histogramdd / dask.array.histogramdd (with attribution, of course). Can we just loop over the "unused_inds" of the array with the histogramdd algorithm to preserve the dimensions we don't wish to reduce over? Perhaps using a generalized ufunc? Or would that be inefficient?
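As a rough illustration of the looping idea, the sketch below calls `numpy.histogramdd` once per index of the kept dimensions. The helper name `histogramdd_over_kept_axes` and its signature are made up for this example; it is not code from numpy, dask, or xhistogram.

```python
import numpy as np


def histogramdd_over_kept_axes(arrays, bins, reduce_axes):
    """Hypothetical helper: N-d histogram of `arrays` over `reduce_axes`,
    looping np.histogramdd over every index of the remaining axes."""
    ndim = arrays[0].ndim
    reduce_axes = tuple(ax % ndim for ax in reduce_axes)
    keep_axes = tuple(ax for ax in range(ndim) if ax not in reduce_axes)
    keep_shape = tuple(arrays[0].shape[ax] for ax in keep_axes)
    bin_shape = tuple(len(b) - 1 for b in bins)

    # Bring the kept axes to the front so a kept index selects one sub-array.
    front = tuple(range(len(keep_axes)))
    moved = [np.moveaxis(a, keep_axes, front) for a in arrays]

    out = np.zeros(keep_shape + bin_shape, dtype=np.intp)
    for index in np.ndindex(*keep_shape):
        # One (n_samples, n_variables) sample matrix per kept index.
        sample = np.stack([m[index].ravel() for m in moved], axis=-1)
        counts, _ = np.histogramdd(sample, bins=bins)
        out[index] = counts
    return out


x = np.arange(24, dtype=float).reshape(2, 3, 4)
y = x % 2
x_edges = np.linspace(0, 24, 5)   # edges 0, 6, 12, 18, 24
y_edges = np.array([0.0, 1.0, 2.0])

# Keep axis 0, reduce over axes 1 and 2.
joint = histogramdd_over_kept_axes([x, y], [x_edges, y_edges], reduce_axes=(1, 2))
```

This costs one Python-level `histogramdd` call per kept index, so I'd expect it to get slow when the kept dimensions are large. Wrapping it with `np.vectorize(..., signature=...)` would tidy the interface, but that is still essentially a Python loop underneath.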

cc @dougiesquire @gjoseph92
