Dask and CUDA support for first and last reductions

first and last reductions are not implemented for dask or CUDA. This is because both work in parallel and we have no control over the order of running, hence up until now we have had no way of determining what is first and last at the point when we need to combine data from different dask partitions or different CUDA threads.

However, there is a way forward by reusing the virtual row index of the recently added where reduction (#1164). This allows each row of a DataFrame to know its row index, so if this information carries through the pipeline until the point at which we, for example, combine the results from separate dask partitions then first and last can be calculated correctly.

In pseudocode, consider that

ds.first("some_column")

is equivalent to

ds.where(ds.min(virtual_row_index), "some_column")

It isn't quite as simple as that as we need to use the where reduction the other way round from normal, i.e. using the virtual row index to determine what values of "some_column" to return when in fact where so far can only use the values of "some_column" to determine what virtual row index to return. But the existing machinery can be generalised for reuse here.

This will need #1177 implemented first for CUDA support.

holoviz / datashader

Dask and CUDA support for first and last reductions #1182