Consider datashading a dask dataframe containing columns of different dtypes that are not actually used in the datashade operation:
import dask.dataframe as dd
import datashader as ds
import numpy as np
import pandas as pd
df = pd.DataFrame(
    data=dict(
        x=[0, 1, 2],
        y=[0, 1, 2],
        dates=np.array(['2007-07-13', '2006-01-13', '2010-08-13'], dtype='datetime64'),
    )
)
ddf = dd.from_pandas(df, npartitions=2)
canvas = ds.Canvas(2, 2)
agg = canvas.points(ddf, 'x', 'y', ds.count())
Note that the dates column is not used in the canvas.points call. Running this gives the following error:
Traceback (most recent call last):
File "/Users/iant/github_temp/datashader_temp/dask_dtypes.py", line 16, in <module>
agg = canvas.points(ddf, 'x', 'y', ds.count())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/iant/github/datashader/datashader/core.py", line 220, in points
return bypixel(source, self, glyph, agg)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/iant/github/datashader/datashader/core.py", line 1257, in bypixel
return bypixel.pipeline(source, schema, canvas, glyph, agg, antialias=antialias)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/iant/github/datashader/datashader/utils.py", line 109, in __call__
return lk[typ](head, *rest, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/iant/github/datashader/datashader/data_libraries/dask.py", line 22, in dask_pipeline
dsk, name = glyph_dispatch(glyph, df, schema, canvas, summary, antialias=antialias, cuda=cuda)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/iant/github/datashader/datashader/utils.py", line 112, in __call__
return lk[cls](head, *rest, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/iant/github/datashader/datashader/data_libraries/dask.py", line 122, in default
dtype = np.result_type(*dtypes)
^^^^^^^^^^^^^^^^^^^^^^^
File "<__array_function__ internals>", line 200, in result_type
TypeError: The DType <class 'numpy.dtype[datetime64]'> could not be promoted by <class 'numpy.dtype[int64]'>. This means that no common DType exists for the given inputs. For example they cannot be stored in a single array unless the dtype is `object`. The full list of DTypes is: (<class 'numpy.dtype[int64]'>, <class 'numpy.dtype[int64]'>, <class 'numpy.dtype[datetime64]'>)
Internally, the code that handles dask dataframes tries to find a single dtype that is compatible with every column of the dataframe. This is unnecessary: only the x and y columns are used here, so the other columns can be ignored.
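As a rough illustration of the idea, the sketch below restricts the dtype promotion to the columns that are actually used. It uses only pandas and numpy. The helper name `common_dtype` and its `columns` parameter are hypothetical and are not part of datashader, whose real fix may look different.

```python
import numpy as np
import pandas as pd


def common_dtype(frame, columns=None):
    """Hypothetical helper (not datashader API): promote the dtypes of the
    selected columns only, or of every column when `columns` is None."""
    dtypes = frame.dtypes if columns is None else frame.dtypes[list(columns)]
    return np.result_type(*dtypes)


df = pd.DataFrame(
    dict(
        x=[0, 1, 2],
        y=[0, 1, 2],
        dates=np.array(["2007-07-13", "2006-01-13", "2010-08-13"], dtype="datetime64"),
    )
)

# Promoting over all columns mixes int64 with datetime64 and fails,
# which is the situation shown in the traceback above.
try:
    common_dtype(df)
except TypeError as err:
    print("all columns:", type(err).__name__)

# Promoting over only the columns the glyph needs succeeds.
print("x and y only:", common_dtype(df, ["x", "y"]))
```

Selecting the dtypes by column name before calling `np.result_type` leaves the promotion logic itself unchanged and only narrows its inputs, so unrelated columns such as dates never take part in it.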
First reported by @Hoxbro.