holoviz / datashader

Quickly and accurately render even the largest data.
http://datashader.org
BSD 3-Clause "New" or "Revised" License

Correctly handle RaggedArray conversions to numpy arrays #1185

Closed ianthomas23 closed 1 year ago

ianthomas23 commented 1 year ago

Fixes #1158.

This removes all warnings caused by numpy conversions of ragged arrays, which will become errors in numpy 1.24. There were no problems in the library code itself: if you follow the docstrings, you will create ragged arrays correctly. However, some of the tests used shortcuts instead of the recommended approach, and these have been changed in this PR.

Either of these is a correct way to create a DataFrame series that is a ragged array to use in datashader:

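import pandas as pd
import datashader.datatypes  # assumed to be needed so that pandas recognises the 'Ragged[...]' dtype string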
x = pd.array([[0, 1, 2], [4, 5, 6, 7, 8, 9]], dtype='Ragged[float32]')

from datashader.datatypes import RaggedArray
x = RaggedArray([[0, 1, 2], [4, 5, 6, 7, 8, 9]], dtype='float32')

The dtype is optional for RaggedArray as it is inferred.

The following worked in the past but are incorrect from numpy 1.24 onwards:

import numpy as np
x = np.asarray([[0, 1, 2], [4, 5, 6, 7, 8, 9]])

x = np.asarray([[0, 1, 2], [4, 5, 6, 7, 8, 9]], dtype=object)

The first approach fails immediately, with an error telling you to use the second, dtype=object, approach. The second approach works for some but not all codepaths in datashader, as it drops important dtype information. Hence avoid both.
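
The behaviour of the two numpy conversions can be checked with numpy alone. The following is a minimal sketch and not part of the PR: the helper name plain_conversion_outcome is hypothetical, and the handling of older numpy versions assumes they report the problem as a warning rather than an error.

```python
import warnings

import numpy as np

rows = [[0, 1, 2], [4, 5, 6, 7, 8, 9]]


def plain_conversion_outcome(data):
    """Return the name of the error or warning raised by np.asarray(data)."""
    with warnings.catch_warnings():
        # Older numpy versions only warn about ragged input; promote the
        # warning to an exception so that both cases are reported alike.
        warnings.simplefilter("error")
        try:
            np.asarray(data)
        except (ValueError, Warning) as exc:
            return type(exc).__name__
    return "ok"


outcome = plain_conversion_outcome(rows)
print("np.asarray(rows):", outcome)

# The dtype=object conversion succeeds, but the result is a 1D array holding
# two Python lists, so the element dtype (e.g. float32) is not recorded anywhere.
as_object = np.asarray(rows, dtype=object)
print(as_object.dtype, as_object.shape)  # dtype is object, shape is (2,)
print([type(row).__name__ for row in as_object])
```

The object array keeps the row lengths but not the numeric dtype, which is the information that the Ragged dtype carries explicitly.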

Eventually the RaggedArray pandas extension array within datashader will be replaced by awkward-array, which will simplify our code and make it more robust to future changes.

codecov[bot] commented 1 year ago

Codecov Report

Merging #1185 (399b36c) into main (aed1760) will increase coverage by 0.00%. The diff coverage is 85.71%.

@@           Coverage Diff           @@
##             main    #1185   +/-   ##
=======================================
  Coverage   85.39%   85.39%           
=======================================
  Files          35       35           
  Lines        8016     8023    +7     
=======================================
+ Hits         6845     6851    +6     
- Misses       1171     1172    +1     
Impacted Files             Coverage Δ
datashader/datatypes.py    93.75% <85.71%> (-0.15%)


ianthomas23 commented 1 year ago

After rebasing this is passing CI.