holoviz / datashader

Quickly and accurately render even the largest data.
http://datashader.org
BSD 3-Clause "New" or "Revised" License

x and y range attributes on returned aggregations #1198

Closed: ianthomas23 closed this 1 year ago

ianthomas23 commented 1 year ago

Closes #1157.

This PR adds new attributes x_range and y_range to aggregations returned from datashader. Simple example:

import datashader as ds
import pandas as pd

df = pd.DataFrame(dict(x=[1.1, 2.2, 3.3], y=[4.4, 5.5, 6.6]))
canvas = ds.Canvas(plot_height=5, plot_width=5)
agg = canvas.points(source=df, x="x", y="y")
print(agg)

Output produced:

<xarray.DataArray (y: 5, x: 5)>
array([[1, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 1, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 1]], dtype=uint32)
Coordinates:
  * x        (x) float64 1.32 1.76 2.2 2.64 3.08
  * y        (y) float64 4.62 5.06 5.5 5.94 6.38
Attributes:
    x_range:  (1.1, 3.3)
    y_range:  (4.4, 6.6)

The attributes can therefore be accessed as agg.x_range and agg.y_range.

The ranges are set regardless of whether they are specified by the user in the Canvas constructor, or determined from the data limits.

For calls that return an xarray.Dataset rather than an xarray.DataArray, e.g. when a ds.summary() reduction is used, the attributes are copied to the Dataset. Hence they are always available as attributes of the top-level object returned from Canvas aggregation functions.

codecov[bot] commented 1 year ago

Codecov Report

Merging #1198 (c6c4c3d) into main (3d2f7df) will increase coverage by 0.02%. The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main    #1198      +/-   ##
==========================================
+ Coverage   84.68%   84.70%   +0.02%     
==========================================
  Files          35       35              
  Lines        8345     8357      +12     
==========================================
+ Hits         7067     7079      +12     
  Misses       1278     1278              
Impacted Files                              Coverage Δ
datashader/data_libraries/pandas.py         100.00% <ø> (ø)
datashader/compiler.py                      92.90% <100.00%> (+0.09%) ↑
datashader/core.py                          88.38% <100.00%> (ø)
datashader/data_libraries/dask.py           95.16% <100.00%> (+0.16%) ↑
datashader/data_libraries/dask_xarray.py    98.95% <100.00%> (+0.03%) ↑


ianthomas23 commented 1 year ago

Test failures are because our CI conda environments are now using pandas 2.0.0, which is incompatible with recent xarray, so the version of xarray installed is 0.19.0 from July 2021. xarray will fix this in due course; in the meantime I will try pinning pandas < 2 to test this PR.

jbednar commented 1 year ago

Looks like the ranges are slightly larger than the coordinates would suggest. Is that just due to the size of each array cell? If so, having the ranges explicitly listed is indeed useful.

ianthomas23 commented 1 year ago

Yes, linear coordinates are equally spaced and the first and last coordinates sit half a cell width inside the ends of the range. The ranges are therefore easy to calculate from the coordinates. But with logarithmic axes the maths is non-trivial, so it is useful for the ranges to always be available.
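
For illustration, the half-cell relationship described above can be sketched in plain Python. The helper names linear_range and log_range are hypothetical and not part of datashader. The logarithmic variant rests on an assumption that cell centres are equally spaced in log10 space, which this thread does not confirm, so treat it as a sketch rather than a description of the library's behaviour.

```python
import math


def linear_range(coords):
    """Recover (low, high) from equally spaced cell-centre coordinates.

    Hypothetical helper: the outer edges lie half a cell width beyond
    the first and last centres.
    """
    if len(coords) < 2:
        raise ValueError("need at least two coordinates to infer the cell width")
    cell_width = (coords[-1] - coords[0]) / (len(coords) - 1)
    return coords[0] - cell_width / 2, coords[-1] + cell_width / 2


def log_range(coords):
    """Same idea for a log axis, ASSUMING centres are equally spaced in log10 space."""
    low, high = linear_range([math.log10(c) for c in coords])
    return 10 ** low, 10 ** high


# Coordinates copied from the example output earlier in this PR.
x_range = linear_range([1.32, 1.76, 2.2, 2.64, 3.08])
y_range = linear_range([4.62, 5.06, 5.5, 5.94, 6.38])
print(x_range, y_range)  # close to (1.1, 3.3) and (4.4, 6.6), up to floating-point rounding
```

Reading agg.x_range and agg.y_range directly avoids having to know which axis type was used, and avoids the small rounding differences that this reconstruction can introduce.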