holoviz / datashader

Quickly and accurately render even the largest data.
http://datashader.org
BSD 3-Clause "New" or "Revised" License

Image problem #517

Open jbednar opened 7 years ago

jbednar commented 7 years ago

I'm not sure what the underlying problem is, but some of the Image objects generated by the transfer functions don't work as valid inputs to the other transfer functions, even though they visualize fine.

import math

import numpy as np
import pandas as pd
import holoviews as hv
import fastparquet as fp

import datashader as ds
import datashader.transfer_functions as tf
from datashader.layout import random_layout, circular_layout

np.random.seed(0)
n=100

nodes = pd.DataFrame(["node"+str(i) for i in range(n)], columns=['name'])
randomloc = random_layout(nodes,None)
circular  = circular_layout(nodes,None, uniform=False)

c1 = ds.Canvas(plot_height=100, plot_width=100, x_range=(0.0,1.0), y_range=(0.0,1.0))
c2 = ds.Canvas(plot_height=100, plot_width=100)

def nodesplot(nodes, canvas, name):
    return tf.spread(tf.shade(canvas.points(nodes, 'x','y')), px=3, name=name)

plots = (nodesplot(randomloc,c1,"Random"), 
         nodesplot(circular, c1,"Circular"))

tf.Images(*plots)

image

tf.stack(*plots)

image

plots = (nodesplot(randomloc,c2,"Random"), 
         nodesplot(circular, c2,"Circular"))

tf.Images(*plots)

image

tf.stack(*plots)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-5-ba0a1874a866> in <module>()
----> 1 tf.stack(*plots)

~/datashader_git/datashader/transfer_functions.py in stack(*imgs, **kwargs)
    104     imgs = xr.align(*imgs, copy=False, join='outer')
    105     with np.errstate(divide='ignore', invalid='ignore'):
--> 106         out = tz.reduce(tz.flip(op), [i.data for i in imgs])
    107     return Image(out, coords=imgs[0].coords, dims=imgs[0].dims, name=name)
    108 

~/anaconda/envs/ds/lib/python3.6/site-packages/toolz/functoolz.py in __call__(self, *args, **kwargs)
    281     def __call__(self, *args, **kwargs):
    282         try:
--> 283             return self._partial(*args, **kwargs)
    284         except TypeError as exc:
    285             if self._should_curry(args, kwargs, exc):

~/anaconda/envs/ds/lib/python3.6/site-packages/toolz/functoolz.py in flip(func, a, b)
    653     [1, 2, 3]
    654     """
--> 655     return func(b, a)
    656 
    657 

TypeError: ufunc 'over' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

Here I'm calling stack on two different images, which works fine when I specify a range that leaves a small buffer around all the points when the images are created (first case above), but fails when stacking two images that used auto-ranging instead.

The message is a bit inscrutable, so I poked around. At first I suspected it had something to do with spread reaching the boundary of the array, and I thought I was seeing xarray arrays with a dtype of object and NaN values instead of the expected uint32 dtype and numeric values. But that's all hearsay: when I actually isolated the example above, everything looked like uint32 with no NaNs, and I saw the same results with and without spreading. So I'll just leave it as "the example above works in one case and not the other, and I have no idea why".
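
One possible explanation, not confirmed anywhere in this thread, is the `xr.align(*imgs, copy=False, join='outer')` call shown at line 104 of the traceback. When two arrays have different coordinates, an outer join reindexes both onto the union of the coordinates and fills the gaps with NaN, which promotes uint32 data to float64. A small xarray-only sketch of that behavior follows. The `make_image` helper and its coordinate values are made up for illustration and do not use datashader at all:

```python
import numpy as np
import xarray as xr


def make_image(x_coords):
    """Hypothetical stand-in for a shaded Image: a small uint32 DataArray."""
    data = np.ones((2, len(x_coords)), dtype=np.uint32)
    return xr.DataArray(
        data,
        coords={"y": [0.0, 1.0], "x": x_coords},
        dims=["y", "x"],
    )


# Two images on identical coordinates (like the fixed-range canvas c1).
same_a = make_image([0.0, 0.5, 1.0])
same_b = make_image([0.0, 0.5, 1.0])

# A third image whose x coordinates differ (like auto-ranging on c2).
diff_b = make_image([0.1, 0.6, 1.1])

# Same call pattern as in the traceback above.
aligned_same = xr.align(same_a, same_b, copy=False, join="outer")
aligned_diff = xr.align(same_a, diff_b, copy=False, join="outer")

same_dtypes = [str(a.dtype) for a in aligned_same]
diff_dtypes = [str(a.dtype) for a in aligned_diff]
has_nan = bool(np.isnan(aligned_diff[0].values).any())

print("identical coords:", same_dtypes)
print("differing coords:", diff_dtypes, "NaNs present:", has_nan)
```

If this is what happens inside `stack`, the inputs would look like uint32 when inspected beforehand, and only the aligned copies handed to `over` would be float64.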

jbednar commented 7 years ago

It might be due to differing ranges between the two images, because the problem goes away when I make this temporary change that will typically force the ranges to be the same integer values:

$ git diff datashader/pandas.py
diff --git a/datashader/pandas.py b/datashader/pandas.py
index 5eb9335..1d8fc3c 100644
--- a/datashader/pandas.py
+++ b/datashader/pandas.py
@@ -17,6 +17,8 @@ def pandas_pipeline(df, schema, canvas, glyph, summary):

     x_range = canvas.x_range or glyph._compute_x_bounds(df[glyph.x].values)
     y_range = canvas.y_range or glyph._compute_y_bounds(df[glyph.y].values)
+    x_range = round(x_range[0]), round(x_range[1])
+    y_range = round(y_range[0]), round(y_range[1])

     width = canvas.plot_width
     height = canvas.plot_height

jbednar commented 7 years ago

My eventual fix for my own situation was to find the range for each of the two plots myself and use a Canvas that included that full range. So maybe the solution is (a) a better error message somewhere (as the one here appears to be entirely misleading?), and (b) a utility to autorange a series of calls to be able to have a single Canvas to hold all of them consistently?
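
A minimal sketch of what the utility in (b) might look like, assuming each plot's data is a DataFrame (or any mapping) with numeric `x` and `y` columns. The `shared_ranges` name and its `pad` parameter are hypothetical and not part of datashader:

```python
import numpy as np


def shared_ranges(frames, x="x", y="y", pad=0.0):
    """Return (x_range, y_range) covering the x and y columns of every frame.

    `pad` is a fraction of each span added on both sides as a buffer.
    """
    xs = np.concatenate([np.asarray(f[x], dtype=float) for f in frames])
    ys = np.concatenate([np.asarray(f[y], dtype=float) for f in frames])

    def padded(values):
        # nanmin/nanmax ignore missing coordinates when computing the bounds.
        lo, hi = float(np.nanmin(values)), float(np.nanmax(values))
        margin = (hi - lo) * pad
        return (lo - margin, hi + margin)

    return padded(xs), padded(ys)


# Intended use with the example above (needs datashader, so left as a comment):
#   x_range, y_range = shared_ranges([randomloc, circular], pad=0.05)
#   canvas = ds.Canvas(plot_height=100, plot_width=100,
#                      x_range=x_range, y_range=y_range)
```

Every image rendered on that single canvas would then share the same coordinates, which matches the first case above where `tf.stack` works.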