holoviz / holoviews

With Holoviews, your data visualizes itself.
https://holoviews.org
BSD 3-Clause "New" or "Revised" License

HeatMap values don't match between backends #5260

Open jbednar opened 2 years ago

jbednar commented 2 years ago

HoloViews 1.14.8 (current release) shows different output depending on the backend:

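import numpy as np, holoviews as hv
hv.extension('bokeh', 'matplotlib')

# (x, y, z) triples with y as a letter; the diagonal i == j is left out
data = [(i, chr(97+j), i*j) for i in range(5) for j in range(5) if i!=j]
hm = hv.HeatMap(data).sort()

# Render the same element with each backend
hv.output(hm, backend='bokeh')
hv.output(hm, backend='matplotlib')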
[Screenshot: the sorted HeatMap rendered with Bokeh and with Matplotlib]

The Bokeh version looks suspicious to me. An off-by-one error in indexing the y dimension?

marcbernot commented 2 years ago

The inconsistencies disappear if you define hm = hv.HeatMap(data).sort(by='y'). Is calling sort() expected to yield the same result?
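A plausible reason why sort(by='y') hides the problem, sketched in plain Python. It assumes two things not confirmed here: that the categorical y-axis follows the order in which y values first appear in the data (as hm['y'] suggests further down), and that sort() with no arguments orders rows by the key dimensions x, then y. The helper first_appearance is hypothetical.

```python
# Hypothetical helper: unique values in order of first appearance
# (dicts preserve insertion order in Python 3.7+).
def first_appearance(values):
    return list(dict.fromkeys(values))

data = [(i, chr(97 + j), i * j) for i in range(5) for j in range(5) if i != j]

# Assumption: sort() with no arguments orders rows by (x, y).
by_xy = sorted(data, key=lambda row: (row[0], row[1]))
# sort(by='y') orders rows by y only (Python's sort is stable).
by_y = sorted(data, key=lambda row: row[1])

print(first_appearance(row[1] for row in by_xy))  # ['b', 'c', 'd', 'e', 'a']
print(first_appearance(row[1] for row in by_y))   # ['a', 'b', 'c', 'd', 'e']
```

Sorting by x leaves 'a' first appearing only at x == 1, so it ends up last; sorting by y makes the first-appearance order coincide with alphabetical order, which would mask any mismatch between the two orders.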

But if we don't sort hm, there are some inconsistencies with the Matplotlib backend.

The graphic and values are correct, but the letters on the y-axis are wrong. In addition, the shared_axes option is not taken into account when displaying layout in the following code (this may need to be tracked as a separate bug).

import numpy as np, holoviews as hv
import pandas as pd

hv.extension('bokeh', 'matplotlib')

data = [(i, chr(97+j), i*j) for i in range(5) for j in range(5) if i!=j]
hm = hv.HeatMap(data)  # unsorted this time

# Same data as a DataFrame indexed by (y, x), sorted by the index
df = pd.DataFrame(data, columns=['x', 'y', 'val']).set_index(['y', 'x']).sort_index()
hm2 = hv.HeatMap(df, ['x', 'y'])
# Show the values as a y-by-x table, with 'e' at the top as in a plot
df.unstack().sort_index(ascending=False)

The data in the DataFrame df, as displayed by the last line, is:

      val
x       0    1    2     3     4
y
e     0.0  4.0  8.0  12.0   NaN
d     0.0  3.0  6.0   NaN  12.0
c     0.0  2.0  NaN   6.0   8.0
b     0.0  NaN  2.0   3.0   4.0
a     NaN  0.0  0.0   0.0   0.0

The Bokeh plot with 'a' at the top, which looked suspicious, is consistent with the fact that hm['y'] is array(['b', 'c', 'd', 'e', 'a', 'c', 'd', 'e', 'a', 'b', 'd', 'e', 'a', 'b', 'c', 'e', 'a', 'b', 'c', 'd'], dtype=object): in order of first appearance, the y values are b, c, d, e, a.

layout = (hm+hm2).opts(shared_axes = False)
hv.output(layout,backend = 'bokeh')
[Image: layout rendered with Bokeh]
hv.output(layout,backend = 'matplotlib')
[Image: layout rendered with Matplotlib]
hv.output(hm2,backend = 'matplotlib')
[Image: hm2 rendered with Matplotlib]
marcbernot commented 2 years ago

When element.py calls get_data(self, element, ranges, style), the error is already present: the call produces data and yfactors that are out of sync, with yfactors being ['b', 'c', 'd', 'e', 'a']. I'll try to understand and fix this (maybe my first commit :))

marcbernot commented 2 years ago

OK, the problem is that the data is accessed through hm.gridded.data, which is:

{'x': array([0, 1, 2, 3, 4]),
 'y': array(['a', 'b', 'c', 'd', 'e'], dtype=object),
 'z': array([[nan,  0.,  0.,  0.,  0.],
        [ 0., nan,  2.,  3.,  4.],
        [ 0.,  2., nan,  6.,  8.],
        [ 0.,  3.,  6., nan, 12.],
        [ 0.,  4.,  8., 12., nan]])}
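For reference, a grid with this layout can be rebuilt from the (x, y, z) triples with NumPy. This is only a sketch of what a gridded representation looks like, not HoloViews' actual implementation; it assumes, as in the output above, that the labels along each axis are sorted and that missing combinations become NaN.

```python
import numpy as np

data = [(i, chr(97 + j), i * j) for i in range(5) for j in range(5) if i != j]
xs = np.array([row[0] for row in data])
ys = np.array([row[1] for row in data], dtype=object)
zs = np.array([row[2] for row in data], dtype=float)

# np.unique returns sorted labels plus, for each row, the index of its label.
x_labels, x_idx = np.unique(xs, return_inverse=True)
y_labels, y_idx = np.unique(ys, return_inverse=True)

# Rows of z follow y_labels, columns follow x_labels; absent pairs stay NaN.
z = np.full((len(y_labels), len(x_labels)), np.nan)
z[y_idx, x_idx] = zs

print(y_labels.tolist())  # ['a', 'b', 'c', 'd', 'e']
print(z[1].tolist())      # [0.0, nan, 2.0, 3.0, 4.0]
```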

and the y-axis is built from hm.dimension_values('y', expanded=False), which is array(['b', 'c', 'd', 'e', 'a'], dtype=object). Notice that this is not in the same order as the 'y' values in hm.gridded.data.
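To see why this shuffles the plot, here is a NumPy illustration (not the plotting code itself) that pairs the rows of the gridded z array, which follow ['a', 'b', 'c', 'd', 'e'], with the axis factors ['b', 'c', 'd', 'e', 'a']. It assumes the plot labels row i of z with factor i; under that assumption every row lands one label away from where it belongs, which would look like the off-by-one seen in the Bokeh plot.

```python
import numpy as np

grid_y = ['a', 'b', 'c', 'd', 'e']  # order of hm.gridded.data['y']
axis_y = ['b', 'c', 'd', 'e', 'a']  # order of hm.dimension_values('y', expanded=False)
z = np.array([[np.nan, 0, 0, 0, 0],
              [0, np.nan, 2, 3, 4],
              [0, 2, np.nan, 6, 8],
              [0, 3, 6, np.nan, 12],
              [0, 4, 8, 12, np.nan]])

# Assumption: the plot labels row i of z with axis_y[i].
naive = dict(zip(axis_y, z))
# Correct pairing: look each label up in the grid's own order.
correct = {label: z[grid_y.index(label)] for label in axis_y}

mislabelled = [label for label in axis_y
               if not np.array_equal(naive[label], correct[label], equal_nan=True)]
print(mislabelled)  # ['b', 'c', 'd', 'e', 'a']
```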

Here is a smaller, and more troubling, example of the discrepancy:

data_ok = [(1,'b',0),(0,'c',1),(1,'a',2)]
data_nok = [(0,'b',0),(0,'c',1),(1,'a',2)]

h_ok = hv.HeatMap(data_ok)
h_nok = hv.HeatMap(data_nok)
h_ok+h_nok
[Screenshot 2022-04-11 at 16:53:24]

Strangely, h_ok.gridded.data['y'] is ['b','c','a'] while h_nok.gridded.data['y'] is ['a','b','c']. In the second case the data ends up shuffled, because the order of the gridded labels is not taken into account when the values are indexed.
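Assuming, as above, that the axis follows the first-appearance order of y, a quick check in plain Python shows that both datasets produce the same axis labels, ['b', 'c', 'a']. The two cases would then differ only in the order of gridded.data['y'], and only the sorted one disagrees with the axis. The helper first_appearance is hypothetical.

```python
def first_appearance(values):
    # Hypothetical helper: unique values in order of first appearance.
    return list(dict.fromkeys(values))

data_ok = [(1, 'b', 0), (0, 'c', 1), (1, 'a', 2)]
data_nok = [(0, 'b', 0), (0, 'c', 1), (1, 'a', 2)]

# Orders reported above for h_ok.gridded.data['y'] and h_nok.gridded.data['y'].
gridded_y = {'ok': ['b', 'c', 'a'], 'nok': ['a', 'b', 'c']}

for name, data in [('ok', data_ok), ('nok', data_nok)]:
    axis_y = first_appearance(row[1] for row in data)
    print(name, axis_y, axis_y == gridded_y[name])
# ok ['b', 'c', 'a'] True
# nok ['b', 'c', 'a'] False
```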

I see two options to fix this:
1) Change hm.gridded so that it guarantees that hm.gridded.data['y'] and hm.dimension_values('y', expanded=False) are identical (this is the implicit assumption that was broken).
2) Change get_data so that it extracts the data correctly without making any implicit assumption about hm.gridded.data.
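Option 2 could look roughly like this sketch: instead of assuming that the rows of z are already in axis order, reorder them by looking up each axis factor in the grid's own labels. The function align_rows and its arguments are hypothetical, not part of HoloViews' get_data; option 1 would amount to applying the same permutation inside hm.gridded instead.

```python
import numpy as np

def align_rows(z, grid_labels, axis_labels):
    """Hypothetical helper: return z with rows reordered to follow axis_labels.

    z has one row per entry of grid_labels; axis_labels must contain the
    same labels, possibly in a different order (a KeyError flags a mismatch).
    """
    position = {label: i for i, label in enumerate(grid_labels)}
    order = [position[label] for label in axis_labels]
    return np.asarray(z)[order]

# hm.gridded.data['z'] from above: rows follow ['a', 'b', 'c', 'd', 'e'].
z = np.array([[np.nan, 0, 0, 0, 0],
              [0, np.nan, 2, 3, 4],
              [0, 2, np.nan, 6, 8],
              [0, 3, 6, np.nan, 12],
              [0, 4, 8, 12, np.nan]])
aligned = align_rows(z, ['a', 'b', 'c', 'd', 'e'], ['b', 'c', 'd', 'e', 'a'])
print(aligned[-1].tolist())  # [nan, 0.0, 0.0, 0.0, 0.0], the row of 'a'
```

Building an explicit label-to-position map keeps get_data independent of whichever order the gridded interface happens to produce.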

Any opinion on the way to go?