Rewording the errors on incorrect data format

kcpevey commented 6 years ago

When I feed polygons and points into an annotator to be included in the initial draw, I almost never format the data correctly. For example, polygons are something like a list of a list of tuples or a list of a list of lists. Anyway, its fine to require the data in a certain format, but the error message I get when I have incorrect formatting is not helpful for me. Its so deep inside holoviews I think it has something to do with the way I'm asking it to draw. It throws me off every time (and I've done this A LOT). I would humbly suggest some better feedback for improper formatting if its something that wouldn't be too challenging.

---------------------------------------------------------------------------
DataError                                 Traceback (most recent call last)
<ipython-input-26-865f1abbfbbc> in <module>()
      6 ann = PolyAndPointAnnotator(tile_url='https://server.arcgisonline.com/ArcGIS/rest/services/World_Imagery/MapServer/tile/{Z}/{Y}/{X}', 
      7                             extent=(125, 39.0, 127, 39.82),
----> 8                             crs=coord_sys, polys=[poly], points=[point])
      9 ann.tiles * ann.polys * ann.poly_view * ann.points

c:\projects\ers\github\earthsim_pyviz\newmaster\earthsim\earthsim\annotators.py in __init__(self, poly_data, **params)
    137 
    138     def __init__(self, poly_data={}, **params):
--> 139         super(PolyAnnotator, self).__init__(**params)
    140         self.poly_table_stream = CDSStream(data=poly_data, rename={'data': 'table_data'})
    141         self.poly_sel_stream = Selection1D(source=self.polys)

c:\projects\ers\github\earthsim_pyviz\newmaster\earthsim\earthsim\annotators.py in __init__(self, **params)
    206 
    207     def __init__(self, **params):
--> 208         super(PointAnnotator, self).__init__(**params)
    209         style = dict(editable=True)
    210         plot = dict(width=self.width, height=self.table_height)

c:\projects\ers\github\earthsim_pyviz\newmaster\earthsim\earthsim\annotators.py in __init__(self, polys, points, crs, **params)
     48         self.poly_stream = PolyDraw(source=self.polys, data={})
     49         self.vertex_stream = PolyEdit(source=self.polys)
---> 50         self.points = Points(points, self.polys.kdims, crs=crs)
     51         self.point_stream = PointDraw(source=self.points, data={})
     52 

C:\ProgramData\Anaconda3\envs\ers\lib\site-packages\geoviews-1.5.0a1.post8+g581151f.dirty-py3.5.egg\geoviews\element\geo.py in __init__(self, data, kdims, vdims, **kwargs)
     93         elif crs:
     94             kwargs['crs'] = crs
---> 95         super(_Element, self).__init__(data, kdims=kdims, vdims=vdims, **kwargs)
     96 
     97 

C:\ProgramData\Anaconda3\envs\ers\lib\site-packages\holoviews\core\data\__init__.py in __init__(self, data, kdims, vdims, **kwargs)
    195         validate_vdims = kwargs.pop('_validate_vdims', True)
    196         initialized = Interface.initialize(type(self), data, kdims, vdims,
--> 197                                            datatype=kwargs.get('datatype'))
    198         (data, self.interface, dims, extra_kws) = initialized
    199         super(Dataset, self).__init__(data, **dict(kwargs, **dict(dims, **extra_kws)))

C:\ProgramData\Anaconda3\envs\ers\lib\site-packages\holoviews\core\data\interface.py in initialize(cls, eltype, data, kdims, vdims, datatype)
    209                                   % (intfc.__name__, e))
    210                 error = ' '.join([error, priority_error])
--> 211             raise DataError(error)
    212 
    213         return data, interface, dims, extra_kws

DataError: None of the available storage backends were able to support the supplied data format.

philippjfr commented 6 years ago

I think this is definitely very important, we've already tried to improve these in certain cases but there's definitely a lot more we can do, particularly in regard to polygon data.

kcpevey commented 6 years ago

Also, somewhat related, I'll say that its fairly annoying to have to feed the poly/point data into the drawing in one format, and get back from the drawing that same data in another format (i.e. list of list of tuples vs ordered dict).

philippjfr commented 6 years ago

Also, somewhat related, I'll say that its fairly annoying to have to feed the poly/point data into the drawing in one format, and get back from the drawing that same data in another format (i.e. list of list of tuples vs ordered dict).

I do agree with this, however while we accept the data in the "list of lists of tuples format" that is not a valid internal representation of the data, so even before you use any stream your data will have been converted to a different format. Polygon data is currently stored in one of the following ways:

1) List of dictionaries for each path 2) List of arrays for each path 3) List of dataframes for each path 4) A geopandas dataframe

By default it will convert the lists of lists of tuples to 1, 2, or 3, something that is unlikely to change.

Separately though, the drawing tool streams' .data attribute currently returns data in the bokeh storage format which is different again and we should probably at least make that consistent.

jbednar commented 6 years ago

Could we avoid this confusion by modifying our examples to use one of 1, 2, or 3 when the data is first fed in?

kcpevey commented 6 years ago

@jbednar I think that would help. I'm not sure where I got the "list of lists of tuples" format, but I found that it worked and ran with it. It actually almost always requires me to reformat my data INTO that format so I'm happy to know I can stop doing that. Taking that input format option away and being forced into providing input in 1, 2, 3 or 4 might be more helpful in the long run rather than allowing for my ignorance on the front end :)

philippjfr commented 6 years ago

I'm not sure where I got the "list of lists of tuples" format, but I found that it worked and ran with it.

I think we use it in a few places since it's the simplest Python literal format.

It actually almost always requires me to reformat my data INTO that format so I'm happy to know I can stop doing that.

Would be good to know what format your data usually comes in and we could consider supporting that.

Taking that input format option away and being forced into providing input in 1, 2, 3 or 4

For backward compatibility reasons this is unlikely to happen.

jbednar commented 6 years ago

I don't think we need to prevent that format, just remove it from our examples if it's one that causes conceptual problems later, even if it seems convenient at the time.

kcpevey commented 6 years ago

Would be good to know what format your data usually comes in and we could consider supporting that.

My data comes in a variety of formats, generally np arrays and dataframes depending on the source. For example, I had been doing something like this with a simple two column csv file with x and y:

boundary_df = pd.read_csv('boundary.xy',delimiter='\t', header=None, names=['x','y'])
boundary_poly = [tuple(t) for t in zip(boundary_df.x.values, boundary_df.y.values)]
viz = PolyAndPointAnnotator(path_type=gv.Path, polys=[boundary_poly], 
                            crs=coord_sys,extent=(np.NaN,)*4, tile_url='http://tile.stamen.com/terrain/{Z}/{X}/{Y}.jpg')

Which was quite messy.

philippjfr commented 6 years ago

That is indeed quite messy, I would expect this to work:

boundary_df = pd.read_csv('boundary.xy',delimiter='\t', header=None, names=['x','y'])
viz = PolyAndPointAnnotator(path_type=gv.Path, polys=[boundary_df], 
                            crs=coord_sys,extent=(np.NaN,)*4, tile_url='http://tile.stamen.com/terrain/{Z}/{X}/{Y}.jpg')

kcpevey commented 6 years ago

::insert dramatic eye roll here:: That works beautifully. :)

philippjfr commented 6 years ago

Good to hear :-) On the inverse if trying to get the data back out from the stream, you should be able to fetch it in the same format using:

stream.element.split(datatype='dataframe')

kcpevey commented 6 years ago

That works great too! And it has the added benefit of consistent Latitude/Longitude labeling preventing this https://github.com/pyviz/EarthSim/issues/60

kcpevey commented 5 years ago

Is there any reason why the above suggestion:

boundary_df = pd.read_csv('boundary.xy',delimiter='\t', header=None, names=['x','y'])
viz = PolyAndPointAnnotator(path_type=gv.Path, polys=[boundary_df], 
                            crs=coord_sys,extent=(np.NaN,)*4, tile_url='http://tile.stamen.com/terrain/{Z}/{X}/{Y}.jpg')

Would not work with with points? e.g.:

viz = PolyAndPointAnnotator(path_type=gv.Path, polys=[boundary_df], points=[boundary_df],
                            crs=coord_sys,extent=(np.NaN,)*4, tile_url='http://tile.stamen.com/terrain/{Z}/{X}/{Y}.jpg')

My current example, which is essentially the same as above, gives this error on the point specification:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-12-0a27f44fdd5e> in <module>()
      5                             poly_columns=[],
      6                             polys=[centerline_df],
----> 7                             points=[reservoir_df])
      8 

c:\projects\ers\github\earthsim\master\earthsim\earthsim\annotators.py in __init__(self, poly_data, **params)
    191 
    192     def __init__(self, poly_data={}, **params):
--> 193         super(PolyAnnotator, self).__init__(**params)
    194         style = dict(editable=True)
    195         plot = dict(width=self.width, height=self.table_height)

c:\projects\ers\github\earthsim\master\earthsim\earthsim\annotators.py in __init__(self, **params)
    235                 self.points = self.points.add_dimension(col, 0, None, True)
    236         self.point_stream = PointDraw(source=self.points, data={})
--> 237         projected = gv.project(self.points, projection=ccrs.PlateCarree())
    238         self.point_table = Table(projected).opts(plot=plot, style=style)
    239         self.point_link = PointTableLink(source=self.points, target=self.point_table)

C:\ProgramData\Anaconda3\envs\earthsimAUG\lib\site-packages\param\parameterized.py in __new__(class_, *args, **params)
   2047         inst = class_.instance()
   2048         inst.param._set_name(class_.__name__)
-> 2049         return inst.__call__(*args,**params)
   2050 
   2051     def __call__(self,*args,**kw):

c:\projects\ers\github\holoviews\master\holoviews\holoviews\core\operation.py in __call__(self, element, **params)
    161                                 operation=self, kwargs=params)
    162         elif isinstance(element, ViewableElement):
--> 163             processed = self._apply(element)
    164         elif isinstance(element, DynamicMap):
    165             if any((not d.values) for d in element.kdims):

c:\projects\ers\github\holoviews\master\holoviews\holoviews\core\operation.py in _apply(self, element, key)
    119         for hook in self._preprocess_hooks:
    120             kwargs.update(hook(self, element))
--> 121         ret = self._process(element, key)
    122         for hook in self._postprocess_hooks:
    123             ret = hook(self, ret, **kwargs)

C:\ProgramData\Anaconda3\envs\earthsimAUG\lib\site-packages\geoviews\operation\projection.py in _process(self, element, key)
    411         for op in self._operations:
    412             element = element.map(op.instance(projection=self.p.projection),
--> 413                                   op.supported_types)
    414         return element

c:\projects\ers\github\holoviews\master\holoviews\holoviews\core\dimension.py in map(self, map_fn, specs, clone)
    727             return deep_mapped
    728         else:
--> 729             return map_fn(self) if applies else self
    730 
    731 

c:\projects\ers\github\holoviews\master\holoviews\holoviews\core\operation.py in __call__(self, element, **params)
    161                                 operation=self, kwargs=params)
    162         elif isinstance(element, ViewableElement):
--> 163             processed = self._apply(element)
    164         elif isinstance(element, DynamicMap):
    165             if any((not d.values) for d in element.kdims):

c:\projects\ers\github\holoviews\master\holoviews\holoviews\core\operation.py in _apply(self, element, key)
    119         for hook in self._preprocess_hooks:
    120             kwargs.update(hook(self, element))
--> 121         ret = self._process(element, key)
    122         for hook in self._postprocess_hooks:
    123             ret = hook(self, ret, **kwargs)

C:\ProgramData\Anaconda3\envs\earthsimAUG\lib\site-packages\geoviews\operation\projection.py in _process(self, element, key)
     32 
     33     def _process(self, element, key=None):
---> 34         return element.map(self._process_element, self.supported_types)
     35 
     36 

c:\projects\ers\github\holoviews\master\holoviews\holoviews\core\dimension.py in map(self, map_fn, specs, clone)
    727             return deep_mapped
    728         else:
--> 729             return map_fn(self) if applies else self
    730 
    731 

C:\ProgramData\Anaconda3\envs\earthsimAUG\lib\site-packages\geoviews\operation\projection.py in _process_element(self, element)
    191         xdim, ydim = element.dimensions()[:2]
    192         xs, ys = (element.dimension_values(i) for i in range(2))
--> 193         coordinates = self.p.projection.transform_points(element.crs, xs, ys)
    194         mask = np.isfinite(coordinates[:, 0])
    195         new_data = {k: v[mask] for k, v in element.columns().items()}

lib\cartopy\_crs.pyx in cartopy._crs.CRS.transform_points()

ValueError: could not convert string to float: 'x'

kcpevey commented 5 years ago

My most recent comment above is related to Issue #133 .

I continue to run into DataError: None of the available storage backends were able to support the supplied data format. occasionally, but perhaps less so now that annotators can accept all element types. I'm closing this. If it comes back up, we can always reopen.

holoviz-topics / EarthSim

Rewording the errors on incorrect data format #61