intake / intake-stac

Intake interface to STAC data catalogs
https://intake-stac.readthedocs.io/en/latest/
BSD 2-Clause "Simplified" License

ValueError: Can't clean for JSON for intake.catalog.local.LocalCatalogEntry #31

Open scottyhq opened 4 years ago

scottyhq commented 4 years ago

Running into an error when outputting an intake.catalog.local.LocalCatalogEntry in a Jupyter notebook. print(entry) works, but display(entry) raises ValueError: Can't clean for JSON.

pinging @jhamman and @martindurant for help sorting this one out. I think it's likely a simple fix.

import intake 
import intake_stac
print(intake.__version__) #0.5.3
print(intake_stac.__version__) #0.2.1

cat = intake.open_stac_catalog('https://storage.googleapis.com/pdd-stac/disasters/catalog.json')
list(cat)
entry = cat['Houston-East-20170831-103f-100d-0f4f-RGB']
type(entry) #intake.catalog.local.LocalCatalogEntry
print(entry)
"""
name: Houston-East-20170831-103f-100d-0f4f-RGB
container: catalog
plugin: ['stac_item']
description: 
direct_access: True
user_parameters: []
metadata: 
args: 
  stac_obj: Houston-East-20170831-103f-100d-0f4f-RGB
"""
display(entry)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/srv/conda/envs/notebook/lib/python3.7/site-packages/IPython/core/formatters.py in __call__(self, obj)
    916             method = get_real_method(obj, self.print_method)
    917             if method is not None:
--> 918                 method()
    919                 return True
    920 

/srv/conda/envs/notebook/lib/python3.7/site-packages/intake/catalog/entry.py in _ipython_display_(self)
    113         }, metadata={
    114             'application/json': {'root': contents["name"]}
--> 115         }, raw=True)
    116 
    117     def __getattr__(self, attr):

/srv/conda/envs/notebook/lib/python3.7/site-packages/IPython/core/display.py in display(include, exclude, metadata, transient, display_id, *objs, **kwargs)
    309     for obj in objs:
    310         if raw:
--> 311             publish_display_data(data=obj, metadata=metadata, **kwargs)
    312         else:
    313             format_dict, md_dict = format(obj, include=include, exclude=exclude)

/srv/conda/envs/notebook/lib/python3.7/site-packages/IPython/core/display.py in publish_display_data(data, metadata, source, transient, **kwargs)
    120         data=data,
    121         metadata=metadata,
--> 122         **kwargs
    123     )
    124 

/srv/conda/envs/notebook/lib/python3.7/site-packages/ipykernel/zmqshell.py in publish(self, data, metadata, source, transient, update)
    127         # hooks before potentially sending.
    128         msg = self.session.msg(
--> 129             msg_type, json_clean(content),
    130             parent=self.parent_header
    131         )

/srv/conda/envs/notebook/lib/python3.7/site-packages/ipykernel/jsonutil.py in json_clean(obj)
    189         out = {}
    190         for k,v in iteritems(obj):
--> 191             out[unicode_type(k)] = json_clean(v)
    192         return out
    193     if isinstance(obj, datetime):

/srv/conda/envs/notebook/lib/python3.7/site-packages/ipykernel/jsonutil.py in json_clean(obj)
    189         out = {}
    190         for k,v in iteritems(obj):
--> 191             out[unicode_type(k)] = json_clean(v)
    192         return out
    193     if isinstance(obj, datetime):

/srv/conda/envs/notebook/lib/python3.7/site-packages/ipykernel/jsonutil.py in json_clean(obj)
    189         out = {}
    190         for k,v in iteritems(obj):
--> 191             out[unicode_type(k)] = json_clean(v)
    192         return out
    193     if isinstance(obj, datetime):

/srv/conda/envs/notebook/lib/python3.7/site-packages/ipykernel/jsonutil.py in json_clean(obj)
    189         out = {}
    190         for k,v in iteritems(obj):
--> 191             out[unicode_type(k)] = json_clean(v)
    192         return out
    193     if isinstance(obj, datetime):

/srv/conda/envs/notebook/lib/python3.7/site-packages/ipykernel/jsonutil.py in json_clean(obj)
    195 
    196     # we don't understand it, it's probably an unserializable object
--> 197     raise ValueError("Can't clean for JSON: %r" % obj)

ValueError: Can't clean for JSON: Houston-East-20170831-103f-100d-0f4f-RGB
jhamman commented 4 years ago

We inherit the _ipython_display_ method from intake's CatalogEntry. My guess is that there are some bits of metadata on the STAC object that are not parsable by IPython's JSON parser. We should be able to override this behavior (or fix it upstream in sat-stac).

https://github.com/intake/intake/blob/a4d216d1378fc8eaedc6796c1516317316ec6a8e/intake/catalog/entry.py#L106-L115
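
One way such an override could look (a rough sketch only, not the actual intake or intake-stac code; it assumes the entry exposes describe() and that datetime values are the only offenders):

import datetime

from IPython.display import display
from intake.catalog.local import LocalCatalogEntry

def _json_safe(obj):
    # Recursively replace datetime/date values with ISO strings so the
    # resulting dict can be JSON-encoded by the notebook frontend.
    if isinstance(obj, (datetime.datetime, datetime.date)):
        return obj.isoformat()
    if isinstance(obj, dict):
        return {k: _json_safe(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [_json_safe(v) for v in obj]
    return obj

class StacEntry(LocalCatalogEntry):
    def _ipython_display_(self):
        # Sanitize the description before handing it to IPython's display
        # machinery, which JSON-encodes the 'application/json' payload.
        contents = _json_safe(self.describe())
        display({'application/json': contents,
                 'text/plain': repr(contents)},
                metadata={'application/json': {'root': contents.get('name', '')}},
                raw=True)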

scottyhq commented 4 years ago

So, the error is due to datetime objects in the metadata.

Adding the following to the code above produces a more informative error:

import json
json.dumps(entry.metadata)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-1-d97249792b93> in <module>
     13 import json
     14 #json.dumps(entry)
---> 15 json.dumps(entry.metadata)

/srv/conda/envs/notebook/lib/python3.7/json/__init__.py in dumps(obj, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, default, sort_keys, **kw)
    229         cls is None and indent is None and separators is None and
    230         default is None and not sort_keys and not kw):
--> 231         return _default_encoder.encode(obj)
    232     if cls is None:
    233         cls = JSONEncoder

/srv/conda/envs/notebook/lib/python3.7/json/encoder.py in encode(self, o)
    197         # exceptions aren't as detailed.  The list call should be roughly
    198         # equivalent to the PySequence_Fast that ''.join() would do.
--> 199         chunks = self.iterencode(o, _one_shot=True)
    200         if not isinstance(chunks, (list, tuple)):
    201             chunks = list(chunks)

/srv/conda/envs/notebook/lib/python3.7/json/encoder.py in iterencode(self, o, _one_shot)
    255                 self.key_separator, self.item_separator, self.sort_keys,
    256                 self.skipkeys, _one_shot)
--> 257         return _iterencode(o, 0)
    258 
    259 def _make_iterencode(markers, _default, _encoder, _indent, _floatstr,

/srv/conda/envs/notebook/lib/python3.7/json/encoder.py in default(self, o)
    177 
    178         """
--> 179         raise TypeError(f'Object of type {o.__class__.__name__} '
    180                         f'is not JSON serializable')
    181 

TypeError: Object of type datetime is not JSON serializable

The metadata looks like this:

{'datetime': datetime.datetime(2017, 8, 31, 17, 24, 57, 555491, tzinfo=tzlocal()),
 'provider': 'Planet',
 'license': 'CC-BY-SA',
 'eo:cloud_cover': 2,
 'eo:gsd': 3.7,
 'eo:sun_azimuth': 145.5,
 'eo:sun_elevation': 64.9,
 'eo:view_angle': 0.2,
 'pl:epsg_code': 32615,
 'pl:ground_control': True,
 'pl:instrument': 'PS2',
 'pl:provider': 'planetscope',
 'bbox': [-95.73737276800716,
  29.561332400220497,
  -95.05332428370095,
  30.157560439570304],
 'geometry': {'type': 'Polygon',
  'coordinates': [[[-95.73737276800716, 30.14525788823348],
    [-95.06532619920118, 30.157560439570304],
    [-95.05332428370095, 29.57334931237589],
    [-95.7214758280382, 29.561332400220497],
    [-95.73737276800716, 30.14525788823348]]]},
 'date': datetime.date(2017, 8, 31),
 'catalog_dir': ''}

The following can fix the issue, but I'm confused as to where this should go in the codebase:

import datetime

def convert_datetime(o):
    # Stringify datetime/date values so they are JSON serializable;
    # leave everything else unchanged.
    if isinstance(o, (datetime.datetime, datetime.date)):
        return str(o)
    return o

md = entry.metadata
clean = {k: convert_datetime(v) for k, v in md.items()}

Maybe @ian-r-rose has a suggestion based on this intake pull request https://github.com/intake/intake/pull/327

martindurant commented 4 years ago

At what point is JSON encoding required? I'm pretty sure that YAML has no problem with this.
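
For illustration (a small sketch, not from intake itself): PyYAML serialises these values out of the box, because timestamps are a standard YAML type.

import datetime
import yaml

metadata = {
    'datetime': datetime.datetime(2017, 8, 31, 17, 24, 57),
    'provider': 'Planet',
}

# YAML has a native timestamp type, so no custom encoder is needed.
print(yaml.safe_dump(metadata))
# datetime: 2017-08-31 17:24:57
# provider: Planet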

ian-r-rose commented 4 years ago

Interesting, I hadn't considered that there would be datetime objects in the metadata.

@martindurant when I added the custom __repr__ I used JSON for the mimetype, knowing that it is widely supported by a variety of frontends (JupyterLab, nteract, etc.).

Is there some fuller accounting of the non-JSON-able types that might pop up in the metadata? If not, the basic workaround that @scottyhq points to seems reasonable to me, if a bit fragile. We could also pass str() as the default serializer for the JSON serialization, to try to cover all possible objects that might be in the metadata.
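
A minimal sketch of that idea, assuming the display path ultimately goes through json.dumps (entry here is the one from the original report):

import json

# default=str is only called for objects json can't encode natively,
# so datetime and date values fall back to their string representation.
serialized = json.dumps(entry.metadata, default=str)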

martindurant commented 4 years ago

msgpack and yaml are the serialisers of reference in intake; but no, there is no list of expected metadata contents, and some drivers might choose to store more complex things there if the metadata is never meant to be serialised at all.

scottyhq commented 4 years ago

Thanks for the input. As far as I can tell, we are using sat-stac to read JSON metadata whose values start out as strings ("datetime": "2019-10-31T19:02:13.439292+00:00") but get converted to datetime objects - @matthewhanson can confirm: https://github.com/pangeo-data/intake-stac/blob/d2c3f01b2e9931da7b1d87aaa67c7e0c107c5fc7/intake_stac/catalog.py#L207

So another short-term solution is simply not to convert the strings for the intake metadata.
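
A rough sketch of that short-term fix (a hypothetical helper, not the actual intake-stac code), converting datetime values back to ISO strings before they land in the entry metadata:

import datetime

def stringify_datetimes(properties):
    # Hypothetical helper: turn datetime/date values back into ISO strings
    # so the entry metadata stays JSON serializable.
    return {
        k: v.isoformat() if isinstance(v, (datetime.datetime, datetime.date)) else v
        for k, v in properties.items()
    }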