pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

json round trip exception #3867

Closed hayd closed 11 years ago

hayd commented 11 years ago

This CSV (from the baseball database) reads fine into a DataFrame and converts fine to JSON, but reading that JSON back raises a TypeError:

In [6]: df = pd.read_csv('https://raw.github.com/hayd/lahman2012/master/csvs/Teams.csv')

In [7]: s = df.to_json()

In [8]: pd.read_json(s)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-8-ebde42cd0695> in <module>()
----> 1 pd.read_json(s)

/Users/234BroadWalk/pandas/pandas/io/json.pyc in read_json(path_or_buf, orient, typ, dtype, numpy, parse_dates, keep_default_dates)
    158     obj = None
    159     if typ == 'frame':
--> 160         obj = FrameParser(json, orient, dtype, numpy, parse_dates, keep_default_dates).parse()
    161
    162     if typ == 'series' or obj is None:

/Users/234BroadWalk/pandas/pandas/io/json.pyc in parse(self)
    185
    186     def parse(self):
--> 187         self._parse()
    188         if self.obj is not None:
    189             self._convert_axes()

/Users/234BroadWalk/pandas/pandas/io/json.pyc in _parse(self)
    284             try:
    285                 if orient == "columns":
--> 286                     args = loads(json, dtype=dtype, numpy=True, labelled=True)
    287                     if args:
    288                         args = (args[0].T, args[2], args[1])

TypeError: long() argument must be a string or a number, not 'NoneType'

cc #3804

jreback commented 11 years ago

was a bug, but ran into another feature/bug

here's my new test:

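import pandas as pd
# assert_frame_equal lived in pandas.util.testing at the time of this thread
from pandas.testing import assert_frame_equal

df = pd.read_csv('https://raw.github.com/hayd/lahman2012/master/csvs/Teams.csv')
s = df.to_json()
result = pd.read_json(s)
result.index = result.index.astype(int)
result = result.reindex(columns=df.columns, index=df.index)
assert_frame_equal(result, df)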

so, I am not sure json guarantees order? and should I try to do automatic index conversion on other types (I am doing it on datetimes now)?
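
A scaled-down version of this test can be sketched without the download. The three-row frame below is a made-up stand-in for the Teams.csv data (column names and values are illustrative only), and the use of `StringIO` and `pandas.testing` assumes a current pandas install rather than the 2013 code base.

```python
from io import StringIO

import pandas as pd
from pandas.testing import assert_frame_equal

# Illustrative stand-in for the much larger Teams.csv frame.
df = pd.DataFrame({"yearID": [1871, 1872, 1873], "W": [20, 19, 10], "L": [10, 9, 19]})

s = df.to_json()
result = pd.read_json(StringIO(s))

# JSON object keys are strings, so cast the row labels back to integers
# (harmless if read_json has already converted them).
result.index = result.index.astype(int)

# Align on the original labels instead of relying on key order.
result = result.reindex(columns=df.columns, index=df.index)

assert_frame_equal(result, df)
```

Reindexing on the original labels is what makes the comparison independent of whatever order the JSON keys come back in.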

hayd commented 11 years ago

Guess it's not so surprising, python dictionaries don't... (I don't think?). Quite a big file to test against!

Not sure, what were you thinking?

jreback commented 11 years ago

I think @cpcloud had sort of the same problem in html, and he added an infer_types kw. I am doing that for dates now. It's not hard to do a soft conversion, i.e. one that doesn't force the type.

cpcloud commented 11 years ago

do all valid json objects have a total ordering in python? if they do why not guarantee ordering, unless of course that goes against json spec...

python dicts don't because there are hashable objects that don't define an ordering, e.g. complex numbers and custom objects, among other reasons
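
On the spec question: the JSON specification describes an object as an unordered collection of name/value pairs, so order is not something a producer can promise to every consumer. A Python parser can still keep the order in which keys appear in the text. A minimal sketch using only the standard library (this is not what pandas does internally):

```python
import json
from collections import OrderedDict

text = '{"b": 2, "a": 1}'

# object_pairs_hook receives the key/value pairs in document order,
# so an OrderedDict keeps that order regardless of Python version.
parsed = json.loads(text, object_pairs_hook=OrderedDict)

print(list(parsed))
```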

hayd commented 11 years ago

Hmmm, different bug?

In [5]: pd.read_json('[{"a": 1, "b": 2}, {"b":2, "a" :1}]')
Out[5]:
   0  1
a  1  2
b  2  1
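
For comparison, the result one would expect from a list of records is a frame with columns a and b and two identical rows. A minimal sketch of that expectation follows; it goes through the standard `json` module and the `DataFrame` constructor instead of `read_json`, so it says nothing about how `read_json` itself was fixed.

```python
import json

import pandas as pd

text = '[{"a": 1, "b": 2}, {"b":2, "a" :1}]'

# Each JSON object becomes one row; values are matched to columns by key,
# so the order of keys inside each object does not matter.
records = json.loads(text)
frame = pd.DataFrame(records)

print(frame)
```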

jreback commented 11 years ago

which one is more useful to round-trip exactly?

import numpy as np
from pandas import DataFrame

# string labels
biggie = DataFrame(np.zeros((200, 4)),
                   columns=[str(i) for i in range(4)],
                   index=[str(i) for i in range(200)])
# integer labels
biggie2 = DataFrame(np.zeros((200, 4)),
                    columns=range(4),
                    index=range(200))

jreback commented 11 years ago

@cpcloud any thoughts?

cpcloud commented 11 years ago

The round trip doesn't look like it can be invertible: both frames serialize to the same JSON because of JSON's rule that object keys must be strings.
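
This is easy to check on a smaller pair of frames. The sketch below uses 3x2 frames in place of the 200x4 ones above; the variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Same data, once with string labels and once with integer labels.
str_labels = pd.DataFrame(
    np.zeros((3, 2)),
    columns=[str(i) for i in range(2)],
    index=[str(i) for i in range(3)],
)
int_labels = pd.DataFrame(np.zeros((3, 2)), columns=range(2), index=range(3))

# Integer labels are written out as string keys, so the two outputs coincide.
same_json = str_labels.to_json() == int_labels.to_json()
print(same_json)
```

Since the serialized text is identical, the reader has no way to tell which of the two frames it came from without an extra hint.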

jreback commented 11 years ago

I am going to set up some options so that the second will round-trip with convert_axes=True, while the first will work if you pass convert_axes=False.
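
A rough sketch of how the two settings differ, written against the `convert_axes` keyword of `read_json` as it exists in released pandas (the frame is a small stand-in, and `StringIO` is used because recent versions prefer a file-like object over a literal string):

```python
from io import StringIO

import numpy as np
import pandas as pd

frame = pd.DataFrame(np.zeros((3, 2)), columns=range(2), index=range(3))
s = frame.to_json()

# convert_axes=True tries to coerce the string keys back to numeric labels.
converted = pd.read_json(StringIO(s), convert_axes=True)

# convert_axes=False leaves the labels exactly as they appear in the JSON.
kept = pd.read_json(StringIO(s), convert_axes=False)

print(list(converted.columns), list(kept.columns))
```

The caller has to know which kind of labels the original frame had, which is the price of JSON keys always being strings.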

cpcloud commented 11 years ago

this might present a problem for nested json, no? that's a different beast though so for "frame/series-able" json that's probably ok

jreback commented 11 years ago

conversion is done at the end, so it should work

jreback commented 11 years ago

fixed by #3876

jreback commented 11 years ago

closing this as incorporated in #3876