pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

OpenDAP loaded Dataset has lon/lats with type 'object'. #39

Closed akleeman closed 10 years ago

akleeman commented 10 years ago
ds = xray.open_dataset('http://motherlode.ucar.edu/thredds/dodsC/grib/NCEP/GFS/Global_0p5deg/files/GFS_Global_0p5deg_20140303_0000.grib2', decode_cf=False)
In [4]: ds['lat'].dtype
Out[4]: dtype('O')

This makes serialization fail.

shoyer commented 10 years ago

This is because coordinates are loaded as pandas.Index objects... which don't always faithfully preserve the type of the underlying object (see https://github.com/pydata/pandas/issues/6471).

I believe serialization should still work, though, thanks to a workaround I added for dtype=object. Do let me know if this is not the case. One solution to make this less awkward would be to wrap pandas.Index in something that keeps track of the dtype of the original arguments for use in mathematical expressions.
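
The wrapper idea might look roughly like the sketch below. The class name `DtypePreservingIndex` and its attributes are hypothetical and are not part of xray or pandas; the sketch only records the dtype of the original array and casts back to it on request.

```python
import numpy as np
import pandas as pd


class DtypePreservingIndex:
    """Hypothetical wrapper: a pandas.Index plus the dtype of the original data."""

    def __init__(self, data):
        data = np.asarray(data)
        self.dtype = data.dtype      # remember the dtype before pandas sees it
        self.index = pd.Index(data)  # pandas may choose a different dtype here

    @property
    def values(self):
        # Cast back so arithmetic and serialization see the original dtype.
        return np.asarray(self.index).astype(self.dtype)


lat = DtypePreservingIndex(np.array([10.0, 10.5, 11.0], dtype=np.float32))
print(lat.values.dtype)
```

Label-based lookups would still go through `lat.index`; only code that needs the raw array would read `lat.values`.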

ebrevdo commented 10 years ago

Indices also have an .inferred_type getter. Unfortunately, it doesn't seem to return true type names...

In [13]: pandas.Index([1,2,3]).inferred_type
Out[13]: 'integer'

In [14]: pandas.Index([1,2,3.5]).inferred_type
Out[14]: 'mixed-integer-float'

In [15]: pandas.Index(["ab","cd"]).inferred_type
Out[15]: 'string'

In [16]: pandas.Index(["ab","cd",3]).inferred_type
Out[16]: 'mixed-integer'
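
Since .inferred_type returns labels rather than dtypes, one way to use it is a small lookup table. This is only a sketch: the mapping `INFERRED_TO_DTYPE` and the helper `guess_dtype` are made up for illustration, and the choice of int64/float64 is an assumption about what the original data should have been.

```python
import numpy as np
import pandas as pd

# Assumed mapping from pandas' inferred_type labels to NumPy dtypes.
INFERRED_TO_DTYPE = {
    "integer": np.int64,
    "floating": np.float64,
    "mixed-integer-float": np.float64,
}


def guess_dtype(index):
    """Return a primitive dtype for an index, or None if there is no safe guess."""
    if index.dtype != object:
        return index.dtype  # pandas already kept a concrete dtype
    target = INFERRED_TO_DTYPE.get(index.inferred_type)
    return None if target is None else np.dtype(target)


float_index = pd.Index([1.0, 2.5], dtype=object)
mixed_index = pd.Index(["ab", "cd", 3], dtype=object)
print(guess_dtype(float_index), guess_dtype(mixed_index))
```

Labels that are not in the table, such as the mixed string/integer case, fall through to None instead of being forced into a numeric dtype.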

akleeman commented 10 years ago

@shoyer You're right, I can serialize the latitude object directly from that OPeNDAP URL, but after some manipulation I run into this:

ipdb> print fcst
dimensions:
    latitude = 31
    longitude = 46
    time = 7
variables:
    object latitude(latitude)
        units:degrees_north
        _CoordinateAxisType:Lat
    object longitude(longitude)
        units:degrees_east
        _CoordinateAxisType:Lon
    datet... time(time)
        standard_name:time
        _CoordinateAxisType:Time
        units:hours since 2014-03-03 00:0...
ipdb> fcst.dump('./test.nc')
*** TypeError: illegal primitive data type, must be one of ['i8', 'f4', 'u8', 'i1', 'U1', 'S1', 'i2', 'u1', 'i4', 'u2', 'f8', 'u4'], got object

Currently tracking down exactly what's going on here.
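
In the meantime, a possible stopgap is to cast object-dtype coordinate values back to a primitive dtype before writing. The helper below is a hypothetical sketch, not the fix that was eventually merged; it assumes the coordinate is a 1-D array of plain Python or NumPy numbers.

```python
import numpy as np


def coerce_object_array(values):
    """Return values with a numeric dtype, re-inferring it if the dtype is object."""
    arr = np.asarray(values)
    if arr.dtype != object:
        return arr
    # Let NumPy re-infer the dtype from the underlying Python scalars.
    inferred = np.array(arr.tolist())
    if inferred.dtype.kind not in "iuf":
        raise TypeError("cannot coerce object array to a numeric dtype")
    return inferred


latitude = np.array([35.0, 35.5, 36.0], dtype=object)
print(coerce_object_array(latitude).dtype)
```

Arrays that mix strings and numbers raise a TypeError here instead of being silently converted to strings.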

shoyer commented 10 years ago

I believe this was fixed by #54.