HDF5ExtError when creating python pandas HDF file

tastyminerals commented 8 years ago

Error

Started to get this error recently.

tables.exceptions.HDF5ExtError: HDF5 error back trace

  File "H5A.c", line 634, in H5Awrite
    not an attribute

End of HDF5 error back trace

Can't set attribute 'levels' in node:
 /tensor (Group) ''.
Closing remaining open files:tensor.h5...done

Code Sample, a copy-pastable example if possible

mydata is a giant tuple of records: (('word1', 'link', 'word2', int), (...), ...)

pdf = pd.DataFrame.from_records(mydata)
pdf.columns = ['word0', 'link', 'word1', 'counts']
h5file = pd.HDFStore(h5fname, 'a', complevel=9, complib='blosc')
h5file.append("tensor", pdf,
              data_columns=['word0', 'link', 'word1'],
              nan_rep='_!NaN_',
              min_itemsize={'word0': 55, 'link': 15, 'word1': 55})
h5file.close()

Expected Output

tensor.h5 file

output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Linux
OS-release: 4.3.6-1-MANJARO
machine: x86_64
processor: 
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.0
nose: 1.3.7
pip: 8.1.1
setuptools: 20.2.2
Cython: 0.24
numpy: 1.11.0
scipy: 0.17.0
statsmodels: None
xarray: None
IPython: 4.2.0
sphinx: 1.3.6
patsy: None
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.5.2
matplotlib: 1.5.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.4.1
html5lib: 0.9999999
httplib2: 0.9.2
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None

Corresponding stackoverflow issue.

http://stackoverflow.com/questions/37142173/hdf5exterror-when-creating-python-pandas-hdf-file

jreback commented 8 years ago

You have a corrupted file. Please post a copy-pastable example that fails.

seanenck commented 8 years ago

I'm having this same issue (after upgrading to the latest version of pandas), even though I've cleared out my existing HDF files and allowed them all to be recreated.

tastyminerals commented 8 years ago

I read data from a gzipped XML file created specially for my script. The script parses the file, finds words, and builds a tensor (a table of w1, link, w2 counts). The part that creates the tensor is posted above. When h5file.append is executed, pandas raises this error. It was working around 2 months ago, and I have changed neither the script nor the file. Can you please tell me what this error means, so I can at least act accordingly?

Unfortunately I cannot post the complete code since it is a part of the research paper.

TomAugspurger commented 8 years ago

@tastyminerals it doesn't have to be your exact code, just an example that gives the same error.

seanenck commented 8 years ago

pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Linux
OS-release: 4.5.2-1-ARCH
machine: x86_64
processor: 
byteorder: little
LC_ALL: None
LANG: C

pandas: 0.18.1
nose: None
pip: 8.1.1
setuptools: 21.0.0
Cython: None
numpy: 1.11.0
scipy: None
statsmodels: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: None
tables: 3.2.2
numexpr: 2.5.2
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None

Code:

#!/usr/bin/python

import pandas as pd

def save_to(uuid, name, data):
    path = uuid
    key_name = name
    data.to_hdf(path,
                key=key_name)

pd.show_versions()
d = {'col1': 1, 'col2': 2, 'test': 'test'}
df = pd.DataFrame(data=d, index=['col1'])
save_to('abc', 'test', df)

Result:

Traceback (most recent call last):
  File "example.py", line 15, in <module>
    save_to('abc', 'test', df)
  File "example.py", line 10, in save_to
    key=key_name)
  File "/usr/lib/python3.5/site-packages/pandas/core/generic.py", line 1101, in to_hdf
    return pytables.to_hdf(path_or_buf, key, self, **kwargs)
  File "/usr/lib/python3.5/site-packages/pandas/io/pytables.py", line 260, in to_hdf
    f(store)
  File "/usr/lib/python3.5/site-packages/pandas/io/pytables.py", line 255, in <lambda>
    f = lambda store: store.put(key, value, **kwargs)
  File "/usr/lib/python3.5/site-packages/pandas/io/pytables.py", line 826, in put
    self._write_to_group(key, value, append=append, **kwargs)
  File "/usr/lib/python3.5/site-packages/pandas/io/pytables.py", line 1264, in _write_to_group
    s.write(obj=value, append=append, complib=complib, **kwargs)
  File "/usr/lib/python3.5/site-packages/pandas/io/pytables.py", line 2799, in write
    self.attrs.ndim = data.ndim
  File "/usr/lib/python3.5/site-packages/tables/attributeset.py", line 461, in __setattr__
    self._g__setattr(name, value)
  File "/usr/lib/python3.5/site-packages/tables/attributeset.py", line 403, in _g__setattr
    self._g_setattr(self._v_node, name, stvalue)
  File "tables/hdf5extension.pyx", line 696, in tables.hdf5extension.AttributeSet._g_setattr (tables/hdf5extension.c:7549)
tables.exceptions.HDF5ExtError: HDF5 error back trace

  File "H5A.c", line 634, in H5Awrite
    not an attribute

End of HDF5 error back trace

Can't set attribute 'ndim' in node:
 /test (Group) ''.

tastyminerals commented 8 years ago

Copy-pastable example

import pandas as pd

govdep_tup = [('addition-n', 'A1', 'say-v', 1), ('father-n', 'A0', 'settle-v', 1), ('couple-n', 'AM-TMP', 'stroll-v', 1), ('property-n', 'A2', 'include-v', 5), ('way-n', 'A1', 'consider-v', 1), ('people-n', 'A1', 'bury-v', 2), ('assume-v', 'A1', 'warn-v', 1), ('matter-n', 'A1', 'suppose-v', 1), ('picture-n', 'AM-LOC', 'stare-v', 1), ('be-v', 'A1', 'miss-v', 1)]

pdf = pd.DataFrame.from_records(govdep_tup)
pdf.columns = ['word0', 'link', 'word1', 'counts']
h5file = pd.HDFStore('tensor.h5', 'a', complevel=9, complib='blosc')
h5file.append("tensor", pdf,
          data_columns=['word0', 'link', 'word1'],
          nan_rep='_!NaN_',
          min_itemsize={'word0': 55, 'link': 15, 'word1': 55})
h5file.close()

TomAugspurger commented 8 years ago

Thanks for the examples. Neither of those raises an error for me. What version of HDF5 are you using? There was a recent release, but I don't think I've upgraded. I'm running 1.8.16.
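For anyone else answering the version question, here is a minimal sketch of one way to check it from Python. It assumes PyTables exposes the linked HDF5 version as `tables.hdf5_version`; the helper names (`parse_hdf5_version`, `is_hdf5_1_10_or_newer`, `report_versions`) are made up for illustration and are not part of pandas or PyTables.

```python
import re


def parse_hdf5_version(version):
    """Turn a string such as '1.8.16' or '1.10.0-1' into a tuple of ints.

    Any packaging suffix (for example the '-1' in '1.10.0-1') is ignored.
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognised HDF5 version string: %r" % (version,))
    return tuple(int(part) for part in match.groups())


def is_hdf5_1_10_or_newer(version):
    """Return True for the 1.10 series and later, False for 1.8.x."""
    return parse_hdf5_version(version) >= (1, 10, 0)


def report_versions():
    """Return the PyTables and HDF5 versions, or None if PyTables is missing.

    Assumption: `tables.hdf5_version` holds the HDF5 library version string.
    """
    try:
        import tables
    except ImportError:
        return None
    return {"pytables": tables.__version__, "hdf5": tables.hdf5_version}


if __name__ == "__main__":
    print(report_versions())
```

Comparing tuples instead of raw strings matters here, because a plain string comparison would rank '1.10.0' below '1.8.16'.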

seanenck commented 8 years ago

I'm running 1.10.0-1 on Arch.

I definitely just got that update about the same time as the pandas update (within the last day or 2)

tastyminerals commented 8 years ago

That explains the error. I am running Manjaro, which uses the Arch repos. Mine is extra/hdf5 1.10.0-1 [installed]. Something is up with the HDF5 version.

TomAugspurger commented 8 years ago

Well, that's too bad. Looks like we'll have some work to do to get things compatible. I'm not able to upgrade at the moment, so if either of you is interested in poking around to see what changed, please do 😄

This issue is probably worth keeping an eye on: https://github.com/PyTables/PyTables/issues/545

TomAugspurger commented 8 years ago

I should also say that any fix here will probably be in making pytables compatible with 1.10. But maybe we keep this issue open for now, until we can test the new version on travis?
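A rough sketch of the kind of round-trip smoke test that could be run against a new HDF5 build is shown below. The function name `hdf_roundtrip_smoke_test` and the returned status strings are hypothetical, not an existing pandas test. It uses a numeric-only frame on the assumption that this still exercises the failing path, since the traceback above fails while setting the `ndim` attribute rather than while writing column data.

```python
import os
import shutil
import tempfile


def hdf_roundtrip_smoke_test():
    """Write a tiny DataFrame to HDF5 and read it back.

    Returns 'skipped' if pandas or PyTables is not installed, 'ok' if the
    round trip preserves the data, 'mismatch' if it does not, and
    'hdf5-error' if PyTables raises HDF5ExtError (the error in this thread).
    """
    try:
        import pandas as pd
        import tables
    except ImportError:
        return "skipped"

    df = pd.DataFrame({"col1": [1, 2, 3], "col2": [4.0, 5.0, 6.0]})
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "smoke.h5")
    try:
        df.to_hdf(path, key="test")
        restored = pd.read_hdf(path, key="test")
        return "ok" if restored.equals(df) else "mismatch"
    except tables.exceptions.HDF5ExtError:
        return "hdf5-error"
    finally:
        # Always clean up the temporary file, even after a failure.
        shutil.rmtree(tmpdir, ignore_errors=True)


if __name__ == "__main__":
    print(hdf_roundtrip_smoke_test())
```

Writing to a fresh temporary directory rules out a stale or corrupted file as the cause, which was the first suspicion raised earlier in the thread.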

seanenck commented 8 years ago

At this point I wish I was going to be able to poke around (I will if I get the time) but I probably have to downgrade so I can push forward with what this blocks for me.

Thanks @TomAugspurger for the pytables link and helping to point out the problem - I'll keep an eye out for an upgrade (that works).

seanenck commented 8 years ago

For the example I posted, I had to downgrade pytables to 3.2.2-4 (from 3.2.2-5) and hdf to 1.8.15.

References:
[0] https://wiki.archlinux.org/index.php/downgrading_packages
[1] https://archive.archlinux.org/packages/p/python-pytables/
[2] https://archive.archlinux.org/packages/h/hdf5/

jreback commented 8 years ago

Closing, as this is not directly a pandas issue, as indicated above.

Not sure what broke, but it's a shame there isn't better back-compat / testing.
