tastyminerals closed this issue 8 years ago
You have a corrupted file. Please post a copy-pastable example that fails.
I'm having this same issue (after upgrading to the latest version of pandas). I've cleared out my existing HDF files and allowed them all to be recreated.
I read data from a gzipped XML file created specially for my script. The script parses this file, finds words, and creates a tensor (a table with w1, link, w2 counts). The part that creates the tensor is posted above. When h5file.append is executed, pandas produces this error. It was working around 2 months ago, and I have changed neither the script nor the file. Can you please tell me what this error means, so I can at least act accordingly?
Unfortunately I cannot post the complete code since it is a part of the research paper.
@tastyminerals it doesn't have to be your exact code, just an example that gives the same error.
pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Linux
OS-release: 4.5.2-1-ARCH
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: C
pandas: 0.18.1
nose: None
pip: 8.1.1
setuptools: 21.0.0
Cython: None
numpy: 1.11.0
scipy: None
statsmodels: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: None
tables: 3.2.2
numexpr: 2.5.2
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None
Code:
#!/usr/bin/python

import pandas as pd


def save_to(uuid, name, data):
    path = uuid
    key_name = name
    data.to_hdf(path,
                key=key_name)

pd.show_versions()
d = {'col1': 1, 'col2': 2, 'test': 'test'}
df = pd.DataFrame(data=d, index=['col1'])
save_to('abc', 'test', df)
Result:
Traceback (most recent call last):
  File "example.py", line 15, in <module>
    save_to('abc', 'test', df)
  File "example.py", line 10, in save_to
    key=key_name)
  File "/usr/lib/python3.5/site-packages/pandas/core/generic.py", line 1101, in to_hdf
    return pytables.to_hdf(path_or_buf, key, self, **kwargs)
  File "/usr/lib/python3.5/site-packages/pandas/io/pytables.py", line 260, in to_hdf
    f(store)
  File "/usr/lib/python3.5/site-packages/pandas/io/pytables.py", line 255, in <lambda>
    f = lambda store: store.put(key, value, **kwargs)
  File "/usr/lib/python3.5/site-packages/pandas/io/pytables.py", line 826, in put
    self._write_to_group(key, value, append=append, **kwargs)
  File "/usr/lib/python3.5/site-packages/pandas/io/pytables.py", line 1264, in _write_to_group
    s.write(obj=value, append=append, complib=complib, **kwargs)
  File "/usr/lib/python3.5/site-packages/pandas/io/pytables.py", line 2799, in write
    self.attrs.ndim = data.ndim
  File "/usr/lib/python3.5/site-packages/tables/attributeset.py", line 461, in __setattr__
    self._g__setattr(name, value)
  File "/usr/lib/python3.5/site-packages/tables/attributeset.py", line 403, in _g__setattr
    self._g_setattr(self._v_node, name, stvalue)
  File "tables/hdf5extension.pyx", line 696, in tables.hdf5extension.AttributeSet._g_setattr (tables/hdf5extension.c:7549)
tables.exceptions.HDF5ExtError: HDF5 error back trace
  File "H5A.c", line 634, in H5Awrite
    not an attribute
End of HDF5 error back trace
Can't set attribute 'ndim' in node:
 /test (Group) ''.
import pandas as pd
govdep_tup = [('addition-n', 'A1', 'say-v', 1), ('father-n', 'A0', 'settle-v', 1), ('couple-n', 'AM-TMP', 'stroll-v', 1), ('property-n', 'A2', 'include-v', 5), ('way-n', 'A1', 'consider-v', 1), ('people-n', 'A1', 'bury-v', 2), ('assume-v', 'A1', 'warn-v', 1), ('matter-n', 'A1', 'suppose-v', 1), ('picture-n', 'AM-LOC', 'stare-v', 1), ('be-v', 'A1', 'miss-v', 1)]
pdf = pd.DataFrame.from_records(govdep_tup)
pdf.columns = ['word0', 'link', 'word1', 'counts']
h5file = pd.HDFStore('tensor.h5', 'a', complevel=9, complib='blosc')
h5file.append("tensor", pdf,
              data_columns=['word0', 'link', 'word1'],
              nan_rep='_!NaN_',
              min_itemsize={'word0': 55, 'link': 15, 'word1': 55})
h5file.close()
Thanks for the examples. Neither of those raises an error for me. What version of HDF5 are you using? There was a recent release, but I don't think I've upgraded. I'm running 1.8.16.
I'm running 1.10.0-1 on Arch.
I definitely just got that update about the same time as the pandas update (within the last day or 2)
That explains the error. I am running Manjaro, which uses the Arch repos.
Mine is extra/hdf5 1.10.0-1 [installed]
Something is wrong with the HDF5 version.
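For anyone else trying to confirm which HDF5 library their PyTables build is linked against, here is a minimal sketch. The helper names (`parse_hdf5_version`, `is_hdf5_1_10_or_newer`, `linked_hdf5_version`) are made up for illustration, and the 1.10 threshold is only an assumption based on the reports in this thread (1.10.0 failing, 1.8.x working), not a confirmed compatibility boundary.

```python
import re


def parse_hdf5_version(version):
    """Turn a version string such as '1.10.0' or '1.10.0-1' into a tuple of ints."""
    match = re.match(r"\s*(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognised HDF5 version string: %r" % (version,))
    return tuple(int(part) for part in match.groups())


def is_hdf5_1_10_or_newer(version):
    """Assumption from this thread: the failures appeared with HDF5 1.10.0."""
    return parse_hdf5_version(version) >= (1, 10, 0)


def linked_hdf5_version():
    """Return the HDF5 version PyTables reports, or None if PyTables is missing."""
    try:
        import tables
    except ImportError:
        return None
    return str(tables.hdf5_version)


if __name__ == "__main__":
    version = linked_hdf5_version()
    if version is None:
        print("PyTables is not installed")
    elif is_hdf5_1_10_or_newer(version):
        print("HDF5 %s: this is the series reported as failing here" % version)
    else:
        print("HDF5 %s: older than 1.10" % version)
```

Note that the distribution package version (for example `1.10.0-1` on Arch) carries a packaging suffix, which the parser above ignores.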
Well, that's too bad. Looks like we'll have some work to do to get things compatible. I'm not able to upgrade at the moment, so if either of you is interested in poking around to see what changed, please do 😄
This issue is probably worth keeping an eye on: https://github.com/PyTables/PyTables/issues/545
I should also say that any fix here will probably be in making pytables compatible with 1.10. But maybe we keep this issue open for now, until we can test the new version on travis?
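A CI check for this could be as small as a round-trip smoke test. The following is only a sketch of what such a test might look like, not an existing pandas test. The function name `hdf_roundtrip_ok` is hypothetical, and the frame mirrors the minimal failing example posted earlier in the thread.

```python
import os
import tempfile


def hdf_roundtrip_ok():
    """Write a small frame to HDF5 and read it back.

    Returns True if the values survive the round trip, False if they do not,
    and None if pandas or PyTables is not installed. On an affected
    PyTables/HDF5 combination the write is expected to raise instead.
    """
    try:
        import pandas as pd
        import tables  # noqa: F401  (pandas needs PyTables for HDF5 I/O)
    except ImportError:
        return None

    # Same data as the minimal failing example in this thread.
    df = pd.DataFrame(data={'col1': 1, 'col2': 2, 'test': 'test'},
                      index=['col1'])

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'abc.h5')
        df.to_hdf(path, key='test')
        loaded = pd.read_hdf(path, key='test')

    same_columns = list(loaded.columns) == list(df.columns)
    same_values = loaded.values.tolist() == df.values.tolist()
    return same_columns and same_values


if __name__ == "__main__":
    print(hdf_roundtrip_ok())
```

Running this against each HDF5 version in the test matrix would show whether a given PyTables/HDF5 pairing still hits the `HDF5ExtError` above.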
At this point I wish I was going to be able to poke around (I will if I get the time) but I probably have to downgrade so I can push forward with what this blocks for me.
Thanks @TomAugspurger for the pytables link and helping to point out the problem - I'll keep an eye out for an upgrade (that works).
For the example I posted, I had to downgrade pytables to 3.2.2-4 (from 3.2.2-5) and HDF5 to 1.8.15.
References:
[0] https://wiki.archlinux.org/index.php/downgrading_packages
[1] https://archive.archlinux.org/packages/p/python-pytables/
[2] https://archive.archlinux.org/packages/h/hdf5/
Closing, as this is not directly a pandas issue, as indicated above.
Not sure what broke, but it's a shame there isn't better back-compat / testing.
Error
I started getting this error recently.
Code Sample, a copy-pastable example if possible
mydata is a giant tuple
(('word1', 'link', 'word2', int), (...),
Expected Output
tensor.h5 file
Output of pd.show_versions()
Corresponding Stack Overflow question:
http://stackoverflow.com/questions/37142173/hdf5exterror-when-creating-python-pandas-hdf-file