pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

BUG df with 'index' as one name of a MultiIndex fails to save as HDFStore table #6208

Open glyg opened 10 years ago

glyg commented 10 years ago

This one is strange... Here is a minimal example:

import numpy as np
import pandas as pd

index = pd.MultiIndex(levels=[['foo', 'bar', 'baz', 'qux'],
                              ['one', 'two', 'three']],
                      codes=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3],
                              [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]],
                      names=['index', 'bar_name'])

df_mi = pd.DataFrame(np.random.randn(10, 3), index=index,
                     columns=['A', 'B', 'C'])

with pd.HDFStore('minimal_io.h5', mode="w") as store:
    store.put('df_mi', df_mi, format='table')

And the error backtrace:

--------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-22-bb85f423e84c> in <module>()
     11 
     12 with pd.get_store('minimal_io.h5') as store:
---> 13     store.put('df_mi', df_mi, format='table')

/home/guillaume/python3/lib/python3.3/site-packages/pandas-0.13.0_496_ga2d5e53-py3.3-linux-x86_64.egg/pandas/io/pytables.py in put(self, key, value, format, append, **kwargs)
    819             format = get_option("io.hdf.default_format") or 'fixed'
    820         kwargs = self._validate_format(format, kwargs)
--> 821         self._write_to_group(key, value, append=append, **kwargs)
    822 
    823     def remove(self, key, where=None, start=None, stop=None):

/home/guillaume/python3/lib/python3.3/site-packages/pandas-0.13.0_496_ga2d5e53-py3.3-linux-x86_64.egg/pandas/io/pytables.py in _write_to_group(self, key, value, format, index, append, complib, encoding, **kwargs)
   1271 
   1272         # write the object
-> 1273         s.write(obj=value, append=append, complib=complib, **kwargs)
   1274 
   1275         if s.is_table and index:

/home/guillaume/python3/lib/python3.3/site-packages/pandas-0.13.0_496_ga2d5e53-py3.3-linux-x86_64.egg/pandas/io/pytables.py in write(self, obj, data_columns, **kwargs)
   3963         print(data_columns)
   3964         return super(AppendableMultiFrameTable, self).write(
-> 3965             obj=obj, data_columns=data_columns, **kwargs)
   3966 
   3967     def read(self, **kwargs):

/home/guillaume/python3/lib/python3.3/site-packages/pandas-0.13.0_496_ga2d5e53-py3.3-linux-x86_64.egg/pandas/io/pytables.py in write(self, obj, axes, append, complib, complevel, fletcher32, min_itemsize, chunksize, expectedrows, dropna, **kwargs)
   3603 
   3604         # add the rows
-> 3605         self.write_data(chunksize, dropna=dropna)
   3606 
   3607     def write_data(self, chunksize, dropna=True):

/home/guillaume/python3/lib/python3.3/site-packages/pandas-0.13.0_496_ga2d5e53-py3.3-linux-x86_64.egg/pandas/io/pytables.py in write_data(self, chunksize, dropna)
   3661         for i, v in enumerate(values):
   3662             new_shape = (nrows,) + self.dtype[names[nindexes + i]].shape
-> 3663             bvalues.append(values[i].ravel().reshape(new_shape))
   3664 
   3665         # write the chunks

ValueError: total size of new array must be unchanged

> /home/guillaume/python3/lib/python3.3/site-packages/pandas-0.13.0_496_ga2d5e53-py3.3-linux-x86_64.egg/pandas/io/pytables.py(3663)write_data()
   3662             new_shape = (nrows,) + self.dtype[names[nindexes + i]].shape
-> 3663             bvalues.append(values[i].ravel().reshape(new_shape))
   3664 

pandas version: '0.13.0-496-ga2d5e53'

Note that the bug is not there if the format is set to 'fixed', i.e. this works:

with pd.HDFStore('minimal_io.h5', mode="w") as store:
    store.put('df_mi', df_mi, format='fixed')

jreback commented 10 years ago

Using 'index' as a level name in a MultiIndex is not allowed when storing, because 'index' is reserved. validate_multiindex should check for this and raise a ValueError; actually supporting the name is too complicated to fix.
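
For illustration only, here is a rough sketch of the kind of check described above, plus a possible workaround on the caller's side. The names `RESERVED_LEVEL_NAMES`, `check_level_names` and `rename_reserved_levels` are hypothetical and are not part of pandas; the sketch also assumes 'index' is the only reserved name, which this thread does not confirm. It does not touch HDFStore at all, it only inspects and renames the level names.

```python
import numpy as np
import pandas as pd

# Assumption: 'index' is the only reserved name (taken from the comment above).
RESERVED_LEVEL_NAMES = frozenset({"index"})


def check_level_names(df):
    """Hypothetical helper: raise ValueError on reserved MultiIndex level names."""
    if isinstance(df.index, pd.MultiIndex):
        clashes = [name for name in df.index.names if name in RESERVED_LEVEL_NAMES]
        if clashes:
            raise ValueError(f"reserved MultiIndex level name(s): {clashes}")


def rename_reserved_levels(df, suffix="_level"):
    """Hypothetical workaround: return a copy with reserved level names suffixed."""
    mapping = {name: f"{name}{suffix}" for name in RESERVED_LEVEL_NAMES}
    # rename_axis with a mapping renames only the matching level names.
    return df.rename_axis(index=mapping)


index = pd.MultiIndex.from_product(
    [["foo", "bar"], ["one", "two"]], names=["index", "bar_name"]
)
df_mi = pd.DataFrame(np.random.randn(4, 3), index=index, columns=["A", "B", "C"])

try:
    check_level_names(df_mi)
except ValueError as err:
    print(err)  # the clash is reported up front, before any write is attempted

safe = rename_reserved_levels(df_mi)
check_level_names(safe)  # no exception once the level has been renamed
print(list(safe.index.names))
```

Renaming the level before calling `store.put(..., format='table')` and renaming it back after reading is one way a caller might sidestep the clash; whether that is acceptable depends on how the stored file is consumed elsewhere.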

phofl commented 1 year ago

Edit: correction, wrong example