pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

BUG: concat with copy=False of ExtensionArray fails #20756

Closed: jorisvandenbossche closed this issue 4 years ago

jorisvandenbossche commented 6 years ago
In [1]: import pandas as pd

In [2]: from pandas.tests.extension.decimal.array import DecimalArray, make_data

In [5]: dec_arr = DecimalArray(make_data())

In [6]: df1 = pd.DataFrame({'int1': [1, 2, 3], 'key':[0, 1, 2], 'ext1': dec_arr[:3]})

In [7]: df2 = pd.DataFrame({'int2': [1, 2, 3, 4], 'key':[0, 0, 1, 3], 'ext2': dec_arr[3:7]})

In [8]: pd.concat([df1, df1], axis=1, copy=False)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-60-bd0d3639db5b> in <module>()
----> 1 pd.concat([df1, df1], axis=1, copy=False)

/home/joris/scipy/pandas/pandas/core/reshape/concat.py in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, copy)
    211                        verify_integrity=verify_integrity,
    212                        copy=copy)
--> 213     return op.get_result()
    214 
    215 

/home/joris/scipy/pandas/pandas/core/reshape/concat.py in get_result(self)
    406             new_data = concatenate_block_managers(
    407                 mgrs_indexers, self.new_axes, concat_axis=self.axis,
--> 408                 copy=self.copy)
    409             if not self.copy:
    410                 new_data._consolidate_inplace()

/home/joris/scipy/pandas/pandas/core/internals.py in concatenate_block_managers(mgrs_indexers, axes, concat_axis, copy)
   5394     import pdb; pdb.set_trace()
   5395 
-> 5396     for placement, join_units in concat_plan:
   5397 
   5398         if len(join_units) == 1 and not join_units[0].indexers:

AttributeError: 'DecimalArray' object has no attribute 'view'

This fails because of the following lines:

https://github.com/pandas-dev/pandas/blob/3a2e9e6c201fee07c3417550d2d47dca74066c3d/pandas/core/internals.py#L5396-L5403

So with copy=False, concatenate_block_managers takes a view of the data by calling view() on the values. view is not defined on the extension array interface, hence the AttributeError above. I am not fully sure why the view is needed here.
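To make the failure mode concrete, here is a minimal sketch that does not use pandas internals at all. ListBackedArray, take_values_unsafe and take_values_safe are hypothetical names invented for this illustration. The "safe" variant is only one possible way to avoid the error, and is not necessarily how pandas handles it.

```python
class ListBackedArray:
    """Hypothetical array-like that implements copy() but not view()."""

    def __init__(self, values):
        self._values = list(values)

    def __len__(self):
        return len(self._values)

    def copy(self):
        return ListBackedArray(self._values)


def take_values_unsafe(values, copy):
    # Pattern described in the issue: copy when asked, otherwise take a view.
    # This assumes every array type has a view() method.
    return values.copy() if copy else values.view()


def take_values_safe(values, copy):
    # Assumed alternative: only call view() when the object provides it,
    # and otherwise reuse the original object without copying.
    if copy:
        return values.copy()
    view = getattr(values, "view", None)
    return view() if callable(view) else values


arr = ListBackedArray([1, 2, 3])

try:
    take_values_unsafe(arr, copy=False)
except AttributeError as exc:
    # Same kind of error as in the traceback above.
    print("unsafe path failed:", exc)

# The safe path returns the original object when no view() is available.
print(take_values_safe(arr, copy=False) is arr)
```

The point of the sketch is only that an unconditional view() call ties the no-copy path to the NumPy ndarray API, which an extension array is not required to implement.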

jorisvandenbossche commented 4 years ago

This is working now on master. It would still be good to add a test for this.
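A test along these lines might look like the sketch below. It is an assumption, not the test that was actually added to pandas: it uses the public nullable Int64 array as a stand-in for DecimalArray, and the function names are made up for illustration. Newer pandas versions may warn that the copy keyword is deprecated, so the sketch silences warnings around the call.

```python
import warnings

import pandas as pd


def concat_ext_no_copy():
    # Int64 is a built-in extension dtype, used here instead of DecimalArray
    # so that the sketch only depends on public pandas API.
    ext = pd.array([1, 2, None], dtype="Int64")
    df1 = pd.DataFrame({"int1": [1, 2, 3], "key": [0, 1, 2], "ext1": ext})
    with warnings.catch_warnings():
        # Assumption: the copy keyword may trigger a deprecation warning
        # on recent pandas versions; ignore it for this check.
        warnings.simplefilter("ignore")
        return pd.concat([df1, df1], axis=1, copy=False)


def test_concat_extension_array_copy_false():
    result = concat_ext_no_copy()
    # Column-wise concat of a 3x3 frame with itself gives a 3x6 frame.
    assert result.shape == (3, 6)
    assert list(result.columns) == ["int1", "key", "ext1"] * 2
    # The extension dtype should survive the concat in both copies.
    assert str(result.dtypes.iloc[2]) == "Int64"
    assert str(result.dtypes.iloc[5]) == "Int64"


test_concat_extension_array_copy_false()
```

The check is deliberately limited to shape, column labels and dtypes, since the original report was about the call raising at all rather than about the values it returns.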