pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License
43.61k stars 17.9k forks source link

ERR: concat of non-unique join axes should have better error #13084

Closed gregsifr closed 4 years ago

gregsifr commented 8 years ago

You get a confusing error message when trying to concat on non-unique (but also non-exactly-equal) indices. Small example:

In [57]: df1 = pd.DataFrame({'col1': [1, 2, 3]}, index=[0, 0, 1])
    ...: df2 = pd.DataFrame({'col2': [1, 2, 3]}, index=[0, 1, 2])

In [59]: pd.concat([df1, df2], axis=1)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-59-756087e4d415> in <module>()
----> 1 pd.concat([df1, df2], axis=1)

/home/joris/scipy/pandas/pandas/tools/concat.py in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, copy)
    205                        verify_integrity=verify_integrity,
    206                        copy=copy)
--> 207     return op.get_result()
    208 
    209 

/home/joris/scipy/pandas/pandas/tools/concat.py in get_result(self)
    405             new_data = concatenate_block_managers(
    406                 mgrs_indexers, self.new_axes, concat_axis=self.axis,
--> 407                 copy=self.copy)
    408             if not self.copy:
    409                 new_data._consolidate_inplace()

/home/joris/scipy/pandas/pandas/core/internals.py in concatenate_block_managers(mgrs_indexers, axes, concat_axis, copy)
   4849         placement=placement) for placement, join_units in concat_plan]
   4850 
-> 4851     return BlockManager(blocks, axes)
   4852 
   4853 

/home/joris/scipy/pandas/pandas/core/internals.py in __init__(self, blocks, axes, do_integrity_check, fastpath)
   2784 
   2785         if do_integrity_check:
-> 2786             self._verify_integrity()
   2787 
   2788         self._consolidate_check()

/home/joris/scipy/pandas/pandas/core/internals.py in _verify_integrity(self)
   2994         for block in self.blocks:
   2995             if block._verify_integrity and block.shape[1:] != mgr_shape[1:]:
-> 2996                 construction_error(tot_items, block.shape[1:], self.axes)
   2997         if len(self.items) != tot_items:
   2998             raise AssertionError('Number of manager items must equal union of '

/home/joris/scipy/pandas/pandas/core/internals.py in construction_error(tot_items, block_shape, axes, e)
   4258         raise ValueError("Empty data passed with indices specified.")
   4259     raise ValueError("Shape of passed values is {0}, indices imply {1}".format(
-> 4260         passed, implied))
   4261 
   4262 

ValueError: Shape of passed values is (2, 6), indices imply (2, 4)

Original reported issue by @gregsifr :

I am working with a large dataframe of customers which I was unable to concat. After spending some time I narrowed the problem area down to the below (pickled) dataframes and code. When trying to concat the dataframes using the following code the error shown below is returned: ``` import pandas as pd import pickle df1 = pickle.loads('ccopy_reg\n_reconstructor\np1\n(cpandas.core.frame\nDataFrame\np2\nc__builtin__\nobject\np3\nNtRp4\n(dp5\nS\'_metadata\'\np6\n(lp7\nsS\'_typ\'\np8\nS\'dataframe\'\np9\nsS\'_data\'\np10\ng1\n(cpandas.core.internals\nBlockManager\np11\ng3\nNtRp12\n((lp13\ncpandas.core.index\n_new_Index\np14\n(cpandas.core.index\nMultiIndex\np15\n(dp16\nS\'labels\'\np17\n(lp18\ncnumpy.core.multiarray\n_reconstruct\np19\n(cpandas.core.base\nFrozenNDArray\np20\n(I0\ntS\'b\'\ntRp21\n(I1\n(L2L\ntcnumpy\ndtype\np22\n(S\'i1\'\nI0\nI1\ntRp23\n(I3\nS\'|\'\nNNNI-1\nI-1\nI0\ntbI00\nS\'\\x00\\x00\'\ntbag19\n(g20\n(I0\ntS\'b\'\ntRp24\n(I1\n(L2L\ntg23\nI00\nS\'\\x00\\x01\'\ntbasS\'names\'\np25\n(lp26\nNaNasS\'levels\'\np27\n(lp28\ng14\n(cpandas.core.index\nIndex\np29\n(dp30\nS\'data\'\np31\ng19\n(cnumpy\nndarray\np32\n(I0\ntS\'b\'\ntRp33\n(I1\n(L1L\ntg22\n(S\'O8\'\nI0\nI1\ntRp34\n(I3\nS\'|\'\nNNNI-1\nI-1\nI63\ntbI00\n(lp35\nVCUSTOMER_A\np36\natbsS\'name\'\np37\nNstRp38\nag14\n(g29\n(dp39\ng31\ng19\n(g32\n(I0\ntS\'b\'\ntRp40\n(I1\n(L2L\ntg34\nI00\n(lp41\nVVISIT_DT\np42\naVPURCHASE\np43\natbsg37\nNstRp44\nasS\'sortorder\'\np45\nNstRp46\nacpandas.tseries.index\n_new_DatetimeIndex\np47\n(cpandas.tseries.index\nDatetimeIndex\np48\n(dp49\nS\'tz\'\np50\nNsS\'freq\'\np51\nNsg31\ng19\n(g32\n(I0\ntS\'b\'\ntRp52\n(I1\n(L22L\ntg22\n(S\'M8\'\nI0\nI1\ntRp53\n(I4\nS\'<\'\nNNNI-1\nI-1\nI0\n((d(S\'ns\'\nI1\nI1\nI1\ntttbI00\nS\'\\x00\\x00\\x1c\\xca\\xf9\\xceO\\x10\\x00\\x00k[\\x8e\\x1dP\\x10\\x00\\x00\\xba\\xec"lP\\x10\\x00\\x00\\t~\\xb7\\xbaP\\x10\\x00\\x00X\\x0fL\\tQ\\x10\\x00\\x00\\xa7\\xa0\\xe0WQ\\x10\\x00\\x00\\x94T\\x9eCR\\x10\\x00\\x00\\xe3\\xe52\\x92R\\x10\\x00\\x002w\\xc7\\xe0R\\x10\\x00\\x00\\x81\\x08\\\\/S\\x10\\x00\\x00\\xd0\\x99\\xf0}S\\x10\\x00\\x00\\xbdM\\xaeiT\\x10\\x00\\x00\\x0c\\xdfB\\xb8T\\x10\\x00\\x00[p\\xd7\\x06U\\x10\\x00\\x00\\xaa\\x01lUU\\x10\\x00\\x00\\xf9\\x92\\x00\\xa4U\\x10\\x00\\x00\\xe6F\\xbe\\x8fV\\x10\\x00\\x005\\xd8R\\xdeV\\x10\\x00\\x00\\x84i\\xe7,W\\x10\\x00\\x00\\xd3\\xfa{{W\\x10\\x00\\x00"\\x8c\\x10\\xcaW\\x10\\x00\\x00\\x0f@\\xce\\xb5X\\x10\'\ntbsg37\nS\'date\'\np54\nstRp55\na(lp56\ng19\n(g32\n(I0\ntS\'b\'\ntRp57\n(I1\n(L2L\nL22L\ntg22\n(S\'f8\'\nI0\nI1\ntRp58\n(I3\nS\'<\'\nNNNI-1\nI-1\nI0\ntbI00\nS\'\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\'\ntba(lp59\ng14\n(g15\n(dp60\ng17\n(lp61\ng19\n(g20\n(I0\ntS\'b\'\ntRp62\n(I1\n(L2L\ntg23\nI00\nS\'\\x00\\x00\'\ntbag19\n(g20\n(I0\ntS\'b\'\ntRp63\n(I1\n(L2L\ntg23\nI00\nS\'\\x00\\x01\'\ntbasg25\n(lp64\nNaNasg27\n(lp65\ng14\n(g29\n(dp66\ng31\ng19\n(g32\n(I0\ntS\'b\'\ntRp67\n(I1\n(L1L\ntg34\nI00\n(lp68\ng36\natbsg37\nNstRp69\nag14\n(g29\n(dp70\ng31\ng19\n(g32\n(I0\ntS\'b\'\ntRp71\n(I1\n(L2L\ntg34\nI00\n(lp72\ng42\nag43\natbsg37\nNstRp73\nasg45\nNstRp74\na(dp75\nS\'0.14.1\'\np76\n(dp77\nS\'axes\'\np78\ng13\nsS\'blocks\'\np79\n(lp80\n(dp81\nS\'mgr_locs\'\np82\nc__builtin__\nslice\np83\n(I0\nI2\nL1L\ntRp84\nsS\'values\'\np85\ng57\nsasstbsb.') df2 = pickle.loads('ccopy_reg\n_reconstructor\np1\n(cpandas.core.frame\nDataFrame\np2\nc__builtin__\nobject\np3\nNtRp4\n(dp5\nS\'_metadata\'\np6\n(lp7\nsS\'_typ\'\np8\nS\'dataframe\'\np9\nsS\'_data\'\np10\ng1\n(cpandas.core.internals\nBlockManager\np11\ng3\nNtRp12\n((lp13\ncpandas.core.index\n_new_Index\np14\n(cpandas.core.index\nMultiIndex\np15\n(dp16\nS\'labels\'\np17\n(lp18\ncnumpy.core.multiarray\n_reconstruct\np19\n(cpandas.core.base\nFrozenNDArray\np20\n(I0\ntS\'b\'\ntRp21\n(I1\n(L2L\ntcnumpy\ndtype\np22\n(S\'i1\'\nI0\nI1\ntRp23\n(I3\nS\'|\'\nNNNI-1\nI-1\nI0\ntbI00\nS\'\\x00\\x00\'\ntbag19\n(g20\n(I0\ntS\'b\'\ntRp24\n(I1\n(L2L\ntg23\nI00\nS\'\\x00\\x01\'\ntbasS\'names\'\np25\n(lp26\nNaNasS\'levels\'\np27\n(lp28\ng14\n(cpandas.core.index\nIndex\np29\n(dp30\nS\'data\'\np31\ng19\n(cnumpy\nndarray\np32\n(I0\ntS\'b\'\ntRp33\n(I1\n(L1L\ntg22\n(S\'O8\'\nI0\nI1\ntRp34\n(I3\nS\'|\'\nNNNI-1\nI-1\nI63\ntbI00\n(lp35\nVCUSTOMER_B\np36\natbsS\'name\'\np37\nNstRp38\nag14\n(g29\n(dp39\ng31\ng19\n(g32\n(I0\ntS\'b\'\ntRp40\n(I1\n(L2L\ntg34\nI00\n(lp41\nVVISIT_DT\np42\naVPURCHASE\np43\natbsg37\nNstRp44\nasS\'sortorder\'\np45\nNstRp46\nacpandas.tseries.index\n_new_DatetimeIndex\np47\n(cpandas.tseries.index\nDatetimeIndex\np48\n(dp49\nS\'tz\'\np50\nNsS\'freq\'\np51\nNsg31\ng19\n(g32\n(I0\ntS\'b\'\ntRp52\n(I1\n(L24L\ntg22\n(S\'M8\'\nI0\nI1\ntRp53\n(I4\nS\'<\'\nNNNI-1\nI-1\nI0\n((d(S\'ns\'\nI1\nI1\nI1\ntttbI00\nS\'\\x00\\x00k[\\x8e\\x1dP\\x10\\x00\\x00\\xba\\xec"lP\\x10\\x00\\x00\\t~\\xb7\\xbaP\\x10\\x00\\x00X\\x0fL\\tQ\\x10\\x00\\x00\\xa7\\xa0\\xe0WQ\\x10\\x00\\x00\\x94T\\x9eCR\\x10\\x00\\x00\\xe3\\xe52\\x92R\\x10\\x00\\x002w\\xc7\\xe0R\\x10\\x00\\x00\\x81\\x08\\\\/S\\x10\\x00\\x00\\xd0\\x99\\xf0}S\\x10\\x00\\x00\\xbdM\\xaeiT\\x10\\x00\\x00\\x0c\\xdfB\\xb8T\\x10\\x00\\x00[p\\xd7\\x06U\\x10\\x00\\x00\\xaa\\x01lUU\\x10\\x00\\x00\\xf9\\x92\\x00\\xa4U\\x10\\x00\\x00\\xe6F\\xbe\\x8fV\\x10\\x00\\x005\\xd8R\\xdeV\\x10\\x00\\x00\\x84i\\xe7,W\\x10\\x00\\x00\\xd3\\xfa{{W\\x10\\x00\\x00"\\x8c\\x10\\xcaW\\x10\\x00\\x00\\xc0\\xae9gX\\x10\\x00\\x00\\xc0\\xae9gX\\x10\\x00\\x00\\xc0\\xae9gX\\x10\\x00\\x00\\x0f@\\xce\\xb5X\\x10\'\ntbsg37\nS\'date\'\np54\nstRp55\na(lp56\ng19\n(g32\n(I0\ntS\'b\'\ntRp57\n(I1\n(L2L\nL24L\ntg22\n(S\'f8\'\nI0\nI1\ntRp58\n(I3\nS\'<\'\nNNNI-1\nI-1\nI0\ntbI00\nS\'\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\xf0\\x0c$sA\\x00\\x00\\x00\\xf0\\x0c$sA\\x00\\x00\\x00\\xf0\\x0c$sA\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\'\ntba(lp59\ng14\n(g15\n(dp60\ng17\n(lp61\ng19\n(g20\n(I0\ntS\'b\'\ntRp62\n(I1\n(L2L\ntg23\nI00\nS\'\\x00\\x00\'\ntbag19\n(g20\n(I0\ntS\'b\'\ntRp63\n(I1\n(L2L\ntg23\nI00\nS\'\\x00\\x01\'\ntbasg25\n(lp64\nNaNasg27\n(lp65\ng14\n(g29\n(dp66\ng31\ng19\n(g32\n(I0\ntS\'b\'\ntRp67\n(I1\n(L1L\ntg34\nI00\n(lp68\ng36\natbsg37\nNstRp69\nag14\n(g29\n(dp70\ng31\ng19\n(g32\n(I0\ntS\'b\'\ntRp71\n(I1\n(L2L\ntg34\nI00\n(lp72\ng42\nag43\natbsg37\nNstRp73\nasg45\nNstRp74\na(dp75\nS\'0.14.1\'\np76\n(dp77\nS\'axes\'\np78\ng13\nsS\'blocks\'\np79\n(lp80\n(dp81\nS\'mgr_locs\'\np82\nc__builtin__\nslice\np83\n(I0\nI2\nL1L\ntRp84\nsS\'values\'\np85\ng57\nsasstbsb.') customers, tables = ['CUSTOMER_A', 'CUSTOMER_B'], [df1.iloc[:], df2.iloc[:]] tables = pd.concat(tables, keys=customers, axis=1) ``` ``` --------------------------------------------------------------------------- ValueError Traceback (most recent call last) in () 6 7 customers, tables = ['CUSTOMER_A', 'CUSTOMER_B'], [df1.iloc[:], df2.iloc[:]] ----> 8 tables = pd.concat(tables, keys=customers, axis=1) /home/code/anaconda2/lib/python2.7/site-packages/pandas/tools/merge.pyc in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, copy) 833 verify_integrity=verify_integrity, 834 copy=copy) --> 835 return op.get_result() 836 837 /home/code/anaconda2/lib/python2.7/site-packages/pandas/tools/merge.pyc in get_result(self) 1023 new_data = concatenate_block_managers( 1024 mgrs_indexers, self.new_axes, -> 1025 concat_axis=self.axis, copy=self.copy) 1026 if not self.copy: 1027 new_data._consolidate_inplace() /home/code/anaconda2/lib/python2.7/site-packages/pandas/core/internals.pyc in concatenate_block_managers(mgrs_indexers, axes, concat_axis, copy) 4474 for placement, join_units in concat_plan] 4475 -> 4476 return BlockManager(blocks, axes) 4477 4478 /home/code/anaconda2/lib/python2.7/site-packages/pandas/core/internals.pyc in __init__(self, blocks, axes, do_integrity_check, fastpath) 2535 2536 if do_integrity_check: -> 2537 self._verify_integrity() 2538 2539 self._consolidate_check() /home/code/anaconda2/lib/python2.7/site-packages/pandas/core/internals.pyc in _verify_integrity(self) 2745 for block in self.blocks: 2746 if block._verify_integrity and block.shape[1:] != mgr_shape[1:]: -> 2747 construction_error(tot_items, block.shape[1:], self.axes) 2748 if len(self.items) != tot_items: 2749 raise AssertionError('Number of manager items must equal union of ' /home/code/anaconda2/lib/python2.7/site-packages/pandas/core/internals.pyc in construction_error(tot_items, block_shape, axes, e) 3897 raise ValueError("Empty data passed with indices specified.") 3898 raise ValueError("Shape of passed values is {0}, indices imply {1}".format( -> 3899 passed, implied)) 3900 3901 ValueError: Shape of passed values is (4, 31), indices imply (4, 25) ``` However if the dataframes are sliced e.g. [:10] OR [10:], the concat works: ``` customers, tables = ['CUSTOMER_A', 'CUSTOMER_B'], [df1.iloc[:20], df2.iloc[:20]] tables = pd.concat(tables, keys=customers, axis=1) tables ``` Output of `pd.show_versions()` ``` INSTALLED VERSIONS ------------------ commit: None python: 2.7.11.final.0 python-bits: 64 OS: Linux OS-release: 3.19.0-58-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_AU.UTF-8 pandas: 0.18.0 nose: 1.3.7 pip: 8.1.1 setuptools: 20.6.7 Cython: 0.24 numpy: 1.10.4 scipy: 0.17.0 statsmodels: 0.6.1 xarray: None IPython: 4.1.2 sphinx: 1.4.1 patsy: 0.4.1 dateutil: 2.5.2 pytz: 2016.3 blosc: None bottleneck: 1.0.0 tables: 3.2.2 numexpr: 2.5.2 matplotlib: 1.5.1 openpyxl: 2.3.2 xlrd: 0.9.4 xlwt: 1.0.0 xlsxwriter: 0.8.4 lxml: 3.6.0 bs4: 4.4.1 html5lib: None httplib2: None apiclient: None sqlalchemy: 1.0.12 pymysql: None psycopg2: 2.6.1 (dt dec pq3 ext) jinja2: 2.8 boto: 2.39.0 ```
jreback commented 8 years ago

show df.info() for the input frames. a copy-pastable example would be helpful.

gregsifr commented 8 years ago

The example provided is copy-pastable (it includes the pickled dataframes)

Below is the output from df.info():

df1.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 22 entries, 2007-04-01 to 2007-04-30
Data columns (total 2 columns):
(CUSTOMER_A, VISIT_DT)    0 non-null float64
(CUSTOMER_A, PURCHASE)    0 non-null float64
dtypes: float64(2)
memory usage: 528.0 bytes

df2.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 24 entries, 2007-04-02 to 2007-04-30
Data columns (total 2 columns):
(CUSTOMER_B, VISIT_DT)    3 non-null float64
(CUSTOMER_B, PURCHASE)    0 non-null float64
dtypes: float64(2)
memory usage: 576.0 bytes
jreback commented 8 years ago

You can't concat non-unique indexes. It doesn't make sense as these are essentially gluing blocks.

In [37]: pd.merge(df1,df2,left_index=True,right_index=True,how='outer')
Out[37]: 
           CUSTOMER_A           CUSTOMER_B         
             VISIT_DT PURCHASE    VISIT_DT PURCHASE
date                                               
2007-04-01        NaN      NaN         NaN      NaN
2007-04-02        NaN      NaN         NaN      NaN
2007-04-03        NaN      NaN         NaN      NaN
2007-04-04        NaN      NaN         NaN      NaN
2007-04-05        NaN      NaN         NaN      NaN
2007-04-06        NaN      NaN         NaN      NaN
2007-04-09        NaN      NaN         NaN      NaN
2007-04-10        NaN      NaN         NaN      NaN
2007-04-11        NaN      NaN         NaN      NaN
2007-04-12        NaN      NaN         NaN      NaN
2007-04-13        NaN      NaN         NaN      NaN
2007-04-16        NaN      NaN         NaN      NaN
2007-04-17        NaN      NaN         NaN      NaN
2007-04-18        NaN      NaN         NaN      NaN
2007-04-19        NaN      NaN         NaN      NaN
2007-04-20        NaN      NaN         NaN      NaN
2007-04-23        NaN      NaN         NaN      NaN
2007-04-24        NaN      NaN         NaN      NaN
2007-04-25        NaN      NaN         NaN      NaN
2007-04-26        NaN      NaN         NaN      NaN
2007-04-27        NaN      NaN         NaN      NaN
2007-04-29        NaN      NaN  20070607.0      NaN
2007-04-29        NaN      NaN  20070607.0      NaN
2007-04-29        NaN      NaN  20070607.0      NaN
2007-04-30        NaN      NaN         NaN      NaN

I think we could give a better error message here. Want to take a crack at it?

gregsifr commented 8 years ago

I'd be happy to take a crack at it.

jreback commented 8 years ago

gr8!

TomAugspurger commented 4 years ago

Duplicate of https://github.com/pandas-dev/pandas/issues/6963.