Open Dmitrii-I opened 8 years ago
yeah looks strange. merge
should work though...
pd.merge(d1, d2, on='a', how='outer')
# a b c
# 0 1 11 aa
# 1 2 22 bb
There is also a problem with the overlapping columns detection:
In [11]: d1.join(d2, on=['a'])
...
ValueError: columns overlap but no suffix specified: Index([u'a'], dtype='object')
as this should of course not be raised if this is the column you want to merge on.
how="inner" also gives unexpected results:
d1.join(d1, on=['a'], how='inner', lsuffix='.d1', rsuffix='.d2')
# a a.d1 b.d1 a.d2 b.d2
# 0 1 1 11 2 22
d2.join(d1, on=['a'], how='inner', lsuffix='.d1', rsuffix='.d2')
# a a.d1 c a.d2 b
# 0 1 1 aa 2 22
you just need a simple concat or merge here.
In [4]: pd.concat([d1,d2],axis=1)
Out[4]:
a b a c
0 1 11 1 aa
1 2 22 2 bb
In [5]: pd.merge(d1,d2, on=['a'], how='outer')
...:
Out[5]:
a b c
0 1 11 aa
1 2 22 bb
Here is what on does.
on : column name, tuple/list of column names, or array-like
Column(s) in the caller to join on the index in other,
otherwise joins index-on-index. If multiples
columns given, the passed DataFrame must have a MultiIndex. Can
pass an array as the join key if not already contained in the
calling DataFrame. Like an Excel VLOOKUP operation
So this has to do with the overlapping detection. I think this should just raise as its fundamentally a user error here. This is not a join/merge rather a concat; here the difference matters.
Code Sample, a copy-pastable example if possible
Actual output
Expected Output
output of
pd.show_versions()
INSTALLED VERSIONS
commit: None python: 3.4.3.final.0 python-bits: 64 OS: Linux OS-release: 3.19.0-65-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8
pandas: 0.18.1 nose: None pip: 1.5.4 setuptools: 3.3 Cython: None numpy: 1.10.1 scipy: 0.17.1 statsmodels: 0.6.1 xarray: None IPython: 4.0.0 sphinx: None patsy: 0.4.1 dateutil: 2.5.2 pytz: 2016.3 blosc: None bottleneck: None tables: 3.2.2 numexpr: 2.4.6 matplotlib: None openpyxl: None xlrd: None xlwt: None xlsxwriter: None lxml: None bs4: None html5lib: 0.999 httplib2: 0.9.2 apiclient: None sqlalchemy: None pymysql: None psycopg2: 2.6.1 (dt dec pq3 ext lo64) jinja2: 2.8 boto: None pandas_datareader: None