We have merge() and merge_asof(). There may even come a time when we perform functions on overlapping columns. As someone who wants to join two tables together, I just want a single mechanism to do so.
I wonder if it's possible to have a single API like:
merge(
left, # DataFrame or Table
right, # DataFrame or Table
on, # one or more columns
asof, # one or more columns
how, # 'left', 'right', 'inner', 'outer'
overlap, # optional function to apply to overlapping column names
)
Users must specify at least one of on or asof. There can also be left_on/right_on and left_asof/right_asof. We could even have left_index/right_index for the poor souls who still have indexed data (https://github.com/pydata/pandas-design/issues/17).
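To make the dispatch rule concrete, here is a rough sketch built on the existing pd.merge() and pd.merge_asof(). The name unified_merge is hypothetical, as is the how="left" default. It assumes a single asof column, since merge_asof() only accepts one today, and it maps on to merge_asof()'s by argument. It is not a proposed implementation, just an illustration of the "at least one of on or asof" rule.

```python
import pandas as pd


def unified_merge(left, right, on=None, asof=None, how="left"):
    """Hypothetical wrapper: route to pd.merge or pd.merge_asof."""
    if on is None and asof is None:
        raise ValueError("specify at least one of 'on' or 'asof'")
    if asof is None:
        # Exact-match join only.
        return pd.merge(left, right, on=on, how=how)
    if not isinstance(asof, str):
        # Assumption of this sketch: merge_asof takes a single asof key.
        raise NotImplementedError("only a single asof column is supported here")
    if how != "left":
        # merge_asof always behaves like a left join.
        raise NotImplementedError("asof joins are left joins in current pandas")
    # 'on' columns must match exactly; 'asof' matches the nearest earlier row.
    return pd.merge_asof(left, right, on=asof, by=on)


# Both frames must already be sorted by the asof column.
left = pd.DataFrame({"time": [1, 5, 10], "key": ["a", "a", "b"], "lv": [1.0, 2.0, 3.0]})
right = pd.DataFrame({"time": [2, 3, 7], "key": ["a", "b", "a"], "rv": [10.0, 20.0, 30.0]})

asof_result = unified_merge(left, right, on="key", asof="time")
exact_result = unified_merge(left, right, on="key", how="inner")
```

In the asof call, the first left row has no earlier match in right for key "a", so its rv is missing. In the exact call, time appears in both tables and is renamed with the default suffixes, which is the overlap problem described next.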
The overlap is for when the same column name appears in both tables. Currently those columns are renamed with a suffix (though I'd be in favor of just raising an error). But there are times when I want to apply a function instead. There are ways to do this with arithmetic operations (https://github.com/pydata/pandas-design/issues/30), though I think any function with two arguments would be nice, including overwriting the left with the right (for handling cases of missing data with a "fill" result).
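Here is a minimal sketch of what overlap could look like when layered on top of today's pd.merge(). The function name merge_with_overlap and the temporary "__left"/"__right" suffixes are made up for illustration. It raises when columns collide and no overlap function is given, which is my stated preference above rather than current pandas behaviour.

```python
import operator

import pandas as pd


def merge_with_overlap(left, right, on, how="inner", overlap=None):
    """Hypothetical: combine same-named columns with a two-argument function."""
    keys = [on] if isinstance(on, str) else list(on)
    shared = [c for c in left.columns if c in right.columns and c not in keys]
    if shared and overlap is None:
        raise ValueError(f"overlapping columns {shared} and no overlap function")
    # Temporary suffixes keep both versions until they are combined.
    merged = pd.merge(left, right, on=keys, how=how, suffixes=("__left", "__right"))
    for col in shared:
        left_values = merged.pop(col + "__left")
        right_values = merged.pop(col + "__right")
        merged[col] = overlap(left_values, right_values)
    return merged


left = pd.DataFrame({"key": [1, 2, 3], "v": [1.0, 2.0, 3.0]})
right = pd.DataFrame({"key": [1, 2, 3], "v": [10.0, None, 30.0]})

# "Fill": take the right value, fall back to the left where right is missing.
filled = merge_with_overlap(left, right, on="key",
                            overlap=lambda l, r: r.combine_first(l))

# Arithmetic: add the two columns element-wise.
summed = merge_with_overlap(left, right, on="key", overlap=operator.add)
```

Any callable taking the left and right Series works here, so arithmetic and fill behaviour go through the same hook.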
Note that this doesn't handle my proposed merge_window() (https://github.com/pydata/pandas/issues/13959). The semantics there are very specific and I'm not sure how to fit them into a unified structure like the one above, though I'd love to hear any ideas.