Open multimeric opened 2 years ago
Isn't it just as easy to use df1.merge(df2, on="a").set_index("a")?
Otherwise we risk introducing features that need to be maintained and tested with further developments when these methods already exist.
Edit: Now I see the end of your post. OK, but I'm -1 on this.
You also have to reset the index to ensure it's a column, and I think the three points above show enough merit to make this worthwhile. A chain of 3 methods versus one method and one parameter is a big improvement.
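To make the comparison concrete, here is a rough sketch with made-up frames df1 and df2 (assumed for illustration only). The two-method chain ends up with the merge key as the index, while the three-method chain keeps df1's original labels:

```python
import pandas as pd

# Illustrative data: df1 carries a meaningful index we want to keep.
df1 = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]}, index=["x", "y", "z"])
df2 = pd.DataFrame({"a": [3, 1, 2], "c": [0.3, 0.1, 0.2]})

# Two-method chain: the merge key "a" becomes the index, not df1's labels.
by_key = df1.merge(df2, on="a").set_index("a")

# Three-method chain: move df1's index into a column, merge, then restore it.
# reset_index() names the new column "index" because df1's index is unnamed.
by_left_index = df1.reset_index().merge(df2, on="a").set_index("index")

print(by_key.index.tolist())
print(by_left_index.index.tolist())
```

Note that the restored index is now named "index", which is one more detail the caller has to clean up by hand.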
take
@multimeric it's fair to give a full response on this since you raise sensible points.
The pandas API is large (too large). My general approach is to not add any args / methods that perform functions that can already be performed. In fact I am in favour of selectively removing / reducing args when multiple ways of performing tasks exist. And my PRs reflect this philosophy.
Probably less efficient
In the long run this has the advantage of making code more maintainable for developers, and likely improves performance, since those core methods can be optimised for general tasks as opposed to optimising selective and individual cases, or specific ways to handle args. This is important for the longevity and future development of pandas.
More verbose
This is subjective. Personally I strive for an atomised code construction. In software development I prefer using core methods rather than subtle args to avoid the operational risk of arg deprecation. merge and set_index are core methods so are unlikely to be restructured, so I would favour chaining these, especially where merge is such a complex method in terms of combinatorial challenges.
Not intuitive or clear to users
Fully agree. I think documenting use cases like this in the documentation and cookbooks is valuable, and we should work to provide better examples that users can copy, knowing that the pandas team stands behind them as the "most efficient" way. This is a development item and something we need to do better.
Sorry I don't support your idea, hope you appreciate my feedback.
Is your feature request related to a problem?
I want to be able to merge two DataFrames, but keep the index of the left one in the final result:
The current merge behaviour is to just drop the index entirely:
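A minimal sketch of this behaviour, using made-up frames left and right (the values are illustrative assumptions, not taken from the original example):

```python
import pandas as pd

# Made-up frames for illustration; the left frame has a meaningful index.
left = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]}, index=["x", "y", "z"])
right = pd.DataFrame({"a": [1, 2, 3], "c": [0.1, 0.2, 0.3]})

merged = left.merge(right, on="a")

# The labels "x", "y", "z" are gone; the result has a fresh default integer index.
print(merged.index.tolist())
```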
Describe the solution you'd like
We add a new parameter preserve_index to merge, which takes either "left", "right", or None:
DataFrame.merge(preserve_index="left")
In my above example, this would work like:
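Since preserve_index does not exist in pandas, the intended semantics can only be approximated. The sketch below uses a hypothetical helper, merge_preserving_index (not a pandas API), built on the existing reset_index / merge / set_index methods. It assumes a single-level index and that the temporary column name does not clash with any real column:

```python
import pandas as pd


def merge_preserving_index(left, right, preserve_index=None, **merge_kwargs):
    """Hypothetical helper emulating the proposed preserve_index parameter.

    This is not part of pandas; it only illustrates the requested behaviour.
    """
    if preserve_index is None:
        # Same as today's behaviour: the result gets a default integer index.
        return left.merge(right, **merge_kwargs)
    if preserve_index not in ("left", "right"):
        raise ValueError('preserve_index must be "left", "right", or None')

    tmp = "__preserved_index__"  # temporary column name, assumed not to clash
    if preserve_index == "left":
        original_name = left.index.name
        left = left.rename_axis(tmp).reset_index()
    else:
        original_name = right.index.name
        right = right.rename_axis(tmp).reset_index()

    merged = left.merge(right, **merge_kwargs).set_index(tmp)
    merged.index.name = original_name  # restore the original index name
    return merged


# Illustrative data, assumed for this sketch.
left = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]}, index=["x", "y", "z"])
right = pd.DataFrame({"a": [1, 2, 3], "c": [0.1, 0.2, 0.3]})

result = merge_preserving_index(left, right, preserve_index="left", on="a")
print(result.index.tolist())
```

With preserve_index="left" the result keeps the labels of the left frame. For joins that duplicate or drop rows, the preserved labels are duplicated or dropped along with them.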
API breaking implications
None. This is a new parameter, and if it is not provided the API is identical.
Describe alternatives you've considered
It is already possible to work around this by resetting the index and then setting it as an index again, as described here, but this is probably less efficient, more verbose, and not intuitive or clear to users.