kieferk / dfply

dplyr-style piping operations for pandas dataframes
GNU General Public License v3.0
889 stars, 103 forks

Compatibility break on joins #66

Closed: CedricFR closed this issue 5 years ago

CedricFR commented 5 years ago

Hi,

The latest changes to the join functions break some existing user code.

The offending changes are: https://github.com/kieferk/dfply/compare/bbe03e8aa84005af01491a891cb36fb2c5979659...17b3440dcd53cef98e35e63a0dfce074e27b2748

The problem is this case:

df >> left_join(df2, by=["column1", "column2", "column3"])

It used to mean "merge on these three columns", but now it means "merge on column1 of the left dataframe and column2 of the right dataframe", which is quite different!
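To make the difference concrete, here is a minimal sketch of the two readings using plain pandas `merge`, which dfply builds on. The sample frames and their values are invented for illustration, and this is not dfply's internal join code.

```python
import pandas as pd

# Invented sample data: both frames share column1, column2 and column3.
df = pd.DataFrame({
    "column1": [1, 1, 2],
    "column2": [1, 2, 1],
    "column3": ["a", "b", "c"],
    "x": [10, 20, 30],
})
df2 = pd.DataFrame({
    "column1": [1, 2, 2],
    "column2": [1, 1, 2],
    "column3": ["a", "c", "z"],
    "y": [100, 200, 300],
})

# Old reading: match rows on all three shared columns.
old = pd.merge(df, df2, how="left", on=["column1", "column2", "column3"])

# New reading: column1 of the left frame is matched against column2 of the right.
new = pd.merge(df, df2, how="left", left_on="column1", right_on="column2")

print(len(old), len(new))  # prints: 3 5
```

With the old reading every left row matches at most one right row, so the result keeps the three rows of `df`. With the new reading the key values repeat on the right side, so the same call silently fans out to five rows with different matches.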

What's the rationale? Is it because we're now supposed to use tuples and not lists in those cases?
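One way to keep lists meaning "shared columns" while still supporting differently named keys is to reserve tuples for (left, right) pairs. The sketch below is a hypothetical helper, `parse_by`, illustrating that convention. It is an assumption for discussion, not dfply's actual parsing logic.

```python
from typing import List, Sequence, Tuple, Union

# Hypothetical type for the `by` argument; not dfply's real signature.
By = Union[str, Tuple[str, str], Sequence[Union[str, Tuple[str, str]]]]


def parse_by(by: By) -> Tuple[List[str], List[str]]:
    """Split a hypothetical `by` spec into (left_on, right_on) column lists.

    Convention assumed here:
      * "col"                  -> one column shared by both frames
      * ("left", "right")      -> one pair of differently named key columns
      * ["a", "b", ...]        -> several columns shared by both frames
      * [("a", "b"), "c", ...] -> a mix of key pairs and shared columns
    """
    if isinstance(by, str):
        return [by], [by]
    if isinstance(by, tuple):
        # A bare tuple is always a (left, right) pair, never a list of columns.
        if len(by) != 2:
            raise ValueError("a tuple in `by` must be a (left, right) pair")
        return [by[0]], [by[1]]
    left_on: List[str] = []
    right_on: List[str] = []
    for item in by:
        if isinstance(item, str):
            # Plain string inside a list: the column has the same name on both sides.
            left_on.append(item)
            right_on.append(item)
        elif isinstance(item, tuple) and len(item) == 2:
            left_on.append(item[0])
            right_on.append(item[1])
        else:
            raise ValueError(f"unsupported `by` entry: {item!r}")
    return left_on, right_on
```

Under this convention `by=["column1", "column2", "column3"]` keeps its old meaning, and only an explicit tuple such as `by=("column1", "column2")` switches to left/right matching. The trade-off is that a bare tuple can no longer be used to list shared columns.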

Either way, if we decide this is the way forward, it should be a major version bump, because it breaks one of dfply's main use cases. What do you think?

kieferk commented 5 years ago

That's my fault. I changed it to fix another issue and didn't think through the implications carefully enough before pushing.

I'm at work now, but I'll revert this tonight and find a different way to fix the other issue.

Sorry about that!

kieferk commented 5 years ago

OK, hopefully the latest push to master resolves the issue. Obviously, let me know if it doesn't. Sorry for the inconvenience.