kieferk / dfply

dplyr-style piping operations for pandas dataframes
GNU General Public License v3.0
889 stars, 103 forks

Upcoming v1.0.0 #14

Closed: kieferk closed this issue 5 years ago

kieferk commented 7 years ago

Hello users,

I have been working on and off on an upcoming version which will be v1.0.0 due to its incompatibility with previous versions. You can actually view this nearly-complete version in the "feature/collapsed-selection" branch.

Originally I was just working on getting the selection helper functions working, but to do that a lot had to change in the base decorators. The selection helper functions now work (for example, contains("ca") as an argument to the select function, for finding columns whose names contain that string). Previously, a variety of different decorators were stacked together to get different kinds of behavior. In the new design, the only decorator will be @dfpipe, and it will take keyword arguments that change its behavior (it can also be used without keyword arguments, in which case it behaves as the current @dfpipe decorator does now).
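For readers unfamiliar with the pattern, here is a minimal sketch of how one decorator that accepts optional keyword arguments can replace a stack of decorators, with a contains-style selection helper passed to a select verb. This is not the code in the feature/collapsed-selection branch: `dfpipe_sketch`, `Pipeable`, the `copy` keyword, and these `select`/`contains`/`head` definitions are hypothetical stand-ins, and pandas is assumed.

```python
import functools

import pandas as pd


class Pipeable:
    """Holds a pending verb so it can be applied with `df >> verb(...)`."""

    def __init__(self, func):
        self.func = func

    def __rrshift__(self, df):
        # DataFrame has no __rshift__, so `df >> pending` falls back to this method.
        return self.func(df)


def dfpipe_sketch(func=None, *, copy=True):
    """Hypothetical single decorator, usable bare or with keyword arguments."""

    def decorator(verb):
        @functools.wraps(verb)
        def wrapper(*args, **kwargs):
            def apply(df):
                # `copy` is one example of behavior switched by a keyword argument.
                target = df.copy() if copy else df
                return verb(target, *args, **kwargs)

            return Pipeable(apply)

        return wrapper

    # Bare use: @dfpipe_sketch    Keyword use: @dfpipe_sketch(copy=False)
    return decorator(func) if func is not None else decorator


def contains(substring):
    """Hypothetical selection helper: picks column names containing `substring`."""
    return lambda columns: [c for c in columns if substring in c]


@dfpipe_sketch
def select(df, *selectors):
    # Each selector is either a helper (callable over column names) or a name.
    cols = []
    for sel in selectors:
        picked = sel(df.columns) if callable(sel) else [sel]
        cols.extend(c for c in picked if c not in cols)
    return df[cols]


@dfpipe_sketch(copy=False)
def head(df, n=5):
    return df.head(n)


df = pd.DataFrame({"carat": [0.23, 0.21], "cut": ["Ideal", "Premium"], "price": [326, 326]})
print(list((df >> select(contains("ca"))).columns))  # ['carat']
print(list((df >> select(contains("ca"), "price")).columns))  # ['carat', 'price']
```

The point of the single decorator is that variations in behavior become keyword arguments on one entry point rather than separate decorators that have to be stacked in the right order.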

If you're interested in checking it out and have any questions/comments/concerns, please go ahead. I don't have a timetable for its release but considering it's nearing completion and currently passes all the written unit tests, I don't expect it will be much longer.

jankatins commented 7 years ago

Just some food for thought from adding a small verb in tidyr, and what I liked about the dplyr/tidyr implementation:

kieferk commented 7 years ago

I like both of those. With the new TypeAction class I was trying to do something similar to the @singledispatch approach you're describing, but this sounds better in many ways. I will check out your prototype. I'll also see whether it would be easy to add the non-standard evaluation functions and redesign the internals to mimic this kind of functionality. Thanks for the tips!

kieferk commented 7 years ago

@janschulz I read over these and I definitely like the singledispatch/registration pattern for these verbs. I think this is a more elegant solution than the one I was building towards.
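As a rough illustration of the singledispatch/registration pattern, here is a minimal sketch of a verb whose implementation is chosen by the type of the data passed in, so grouped and ungrouped DataFrames get different behavior. It uses only the standard `functools.singledispatch` and pandas; the verb `mean_of` is a hypothetical example, not part of dfply or of the prototype mentioned above.

```python
from functools import singledispatch

import pandas as pd
from pandas.core.groupby import DataFrameGroupBy


@singledispatch
def mean_of(data, column):
    """Hypothetical verb; this base version runs only for unregistered types."""
    raise TypeError(f"mean_of does not support {type(data).__name__}")


@mean_of.register
def _(data: pd.DataFrame, column):
    # Ungrouped data: a single summary row.
    return pd.DataFrame({column: [data[column].mean()]})


@mean_of.register
def _(data: DataFrameGroupBy, column):
    # Grouped data: one row per group, with the group keys restored as a column.
    return data[column].mean().reset_index()


df = pd.DataFrame({"g": ["a", "a", "b"], "x": [1.0, 3.0, 5.0]})
print(mean_of(df, "x"))               # one row, x = 3.0
print(mean_of(df.groupby("g"), "x"))  # rows for a (2.0) and b (5.0)
```

Supporting another data type later only requires another `@mean_of.register` function, which is what makes this pattern easy to extend without touching the existing implementations.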

Maybe I'm being overly optimistic, but I think I could change the base code that deals with piping to fit into this framework without a huge amount of hassle. I just need to make sure it all plays nicely with the symbolic X. Right now I have the selection helpers working in the feature/collapsed-selection branch, which was tricky to get working correctly. I'm not sure how it will work with this setup until I test things out.
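In dfply, `X` stands in for the DataFrame being piped (as in `X.price`). One way to picture why it has to "play nice" with any new dispatch machinery is to treat it as a deferred expression: operations on `X` are recorded and only evaluated once the actual DataFrame is known. The sketch below is a simplified illustration under that assumption, not dfply's implementation; the `Symbolic` class and its `evaluate` method are hypothetical names.

```python
import operator

import pandas as pd


class Symbolic:
    """Hypothetical deferred expression: records operations, evaluates later."""

    def __init__(self, fn=lambda df: df):
        self._fn = fn

    def evaluate(self, df):
        return self._fn(df)

    def __getattr__(self, name):
        # X.price becomes "look up .price on whatever DataFrame arrives later".
        return Symbolic(lambda df: getattr(self._fn(df), name))

    def __getitem__(self, key):
        return Symbolic(lambda df: self._fn(df)[key])

    def _binary(self, other, op):
        def fn(df):
            # Evaluate the right-hand side too if it is also symbolic.
            right = other.evaluate(df) if isinstance(other, Symbolic) else other
            return op(self._fn(df), right)

        return Symbolic(fn)

    def __add__(self, other):
        return self._binary(other, operator.add)

    def __mul__(self, other):
        return self._binary(other, operator.mul)

    def __gt__(self, other):
        return self._binary(other, operator.gt)


X = Symbolic()

df = pd.DataFrame({"carat": [0.2, 0.3, 1.1], "price": [300, 400, 5000]})
print((X.price * 2).evaluate(df).tolist())  # [600, 800, 10000]
mask = (X.carat > 0.25).evaluate(df)
print(df[mask]["price"].tolist())  # [400, 5000]
```

A verb that receives such an expression has to evaluate it against the incoming DataFrame before doing its work, so whatever dispatch mechanism selects the verb implementation must also pass the data through to these expressions.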

Regardless, I think your suggested pattern is the way to go because it is readable and easy to extend. With the winter holidays coming up in a week or two I am going to use that free time to try and change the implementation to match what you've laid out in your blog post.

Make42 commented 6 years ago

Since you are changing quite a bit for 1.0.0 anyway, could you consider using filter (as in dplyr) instead of mask? I am not sure why you deviated in this particular case while keeping to the dplyr verbs otherwise.

Also: Could you include the option to filter based on the index?
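To make the two requests concrete, here is a minimal sketch of a dplyr-style row filter that combines boolean conditions with AND and optionally applies a predicate to the index labels. It is a plain pandas function, not a dfply verb; the name `filter_rows` and its `index` keyword are hypothetical (the name avoids shadowing Python's built-in `filter`).

```python
import pandas as pd


def filter_rows(df, *conditions, index=None):
    """Hypothetical dplyr-style filter: keep rows where every condition holds.

    `conditions` are boolean Series aligned with `df`; `index` is an optional
    predicate called on each index label.
    """
    keep = pd.Series(True, index=df.index)
    for cond in conditions:
        keep = keep & cond
    if index is not None:
        keep = keep & pd.Series([bool(index(label)) for label in df.index], index=df.index)
    return df[keep]


df = pd.DataFrame(
    {"cut": ["Ideal", "Good", "Ideal"], "price": [326, 327, 334]},
    index=["a", "b", "c"],
)
print(filter_rows(df, df["cut"] == "Ideal").index.tolist())  # ['a', 'c']
print(filter_rows(df, df["price"] > 326, index=lambda label: label != "c").index.tolist())  # ['b']
```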

Finally: do you have an estimate for when the 1.0.0 version will come out?

Make42 commented 5 years ago

Is 1.0.0 abandoned? If not, why did you close the issue?

sharpe5 commented 5 years ago

I'm also keen to know!