kieferk / dfply

dplyr-style piping operations for pandas dataframes
GNU General Public License v3.0
889 stars 103 forks

Add row_number() function and test cases #26

Closed bleearmstrong closed 6 years ago

bleearmstrong commented 7 years ago

Adds row_number(), which returns the row number based on the specified column (equivalent to rank(method='first')). It currently returns a float. Should this be changed to an int? If so, then maybe some of the other rank functions should be changed to int as well. I believe pandas returns a float only because this is part of the rank family, which can return averages and therefore non-integer values.

Usage

diamonds >> head() >> mutate(rn=row_number(X.x))

   carat      cut color clarity  depth  table  price     x     y     z   rn
0   0.23    Ideal     E     SI2   61.5   55.0    326  3.95  3.98  2.43  2.0
1   0.21  Premium     E     SI1   59.8   61.0    326  3.89  3.84  2.31  1.0
2   0.23     Good     E     VS1   56.9   65.0    327  4.05  4.07  2.31  3.0
3   0.29  Premium     I     VS2   62.4   58.0    334  4.20  4.23  2.63  4.0
4   0.31     Good     J     SI2   63.3   58.0    335  4.34  4.35  2.75  5.0
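
For reference, the equivalence with rank(method='first') and the float-versus-int question can be sketched in plain pandas. This is a minimal illustration, not the code in this pull request: it assumes only pandas, hard-codes the five x values from the output above, and the names rn_float and rn_int are made up for the example.

```python
import pandas as pd

# The five `x` values from the head() output above (hard-coded for illustration).
x = pd.Series([3.95, 3.89, 4.05, 4.20, 4.34])

# rank(method='first') assigns 1..n in ascending order, breaking ties by
# order of appearance. pandas returns the result as floats.
rn_float = x.rank(method="first")

# With method='first' the ranks are never fractional, so an int cast loses
# nothing here. Assumption: the column has no NaN values; rank() leaves NaN
# in place by default and astype(int) would then raise.
rn_int = rn_float.astype(int)

print(rn_float.tolist())
print(rn_int.tolist())
```

If an int return type were wanted, a cast like the one above would only be safe for columns without missing values, which may be one reason to keep the float.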
sharpe5 commented 6 years ago

Could this pull request be merged?

kieferk commented 6 years ago

Merging this into branch v0.3.1, which is basically the pull requests by @bleearmstrong that were sitting here. After that I'll confirm the tests pass and merge 0.3.1 into master, so you should have it shortly, @sharpe5.