Issue with 'arrange' when df has an index

kieferk / dfply

dplyr-style piping operations for pandas dataframes

GNU General Public License v3.0

889 stars, 103 forks

Hi, please take a look at the following example:

    from dfply import *

    utime = pd.DataFrame({
        "u": 1,
        "eventTime": ["01-01-1971 01:04:00", "01-01-1971 02:07:00",
                      "01-01-1971 01:09:00", "01-01-1971 01:10:00"],
    })
    # Case 1: default integer index
    print(utime >> arrange(X.eventTime))

    # Case 2: "u" is the index, so every row has the same label (1)
    utime = utime.set_index("u")
    print(utime >> arrange(X.eventTime))

In the first case, the result is as expected. After setting "u" as the index, the result is incorrect: it contains 16 rows, four times as many as the input.

I'm not sure whether this is a bug or expected behavior, as I'm new to pandas and to data frame indices.

Output for the code:

                 eventTime  u
    0  01-01-1971 01:04:00  1
    2  01-01-1971 01:09:00  1
    3  01-01-1971 01:10:00  1
    1  01-01-1971 02:07:00  1

                 eventTime
    u
    1  01-01-1971 01:04:00
    1  01-01-1971 02:07:00
    1  01-01-1971 01:09:00
    1  01-01-1971 01:10:00
    1  01-01-1971 01:04:00
    1  01-01-1971 02:07:00
    1  01-01-1971 01:09:00
    1  01-01-1971 01:10:00
    1  01-01-1971 01:04:00
    1  01-01-1971 02:07:00
    1  01-01-1971 01:09:00
    1  01-01-1971 01:10:00
    1  01-01-1971 01:04:00
    1  01-01-1971 02:07:00
    1  01-01-1971 01:09:00
    1  01-01-1971 01:10:00
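The 16-row result is consistent with rows being selected by index label rather than by position. This has not been checked against the dfply source, so treat it as one plausible explanation. The sketch below uses plain pandas only and reproduces the same row count and row order. The names `labels`, `by_label`, `order` and `by_position` are illustrative.

```python
import pandas as pd

utime = pd.DataFrame({
    "u": 1,
    "eventTime": [
        "01-01-1971 01:04:00",
        "01-01-1971 02:07:00",
        "01-01-1971 01:09:00",
        "01-01-1971 01:10:00",
    ],
}).set_index("u")

# After set_index("u"), all four rows share the index label 1.
labels = utime["eventTime"].sort_values().index  # four copies of label 1

# Label-based selection returns every row matching each requested label:
# 4 requested labels x 4 matching rows = 16 rows.
by_label = utime.loc[labels]

# Position-based selection does not look at labels, so duplicates are harmless.
order = utime["eventTime"].to_numpy().argsort()
by_position = utime.iloc[order]

print(len(by_label), len(by_position))
```

Selecting by position is one way a sort can stay correct when index labels repeat.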

    from dfply import *
    utime = pd.DataFrame({"u":1,"eventTime":["01-01-1971 01:04:00","01-01-1971 02:07:00","01-01-1971 01:09:00","01-01-1971 01:10:00"]})
    print(utime >> arrange(X.eventTime))

                 eventTime  u
    0  01-01-1971 01:04:00  1
    2  01-01-1971 01:09:00  1
    3  01-01-1971 01:10:00  1
    1  01-01-1971 02:07:00  1

    utime = utime.set_index("u")
    print(utime >> arrange(X.eventTime))

                 eventTime
    u
    1  01-01-1971 01:04:00
    1  01-01-1971 01:09:00
    1  01-01-1971 01:10:00
    1  01-01-1971 02:07:00
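If an installed version still returns the multiplied rows, one possible workaround is to move the index into an ordinary column before sorting and restore it afterwards. The sketch below uses plain pandas. `arrange_keeping_index` is a hypothetical helper, not part of dfply, and it assumes every index level has a name.

```python
import pandas as pd


def arrange_keeping_index(df, column):
    """Sort df by `column` without relying on index labels (hypothetical helper)."""
    index_names = list(df.index.names)  # assumption: all index levels are named
    flat = df.reset_index()             # index levels become ordinary columns
    ordered = flat.sort_values(column, kind="stable")
    return ordered.set_index(index_names)


utime = pd.DataFrame({
    "u": 1,
    "eventTime": [
        "01-01-1971 01:04:00",
        "01-01-1971 02:07:00",
        "01-01-1971 01:09:00",
        "01-01-1971 01:10:00",
    ],
}).set_index("u")

result = arrange_keeping_index(utime, "eventTime")
print(result)
```

Calling `utime.sort_values("eventTime")` directly in pandas should give the same four rows. The helper only shows the reset-and-restore pattern, which could equally wrap a piped step.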


Issue with 'arrange' when df has an index #47