has2k1 / plydata

A grammar for data manipulation in Python
https://plydata.readthedocs.io/en/stable/
BSD 3-Clause "New" or "Revised" License

separate gives unintended output when index is not standard range #31

Open jakub-dyno opened 2 years ago

jakub-dyno commented 2 years ago

version 0.4.3

This works as intended

import pandas as pd
from plydata.tidy import separate
separate(pd.DataFrame({'a':[1,2,3], 'name':['a_b_c', 'd_e_f', 'g_h_i']}),
         'name', into=['x','y','z'])

   a  x  y  z
0  1  a  b  c
1  2  d  e  f
2  3  g  h  i

With an index that does not start at zero, the output is mangled; even column a is affected.

separate(pd.DataFrame({'a':[1,2,3], 'name':['a_b_c', 'd_e_f', 'g_h_i']}, index=[1,2,3]),
         'name', into=['x','y','z'])

     a    x    y    z
0  NaN    a    b    c
1  1.0    d    e    f
2  2.0    g    h    i
3  3.0  NaN  NaN  NaN
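The shape of this output (one extra row, NaN at opposite ends) looks like index misalignment. The sketch below uses plain pandas only and is an assumption about the mechanism, not a reading of plydata's source: if the split pieces are built with a fresh 0..n-1 index and then combined with the original columns, pandas aligns the two by label rather than by position.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'name': ['a_b_c', 'd_e_f', 'g_h_i']},
                  index=[1, 2, 3])

# Hypothetical reconstruction: the split pieces get a default 0..2 index.
pieces = pd.DataFrame([s.split('_') for s in df['name']],
                      columns=['x', 'y', 'z'])

# Combining column-wise aligns on index labels (1,2,3 vs 0,1,2),
# so the result has four rows and NaN wherever a label is missing.
misaligned = pd.concat([df[['a']], pieces], axis=1, sort=True)
print(misaligned)

# Building the pieces on the original index keeps every row together.
pieces_ok = pd.DataFrame([s.split('_') for s in df['name']],
                         columns=['x', 'y', 'z'], index=df.index)
aligned = pd.concat([df[['a']], pieces_ok], axis=1)
print(aligned)
```

If this is indeed the cause, carrying the input's index through the split step would be enough to fix it.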

This happens quite often: for example, if you run query first, the result has an index that doesn't start from zero.
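A minimal illustration of how such an index arises, and a possible workaround. It uses pandas' own DataFrame.query, not plydata's query verb, and the data is made up for the example; the point is only that filtering keeps the original row labels.

```python
import pandas as pd

df = pd.DataFrame({'a': [0, 1, 2, 3],
                   'name': ['u_v_w', 'a_b_c', 'd_e_f', 'g_h_i']})

# Filtering drops the first row but keeps the remaining labels 1, 2, 3.
filtered = df.query('a > 0')
print(list(filtered.index))

# Discarding the old labels restores a default 0..n-1 index.
clean = filtered.reset_index(drop=True)
print(list(clean.index))
```

Since the first example above works with a default index, passing a frame like `clean` to separate should sidestep the problem until it is fixed.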

has2k1 commented 2 years ago

In the master branch, query returns a regular index. Still, separate should behave correctly whatever the index is.