kieferk / dfply

dplyr-style piping operations for pandas dataframes
GNU General Public License v3.0
889 stars 103 forks

How to select multiple values from same column using mask #54

Open prudhviraju535 opened 6 years ago

prudhviraju535 commented 6 years ago

Guys, how do I filter on multiple values from the same column? The code below throws an error.

import pandas as pd
from dfply import *
data = pd.DataFrame({"Col1": ["a", "b", "c", "d"], "Col2": [1, 2, 3, 4]})
data >> mask(X.Col1 == ["a", "b"])

Error: ValueError: Arrays were different lengths: 4 vs 2

cunningjames commented 6 years ago

You can't compare a column / series to a list that way. This will fail in base Pandas:

In [12]
data.Col1 == ["a","b"]

Truncated Traceback (Use C-c C-x to view full TB):
pandas\_libs\ops.pyx in pandas._libs.ops.vec_compare()

ValueError: Arrays were different lengths: 4 vs 2

You'd need to do something like this:

In [13]
data >> mask(X.Col1.isin(["a","b"]))
Out [13]:
  Col1  Col2
0    a     1
1    b     2
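
For reference, here is a minimal plain-pandas sketch of why the two expressions behave differently, assuming the `data` frame from the question. `==` against a list compares position by position, so the list needs one entry per row, while `isin` tests membership and accepts a list of any length. The names `elementwise`, `membership`, and `filtered` are illustrative only, and the row selection uses ordinary boolean indexing rather than dfply.

```python
import pandas as pd

data = pd.DataFrame({"Col1": ["a", "b", "c", "d"], "Col2": [1, 2, 3, 4]})

# == against a list is element-wise: row 0 vs "a", row 1 vs "b", and so on.
# It only works here because the list has exactly one entry per row.
elementwise = data.Col1 == ["a", "b", "x", "y"]

# isin asks "is this row's value anywhere in the list?", so the list
# may be shorter or longer than the column.
membership = data.Col1.isin(["a", "b"])

# Keep only the rows where the membership mask is True.
filtered = data[membership]
print(filtered)
```

Both masks happen to agree on this data, but only `isin` expresses the intended "value is one of these" test.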

sharpe5 commented 6 years ago

Thanks for posting this answer.

I also ran into the same problem, and solved it in a rather roundabout way by first generating true/false arrays for each term, then using logical or on said arrays.

Your answer is much more readable and probably just as performant (if that matters).
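
The roundabout approach was not posted, so the following is only a guess at what it might have looked like in plain pandas: build one boolean Series per wanted value, then combine them with a logical OR. The names `wanted`, `per_value`, and `combined` are hypothetical.

```python
import operator
from functools import reduce

import pandas as pd

data = pd.DataFrame({"Col1": ["a", "b", "c", "d"], "Col2": [1, 2, 3, 4]})
wanted = ["a", "b"]

# One boolean Series per wanted value (scalar comparison broadcasts over rows).
per_value = [data.Col1 == value for value in wanted]

# OR the Series together: a row is kept if it matched any of the values.
combined = reduce(operator.or_, per_value)

# The combined mask selects the same rows as a single isin call.
same_as_isin = combined.equals(data.Col1.isin(wanted))
print(data[combined])
print(same_as_isin)
```

The OR-based mask and `isin` select the same rows here; `isin` simply says it in one call.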