andywong36 opened 2 years ago
@jwdink Since we're meeting later, I thought I'd throw this out there.
My initial approach would be to simply write a wrapper around np.select, as follows:
```python
import numpy as np

def case_when(condlist, choicelist, default=np.nan):
    # length/other checks
    return np.select(condlist, choicelist, default)
```
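The wrapper relies on np.select's ordering rule. Conditions are checked in order and the first True one picks the value, while `default` fills any position where no condition holds (roughly what dplyr's case_when does with NA). A minimal check of that behaviour with numeric choices, where the array `x` is purely illustrative:

```python
import numpy as np

x = np.arange(6)  # [0, 1, 2, 3, 4, 5]

# x = 0 and 1 satisfy both conditions, but the first condition wins (10.0).
# x = 4 and 5 satisfy neither, so they get the default (NaN).
result = np.select([x < 2, x < 4], [10.0, 20.0], default=np.nan)
print(result)
```

Because the choices are floats, a NaN default fits naturally into the result dtype.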
But it would be nice to make it more dplyr-esque, with something like the following. If the syntax is kept simple (an arbitrary symbol separating each condition from its value, and commas separating the cases), this could perhaps be done:
```
x == 2 -> 'A',
y == 1 -> 'B'
```
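One way the arrow syntax could be supported without new Python grammar is to pass the cases as a single string and parse it. This is only a hypothetical sketch: the name `case_when_str`, evaluating each condition with pandas `DataFrame.eval`, and parsing each value with `ast.literal_eval` are all assumptions, not anything agreed on in this issue. Splitting naively on commas also means a condition cannot itself contain a comma.

```python
import ast

import numpy as np
import pandas as pd


def case_when_str(df, spec, default=None):
    """Hypothetical: evaluate comma-separated "cond -> value" cases.

    Each condition is evaluated against df's columns with DataFrame.eval;
    each value must be a Python literal (parsed with ast.literal_eval).
    """
    condlist, choicelist = [], []
    for case in spec.split(","):  # naive split: conditions can't contain commas
        case = case.strip()
        if not case:
            continue  # tolerate a trailing comma
        cond, arrow, value = case.partition("->")
        if not arrow:
            raise ValueError(f"missing '->' in case: {case!r}")
        condlist.append(df.eval(cond.strip()).to_numpy(dtype=bool))
        choicelist.append(ast.literal_eval(value.strip()))
    # First matching case wins; rows matching no case get `default`.
    return np.select(condlist, choicelist, default)


df = pd.DataFrame({"x": [2, 0, 0], "y": [0, 1, 0]})
print(case_when_str(df, "x == 2 -> 'A', y == 1 -> 'B'"))
```

Here the third row matches neither case, so it receives the `None` default; with string labels, `None` keeps the result an object array.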
I'll admit I forgot about the existence of np.select when this issue was created.
I was thinking the argument structure would be more like dplyr's, though:
```python
import pandas as pd

example = pd.Series(range(20))
case_when(
    (example == 0, '0'),
    (example == 1, '1'),
    (example <= 7, '<=7'),
    (True, '>7')
)
```
An implementation using np.select would look like:
```python
import numpy as np

def case_when(*args, default=np.nan) -> np.ndarray:
    """
    :param args: Tuples, the first element being the condition, the second being the value if that condition is
        satisfied.
    :param default: The default value when no conditions are met.
    :return: An ndarray
    """
    # Split the (condition, value) pairs into two parallel tuples for np.select.
    condlist, choicelist = zip(*args)
    return np.select(condlist, choicelist, default)
```
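As a sketch of how the tuple-based version behaves end to end, the block below runs it on the `example` series, labelling the catch-all case '>7'. One assumption worth flagging: it passes `default=None` rather than the float `np.nan`. np.select has to find a single common dtype for all the choices plus the default, and mixing string labels with a float default may not give the intended result depending on the NumPy version; with `None` the output is simply an object array of labels.

```python
import numpy as np
import pandas as pd


def case_when(*args, default=np.nan) -> np.ndarray:
    # Same idea as the implementation above: split (condition, value) pairs for np.select.
    condlist, choicelist = zip(*args)
    return np.select(condlist, choicelist, default)


example = pd.Series(range(20))
labels = case_when(
    (example == 0, '0'),
    (example == 1, '1'),
    (example <= 7, '<=7'),
    (True, '>7'),  # catch-all, like dplyr's `TRUE ~ ...`
    default=None,  # object dtype avoids mixing str choices with a float default
)
print(labels[:10])
```

Since 0 and 1 also satisfy `example <= 7`, they keep the labels '0' and '1' only because np.select takes the first matching condition, the same ordering dplyr's case_when uses.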
https://www.rdocumentation.org/packages/dplyr/versions/1.0.10/topics/case_when
As an alternative to