Closed dlovell closed 2 weeks ago
What's the use case that's enabled here that requires this change? What can you not do with the current codebase?
register a dataframe with non-range index and have cases work
Alternatively, should a dataframe's index be sanitized on registration? Is there a specification of what must hold for dataframes that are registered?
It would be good to have a less abstract example documented in this PR. Doesn't really even have to be code, just some description that helps justify why we should take on any additional code to the pandas backend.
Does this match what you're looking for?
When I run this code
import ibis
import pandas as pd


def do_replace(col):
    # Map 1 -> "one" and 2 -> "two"; everything else becomes "unk".
    return (
        col
        .cases(
            (
                (1, "one"),
                (2, "two"),
            ),
            default="unk",
        )
    )


# Column "A" has a non-range index: labels 0, 1, 2, 4 (no label 3).
df = pd.DataFrame({
    "A": pd.Series({i: i % 3 for i in (0, 1, 2, 4)}),
    "B": 0,
})
expr = ibis.pandas.connect({"t": df}).table("t")

print("Input")
print(len(expr.execute()))
print(expr.execute())
print()

print("Current results")
x = expr.mutate(**{"A": lambda t: t["A"].pipe(do_replace)}).execute()
print(len(x))
print(x)
print()
I get these results
Input
4
   A  B
0  0  0
1  1  0
2  2  0
4  1  0

Current results
5
     A    B
0  unk  0.0
1  one  0.0
2  two  0.0
3  one  NaN
4  NaN  0.0
Heh, yeah, that does :)
I'll just keep suggesting that the pandas backend should probably be avoided.
I somewhat reluctantly will accept this PR, fully realizing that creating the pandas backend was probably a doomed idea from the start 😂
Description of changes
This PR makes PandasExecutor create the Series with an index that matches the incoming data. Currently, when the incoming data does not use a RangeIndex, the output index is a union of a RangeIndex and the incoming data index.
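For illustration, the index-union effect described above can be reproduced with pandas alone. This is a minimal sketch, not the actual PandasExecutor code: the `positional` Series is a hypothetical stand-in for a computed column that was built with a default RangeIndex instead of the incoming index, and the names `misaligned` and `aligned` are made up for this example.

```python
import pandas as pd

# Same input as in the report: labels 0, 1, 2, 4 (no label 3).
data = pd.Series({i: i % 3 for i in (0, 1, 2, 4)})
b = pd.Series(0, index=data.index)

# Hypothetical stand-in for a computed result that lost the incoming
# index and fell back to a default RangeIndex (labels 0..3).
positional = pd.Series(["unk", "one", "two", "one"])

# pandas aligns columns on labels, so the frame's index becomes the
# union of the RangeIndex and the incoming index: 0, 1, 2, 3, 4.
misaligned = pd.DataFrame({"A": positional, "B": b})

# Building the result on the incoming index keeps one output row per
# input row.
aligned = pd.DataFrame({
    "A": pd.Series(positional.to_numpy(), index=data.index),
    "B": b,
})

print(len(misaligned), list(misaligned.index))
print(len(aligned), list(aligned.index))
```

Under these assumptions, `misaligned` has five rows with a missing value in each column, which matches the shape of the "Current results" output above, while `aligned` keeps the four input rows.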