jluttine opened this issue 6 years ago.
Thanks for the report. Sounds reasonable, and I can replicate this on the latest release. Investigation and PRs welcome!
For an Index, I think it's relatively straightforward to solve this. However, for a MultiIndex I don't know of a direct way to create an empty MultiIndex with a specified dtype for each level. The only way I can figure out is to create a non-empty DataFrame with the specified types and then subset it so that it becomes empty, e.g.:
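import pandas as pd

df = pd.DataFrame(
    {
        'a': pd.DatetimeIndex(["2018-01-01"]),
        'b': pd.DatetimeIndex(["2018-01-01"]),
        'c': 1
    }
).set_index(['a', 'b'])
# Subset with an always-false mask: the MultiIndex becomes empty,
# but its levels keep their datetime dtype
df = df[df.c == 0]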
Is there a better way, even internally?
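A possible alternative, not confirmed anywhere in this thread, is to build the empty MultiIndex directly from typed empty levels and empty codes with the `pd.MultiIndex` constructor (`levels`, `codes`, `names`). The sketch below assumes datetime levels named `a` and `b` and an int64 column `c`, mirroring the example above; `mi` and `empty_df` are hypothetical names.

```python
import numpy as np
import pandas as pd

# The empty levels carry the dtypes; empty codes mean the index has no entries.
mi = pd.MultiIndex(
    levels=[pd.DatetimeIndex([]), pd.DatetimeIndex([])],
    codes=[[], []],
    names=["a", "b"],
)

# Attach the empty MultiIndex to a frame with a typed, zero-length column.
empty_df = pd.DataFrame({"c": np.array([], dtype="int64")}, index=mi)

print(len(empty_df))
print(empty_df.index.levels[0].dtype, empty_df.index.levels[1].dtype)
```

This avoids building a throwaway row, at the cost of spelling out the codes explicitly.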
I'm seeing something similar with an empty DataFrame in the latest pandas (v1.1.5):
>>> import pandas as pd
>>> pd.__version__
'1.1.5'
>>> df = pd.DataFrame([], columns=["A", "B"])
>>> df
Empty DataFrame
Columns: [A, B]
Index: []
>>> df.groupby("A", group_keys=False).apply(lambda g: g)
Empty DataFrame
Columns: []
Index: []
I would expect groupby.apply to preserve the columns of the empty DataFrame. I haven't checked to see whether #34998 fixes this.
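Until this is resolved, one possible workaround (a sketch, not something proposed in the thread) is to skip groupby.apply entirely when the frame is empty. `apply_keep_empty` is a hypothetical helper, and it assumes the applied function returns frames with the same columns as its input, as `lambda g: g` does.

```python
import pandas as pd


def apply_keep_empty(df, key, func):
    """Hypothetical helper: df.groupby(key, group_keys=False).apply(func),
    except that an empty ``df`` is returned as an unchanged copy.

    Assumes ``func`` preserves the input's columns; for other functions the
    empty result may not match what a non-empty call would produce.
    """
    if df.empty:
        # Nothing to group: keep the original columns and index as-is.
        return df.copy()
    return df.groupby(key, group_keys=False).apply(func)


df = pd.DataFrame([], columns=["A", "B"])
result = apply_keep_empty(df, "A", lambda g: g)
print(list(result.columns))
```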
Code Sample, a copy-pastable example if possible
Correct behaviour for a non-empty series (the index is kept unchanged):
Incorrect behaviour for an empty series (the index is changed):
Problem description
The index should remain unchanged.
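Why does this matter at all? Downstream code that depends on the index type, for example date-string slicing such as .loc["2018-01-01":], expects the original datetime index to survive even when the series is empty.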
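To illustrate the point, the sketch below assumes the expected result is an empty Series that still carries a DatetimeIndex (the float64 dtype and the name `s` are arbitrary assumptions). With that index in place, date-string slicing resolves against the datetime index and simply returns another empty series.

```python
import pandas as pd

# An empty Series whose index is still a DatetimeIndex, i.e. the index type
# the issue expects the operation to preserve.
s = pd.Series([], dtype="float64", index=pd.DatetimeIndex([]))

# The string bound is parsed as a date and looked up in the (empty) datetime
# index, so the slice is simply empty.
sliced = s.loc["2018-01-01":]
print(type(sliced.index).__name__, len(sliced))
```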
Expected Output
Expected behaviour for empty series:
Output of pd.show_versions()