unionai-oss / pandera

A light-weight, flexible, and expressive statistical data testing library
https://www.union.ai/pandera
MIT License
3.34k stars, 308 forks

MultiIndex dropped by reset_index with default argument #863

Open mheguy opened 2 years ago

mheguy commented 2 years ago

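Describe the bug

Calling reset_index() with its default argument on a DataFrameSchema that has a MultiIndex drops the index levels entirely instead of turning them into columns.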


Code Sample, a copy-pastable example

import pandera as pa
multi_index = pa.DataFrameSchema(
    columns={"test_col": pa.Column(int)},
    index=pa.MultiIndex([pa.Index(int, name="index_1"), pa.Index(int, name="index_2")]),
)
single_index = pa.DataFrameSchema(
    columns={"test_col": pa.Column(int)}, index=pa.Index(int, name="index_1")
)
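# Show both schemas as defined.
print(multi_index)
print("-----")
print(single_index)
print("-----")
print("-----")
# Bug: with the default argument, the MultiIndex levels are dropped instead of becoming columns.
print(multi_index.reset_index())
print("-----")
# Single-index schema, shown for comparison.
print(single_index.reset_index())
# By contrast, this will work as expected:
print("-----")
print(multi_index.reset_index(["index_1", "index_2"]))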

Expected behavior

With the default argument, every level of the MultiIndex should become a column of the schema.
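For comparison, here is a minimal sketch of how pandas itself treats a default reset_index() on a DataFrame with a MultiIndex; this is presumably the behavior the schema method should mirror. The data values are made up for illustration, and only the index and column names come from the code sample above.

```python
import pandas as pd

# Illustrative data only; the names match the schema in the code sample.
df = pd.DataFrame(
    {"test_col": [10, 20]},
    index=pd.MultiIndex.from_tuples(
        [(1, 2), (3, 4)], names=["index_1", "index_2"]
    ),
)

# With no arguments, pandas moves every index level into the columns.
reset = df.reset_index()
print(list(reset.columns))  # ['index_1', 'index_2', 'test_col']
```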

Actual behavior

The MultiIndex is completely dropped without being added to the columns.

Additional context

Found in pandera 0.9.0; still present as of pandera 0.11.0.

cosmicBboy commented 2 years ago

Thanks for finding this bug, @plague006!

I think the issue is here: https://github.com/pandera-dev/pandera/blob/master/pandera/schemas.py#L1585-L1587

If level is None, then level_temp needs to be all of the names in the MultiIndex instead of an empty list.
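The defaulting rule described above can be sketched in isolation. This is not pandera's actual code: split_index_levels is a hypothetical helper, and only the name level_temp is borrowed from the comment. It shows a None level expanding to every index name rather than to an empty list.

```python
from typing import List, Optional, Tuple


def split_index_levels(
    index_names: List[str], level: Optional[List[str]] = None
) -> Tuple[List[str], List[str]]:
    """Return (levels to turn into columns, levels that stay in the index).

    Hypothetical helper for illustration; not part of pandera.
    """
    # None means "reset every level", not "reset nothing".
    level_temp = list(index_names) if level is None else list(level)

    unknown = [name for name in level_temp if name not in index_names]
    if unknown:
        raise ValueError(f"levels not found in index: {unknown}")

    remaining = [name for name in index_names if name not in level_temp]
    return level_temp, remaining


names = ["index_1", "index_2"]
print(split_index_levels(names))               # (['index_1', 'index_2'], [])
print(split_index_levels(names, ["index_1"]))  # (['index_1'], ['index_2'])
```

Under this rule, the default call and the explicit call with both names give the same result, which matches the workaround in the original report.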

Would you be open to making a PR to fix this issue? This test would also need to be updated to check the case where level = None, to make sure that the index levels become columns as expected.

mheguy commented 2 years ago

I would be happy to. Thanks for pointing me in the right direction. :)