pandas deprecation warning in steps_pytest_harvest_utils.py

j-carson commented 3 years ago

At line 86 of steps_harvest_df_utils.py (in pivot_steps_on_df), the join combines a DataFrame whose columns have a single-level index (left) with one whose columns have a two-level MultiIndex (right). This causes a pandas deprecation warning.
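A minimal, self-contained reproduction of the level mismatch may help. It is only a sketch: the two frames below use made-up data shaped like the left (flat columns) and right (step_id, field) frames of the join, and classify_join is a hypothetical helper. On pandas 2.x the join should raise MergeError (the traceback later in this thread shows the check); on older pandas it only emits a warning about "different levels".

```python
import warnings

import pandas as pd
from pandas.errors import MergeError

# Made-up data: the left frame has a flat (single-level) column index...
left = pd.DataFrame(
    {"algo_param": [1.0, 2.0]},
    index=pd.Index(["test_a", "test_b"], name="test_id"),
)

# ...while the right frame has a two-level (step_id, field) column index.
right = pd.DataFrame(
    [["passed", 0.13], ["passed", 0.11]],
    index=left.index,
    columns=pd.MultiIndex.from_tuples(
        [("train", "status"), ("train", "duration_ms")],
        names=["step_id", None],
    ),
)


def classify_join(left, right):
    """Hypothetical helper: report how this pandas version treats left.join(right)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        try:
            left.join(right)
        except MergeError:
            return "error"  # pandas >= 2.0 refuses the join outright
    if any("different levels" in str(w.message) for w in caught):
        return "warning"  # older pandas only warns
    return "ok"


print(left.columns.nlevels, right.columns.nlevels)  # 1 2
print(classify_join(left, right))
```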

Test case: insert the following into tests/test_steps_harvest.py at line 64 and run the library test suite.

import warnings
warnings.warn("error")

You could perhaps fix the warning with the flatten_multilevel_columns function, but the resulting change in column names might affect existing tests.
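To illustrate the flattening idea, here is a sketch with a hypothetical flatten_columns helper. It is a stand-in written for this example, not pytest-steps' flatten_multilevel_columns, whose separator and exact behavior may differ. It shows both why flattening removes the level mismatch and why column names change (tuple labels become strings), which is the risk for existing tests.

```python
import pandas as pd


def flatten_columns(df, sep="/"):
    """Hypothetical helper: collapse MultiIndex column labels into one level.

    A column labelled ("train", "status") becomes "train/status". Frames that
    already have a flat column index are returned unchanged.
    """
    if df.columns.nlevels == 1:
        return df
    flat = df.copy()
    flat.columns = [sep.join(str(part) for part in label) for label in df.columns]
    return flat


# Made-up frames shaped like the two sides of the join.
left = pd.DataFrame(
    {"algo_param": [1.0, 2.0]},
    index=pd.Index(["test_a", "test_b"], name="test_id"),
)
right = pd.DataFrame(
    [["passed", 0.13], ["passed", 0.11]],
    index=left.index,
    columns=pd.MultiIndex.from_tuples([("train", "status"), ("train", "duration_ms")]),
)

# Both sides now have one column level, so the join no longer trips the level check.
joined = left.join(flatten_columns(right))
print(list(joined.columns))  # ['algo_param', 'train/status', 'train/duration_ms']
```

Any test that looked up a step column as ("train", "status") would now need "train/status" instead.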

smarie commented 3 years ago

Nice catch, @j-carson!

smarie commented 3 years ago

Unfortunately, I was not able to reproduce this with the above code. Could you please let me know how to reproduce it? (Note: I tried replacing it with warnings.simplefilter("error"), but that does not seem to trigger it either.)
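For context on the snippet: warnings.warn("error") emits a UserWarning whose text is "error"; it does not change any warning filter, so it cannot make the pandas warning fail. warnings.simplefilter("error", ...) does escalate, but pytest manages warning filters around each test itself, so passing pytest's -W error::FutureWarning option (or a filterwarnings = error::FutureWarning ini entry) is likely the more reliable route. Either way, escalation only helps if the installed pandas actually emits the warning. The sketch below uses a hypothetical legacy_join stand-in rather than pandas, so it runs regardless of the pandas version.

```python
import warnings


def legacy_join():
    """Hypothetical stand-in for the pandas call: emits the same warning category."""
    warnings.warn(
        "merging between different levels is deprecated", FutureWarning, stacklevel=2
    )
    return "joined"


def run_with_future_warnings_as_errors(func):
    """Call func with FutureWarning escalated to an exception for this block only."""
    with warnings.catch_warnings():
        warnings.simplefilter("error", FutureWarning)
        return func()


# warnings.warn("error") only emits a UserWarning; no filter is changed.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warnings.warn("error")
print(caught[0].category.__name__)  # UserWarning

try:
    run_with_future_warnings_as_errors(legacy_join)
except FutureWarning as exc:
    print("escalated:", exc)  # escalated: merging between different levels is deprecated
```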

My pandas version is 1.3.2, and Python is 3.8.

Alternatively, can you be more precise about the file/line where the warning happens? Thanks!

j-carson commented 3 years ago

Are you sure you can't reproduce it? I just created a new environment with miniconda and followed the instructions in the README. Running "nox", I definitely see two warnings:


 =============================== warnings summary ===============================
 pytest_steps/tests/test_docs_example_with_harvest.py::test_synthesis_df
 pytest_steps/tests/test_steps_harvest.py::test_synthesis
   /Users/jlc/steps/python-pytest-steps/.nox/tests-3-9-env-pytest-latest/lib/python3.9/site-packages/pandas/core/frame.py:9126: FutureWarning: merging between different levels is deprecated and will be removed in a future version. (1 levels on the left,2 on the right)
     return merge(

smarie commented 3 years ago

I tried on two existing environments with the latest version of pandas and could not see this :( I'll try again tomorrow.

j-carson commented 3 years ago

I tried to paste all my nox output here, but it was too big. Edit to add: it's in the nox output of the currently open PR.

smarie commented 3 years ago

No worries. Note that I hacked nox for my projects so that each job writes its own log under .nox/_runlogs; you can open the file for that specific session there if needed.

Also, I finally managed to reproduce it :D As you suggested, reusing an existing env was not sufficient, but creating a new one was. This probably comes down to a package version difference somewhere.
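One way to pin down such a version difference is to print the installed versions in each environment (for example, inside each nox session) and compare them. This is a sketch using the standard library's importlib.metadata (Python 3.8+); the package list is an assumption chosen for this issue, and report_versions is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version


def report_versions(packages):
    """Hypothetical helper: map each distribution name to its version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Assumed list of packages worth comparing between the old and the fresh environment.
for name, ver in report_versions(["pandas", "numpy", "pytest", "pytest-harvest"]).items():
    print(f"{name}: {ver}")
```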

I'll flatten as you suggest, hoping that this will not have any other side effects.

QuLogic commented 8 months ago

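This now fails with current versions of pandas: the deprecation was enforced in pandas 2.0 (GH 40993), so the former FutureWarning is now a MergeError: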

_____________________ ERROR at setup of test_synthesis_df ______________________

request = <SubRequest 'module_results_df_steps_pivoted' for <Function test_synthesis_df>>
module_results_df =                                                                     pytest_obj  ...  accuracy
test_id                s...05
                       score    <function test_my_app_bench at 0x7f8eb490c5e0>  ...       NaN

[12 rows x 7 columns]

    @pytest.fixture(scope='function')
    def module_results_df_steps_pivoted(request, module_results_df):
        """
        A pivoted version of fixture `module_results_df` from pytest_harvest.
        In this version, there is one row per test with the results from all steps in columns.
        """
        # Handle the steps
        module_results_df = handle_steps_in_results_df(module_results_df, keep_orig_id=False)

        # Pivot
>       return pivot_steps_on_df(module_results_df, pytest_session=request.session)

pytest_steps/plugin.py:32: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
pytest_steps/steps_harvest_df_utils.py:86: in pivot_steps_on_df
    return remaining_df.join(one_per_step_df)
/usr/lib64/python3.12/site-packages/pandas/core/frame.py:10730: in join
    return merge(
/usr/lib64/python3.12/site-packages/pandas/core/reshape/merge.py:170: in merge
    op = _MergeOperation(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <pandas.core.reshape.merge._MergeOperation object at 0x7f8e9c342bd0>
left =                                                             pytest_obj  ...  dataset_param
test_id                    ...     C
test_my_app_bench[C-2]  <function test_my_app_bench at 0x7f8eb490c5e0>  ...              C

[6 rows x 3 columns]
right = step_id                  train              ...       score               
                        status duration_ms ....116760  my dataset #C
test_my_app_bench[C-2]  passed    0.132810  ...    0.108233  my dataset #C

[6 rows x 7 columns]
how = 'left', on = None, left_on = None, right_on = None, left_index = True
right_index = True, sort = False, suffixes = ('', ''), indicator = False
validate = None

    def __init__(
        self,
        left: DataFrame | Series,
        right: DataFrame | Series,
        how: JoinHow | Literal["asof"] = "inner",
        on: IndexLabel | AnyArrayLike | None = None,
        left_on: IndexLabel | AnyArrayLike | None = None,
        right_on: IndexLabel | AnyArrayLike | None = None,
        left_index: bool = False,
        right_index: bool = False,
        sort: bool = True,
        suffixes: Suffixes = ("_x", "_y"),
        indicator: str | bool = False,
        validate: str | None = None,
    ) -> None:
        _left = _validate_operand(left)
        _right = _validate_operand(right)
        self.left = self.orig_left = _left
        self.right = self.orig_right = _right
        self.how = how

        self.on = com.maybe_make_list(on)

        self.suffixes = suffixes
        self.sort = sort or how == "outer"

        self.left_index = left_index
        self.right_index = right_index

        self.indicator = indicator

        if not is_bool(left_index):
            raise ValueError(
                f"left_index parameter must be of type bool, not {type(left_index)}"
            )
        if not is_bool(right_index):
            raise ValueError(
                f"right_index parameter must be of type bool, not {type(right_index)}"
            )

        # GH 40993: raise when merging between different levels; enforced in 2.0
        if _left.columns.nlevels != _right.columns.nlevels:
            msg = (
                "Not allowed to merge between different levels. "
                f"({_left.columns.nlevels} levels on the left, "
                f"{_right.columns.nlevels} on the right)"
            )
>           raise MergeError(msg)
E           pandas.errors.MergeError: Not allowed to merge between different levels. (1 levels on the left, 2 on the right)

/usr/lib64/python3.12/site-packages/pandas/core/reshape/merge.py:784: MergeError
=================================== FAILURES ===================================
________________________________ test_synthesis ________________________________

request = <FixtureRequest for <Function test_synthesis>>
fixture_store = OrderedDict({'dataset': OrderedDict({'pytest_steps/tests/test_docs_example_with_harvest.py::test_my_app_bench[A-1-trai...cy': 0.46894857698850767}, 'pytest_steps/tests/test_steps_harvest.py::test_my_app_bench[C-2-score]': ResultsBag:
{}})})

    def test_synthesis(request, fixture_store):
        """
        Tests that users can create a pivoted syntesis table manually by combining pytest-harvest and pytest-steps.

        Note: we could do this at many other places (hook, teardown of a session-scope fixture...)
        """
        # Get session synthesis
        # - filtered on the test function of interest
        # - combined with default fixture store and results bag
        results_dct = get_session_synthesis_dct(request, filter=test_synthesis.__module__,
                                                durations_in_ms=True, test_id_format='function', status_details=False,
                                                fixture_store=fixture_store, flatten=True, flatten_more='results_bag')

        # We could use this function to perform the test id split here, but we will do it directly on the df
        # results_dct = handle_steps_in_results_dct(results_dct, is_flat=True, keep_orig_id=False)

        # convert to a pandas dataframe
        results_df = pd.DataFrame.from_dict(results_dct, orient='index')
        results_df = results_df.loc[list(results_dct.keys()), :]     # fix rows order
        results_df.index.name = 'test_id'
        # results_df.index.names = ['test_id', 'step_id']              # set multiindex names
        results_df.drop(['pytest_obj'], axis=1, inplace=True)        # drop pytest object column

        # extract the step id and replace the index by a multiindex
        results_df = handle_steps_in_results_df(results_df, keep_orig_id=False)

        # Pivot but do not raise an error if one of the above columns is not present - just in case.
>       pivoted_df = pivot_steps_on_df(results_df, pytest_session=request.session)

pytest_steps/tests/test_steps_harvest.py:86: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
pytest_steps/steps_harvest_df_utils.py:86: in pivot_steps_on_df
    return remaining_df.join(one_per_step_df)
/usr/lib64/python3.12/site-packages/pandas/core/frame.py:10730: in join
    return merge(
/usr/lib64/python3.12/site-packages/pandas/core/reshape/merge.py:170: in merge
    op = _MergeOperation(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <pandas.core.reshape.merge._MergeOperation object at 0x7f8e9ce94bc0>
left =                         algo_param dataset_param
test_id                                         
test_my_app_bench[A-...    1.0             C
test_my_app_bench[C-2]         2.0             C
test_basic                     NaN           NaN
right = step_id                  train              ...       -            
                        status duration_ms  ...  s...8083  ...     NaN         NaN
test_my_app_bench[C-2]  passed    0.110367  ...     NaN         NaN

[7 rows x 9 columns]
how = 'left', on = None, left_on = None, right_on = None, left_index = True
right_index = True, sort = False, suffixes = ('', ''), indicator = False
validate = None

    def __init__(
        self,
        left: DataFrame | Series,
        right: DataFrame | Series,
        how: JoinHow | Literal["asof"] = "inner",
        on: IndexLabel | AnyArrayLike | None = None,
        left_on: IndexLabel | AnyArrayLike | None = None,
        right_on: IndexLabel | AnyArrayLike | None = None,
        left_index: bool = False,
        right_index: bool = False,
        sort: bool = True,
        suffixes: Suffixes = ("_x", "_y"),
        indicator: str | bool = False,
        validate: str | None = None,
    ) -> None:
        _left = _validate_operand(left)
        _right = _validate_operand(right)
        self.left = self.orig_left = _left
        self.right = self.orig_right = _right
        self.how = how

        self.on = com.maybe_make_list(on)

        self.suffixes = suffixes
        self.sort = sort or how == "outer"

        self.left_index = left_index
        self.right_index = right_index

        self.indicator = indicator

        if not is_bool(left_index):
            raise ValueError(
                f"left_index parameter must be of type bool, not {type(left_index)}"
            )
        if not is_bool(right_index):
            raise ValueError(
                f"right_index parameter must be of type bool, not {type(right_index)}"
            )

        # GH 40993: raise when merging between different levels; enforced in 2.0
        if _left.columns.nlevels != _right.columns.nlevels:
            msg = (
                "Not allowed to merge between different levels. "
                f"({_left.columns.nlevels} levels on the left, "
                f"{_right.columns.nlevels} on the right)"
            )
>           raise MergeError(msg)
E           pandas.errors.MergeError: Not allowed to merge between different levels. (1 levels on the left, 2 on the right)

/usr/lib64/python3.12/site-packages/pandas/core/reshape/merge.py:784: MergeError
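The check that raises compares only _left.columns.nlevels with _right.columns.nlevels, so any change that makes the two sides agree avoids the error. Flattening the right side is one option; the sketch below shows a different technique, not the one chosen in this thread: lifting the left side's flat columns to two levels so the (step_id, field) structure of the step columns is kept. lift_columns is a hypothetical helper, the empty-string top-level label is an arbitrary placeholder, and the data is made up; only the names remaining_df and one_per_step_df are borrowed from the traceback.

```python
import pandas as pd


def lift_columns(df, top=""):
    """Hypothetical helper: give a flat column index a second level.

    Each column "x" becomes (top, "x"), so the frame can be joined with one
    whose columns are two-level (step_id, field).
    """
    lifted = df.copy()
    lifted.columns = pd.MultiIndex.from_tuples([(top, col) for col in df.columns])
    return lifted


idx = pd.Index(["test_a", "test_b"], name="test_id")
remaining_df = pd.DataFrame({"algo_param": [1.0, 2.0]}, index=idx)
one_per_step_df = pd.DataFrame(
    [["passed", 0.13], ["passed", 0.11]],
    index=idx,
    columns=pd.MultiIndex.from_tuples(
        [("train", "status"), ("train", "duration_ms")], names=["step_id", None]
    ),
)

# Both sides now have two column levels, so the nlevels check passes.
joined = lift_columns(remaining_df).join(one_per_step_df)
print(joined.columns.tolist())
# [('', 'algo_param'), ('train', 'status'), ('train', 'duration_ms')]
```

The trade-off: step columns keep their tuple labels, but the non-step columns (here algo_param) now need the placeholder top level in lookups.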

smarie / python-pytest-steps

pandas deprecation warning in steps_pytest_harvest_utils.py #45