tompollard / tableone

Create "Table 1" for research papers in Python
https://pypi.python.org/pypi/tableone/
MIT License

Fix deprecated Pandas syntax (up to v2.2.2) #164

Closed: tompollard closed this 1 month ago

tompollard commented 1 month ago

Minor changes to update deprecated syntax and address the FutureWarnings raised when running pytest with Pandas v2.2.2:

⏚ [tompollard:~/projects/tableone] [env] main* 5s ± pytest
======================================= test session starts ========================================
platform darwin -- Python 3.9.19, pytest-7.1.2, pluggy-1.0.0
rootdir: /Users/tompollard/projects/tableone
collected 30 items                                                                                 

tests/unit/test_tableone.py ..............................                                   [100%]

========================================= warnings summary =========================================
tests/unit/test_tableone.py: 36 warnings
  /Users/tompollard/projects/tableone/tableone/tableone.py:929: FutureWarning: The provided callable <function mean at 0x103d48790> is currently using DataFrameGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
    df_cont = pd.pivot_table(cont_data,

tests/unit/test_tableone.py: 36 warnings
  /Users/tompollard/projects/tableone/tableone/tableone.py:929: FutureWarning: The provided callable <function median at 0x104646550> is currently using DataFrameGroupBy.median. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "median" instead.
    df_cont = pd.pivot_table(cont_data,

tests/unit/test_tableone.py: 36 warnings
  /Users/tompollard/projects/tableone/tableone/tableone.py:929: FutureWarning: The provided callable <built-in function min> is currently using DataFrameGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
    df_cont = pd.pivot_table(cont_data,

tests/unit/test_tableone.py: 36 warnings
  /Users/tompollard/projects/tableone/tableone/tableone.py:929: FutureWarning: The provided callable <built-in function max> is currently using DataFrameGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
    df_cont = pd.pivot_table(cont_data,

tests/unit/test_tableone.py::TestTableOne::test_tableone_row_sort_pn
  /Users/tompollard/projects/tableone/tests/unit/test_tableone.py:486: FutureWarning: unique with argument that is not not a Series, Index, ExtensionArray, or np.ndarray is deprecated and will raise in a future version.
    tableone_rows = pd.unique([x[0] for x in table.tableone.index.values])

tests/unit/test_tableone.py::TestTableOne::test_tableone_row_sort_pn
  /Users/tompollard/projects/tableone/tests/unit/test_tableone.py:494: FutureWarning: unique with argument that is not not a Series, Index, ExtensionArray, or np.ndarray is deprecated and will raise in a future version.
    tableone_rows = pd.unique([x[0] for x in table.tableone.index.values])

tests/unit/test_tableone.py::TestTableOne::test_string_data_as_continuous_error
  /Users/tompollard/projects/tableone/tests/unit/test_tableone.py:122: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise an error in a future version of pandas. Value 'could not measure' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
    data_mixed.loc[1, 'mixed numeric data'] = 'could not measure'

tests/unit/test_tableone.py::TestTableOne::test_compute_standardized_mean_difference_continuous
tests/unit/test_tableone.py::TestTableOne::test_compute_standardized_mean_difference_categorical
  /Users/tompollard/projects/tableone/tableone/tableone.py:1176: FutureWarning: ChainedAssignmentError: behaviour will change in pandas 3.0!
  You are setting values through chained assignment. Currently this works in certain cases, but when using Copy-on-Write (which will become the default behaviour in pandas 3.0) this will never work to update the original DataFrame or Series, because the intermediate object on which we are setting values will behave as a copy.
  A typical example is when you are setting values in a column of a DataFrame, like:

  df["col"][row_indexer] = value

  Use `df.loc[row_indexer, "col"] = values` instead, to perform the assignment in a single step and ensure this keeps updating the original `df`.

  See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

    df[colname.format(p[0], p[1])].loc[v] = smd

tests/unit/test_tableone.py::TestTableOne::test_compute_standardized_mean_difference_continuous
tests/unit/test_tableone.py::TestTableOne::test_compute_standardized_mean_difference_continuous
tests/unit/test_tableone.py::TestTableOne::test_compute_standardized_mean_difference_continuous
tests/unit/test_tableone.py::TestTableOne::test_compute_standardized_mean_difference_categorical
tests/unit/test_tableone.py::TestTableOne::test_compute_standardized_mean_difference_categorical
tests/unit/test_tableone.py::TestTableOne::test_compute_standardized_mean_difference_categorical
  /Users/tompollard/projects/tableone/tableone/tableone.py:1190: FutureWarning: ChainedAssignmentError: behaviour will change in pandas 3.0!
  You are setting values through chained assignment. Currently this works in certain cases, but when using Copy-on-Write (which will become the default behaviour in pandas 3.0) this will never work to update the original DataFrame or Series, because the intermediate object on which we are setting values will behave as a copy.
  A typical example is when you are setting values in a column of a DataFrame, like:

  df["col"][row_indexer] = value

  Use `df.loc[row_indexer, "col"] = values` instead, to perform the assignment in a single step and ensure this keeps updating the original `df`.

  See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

    df[colname.format(p[0], p[1])].loc[v] = smd  # type: ignore

tests/unit/test_tableone.py::TestTableOne::test_compute_standardized_mean_difference_continuous
  /Users/tompollard/projects/tableone/tests/unit/test_tableone.py:970: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
    smd = t.tableone.loc[k, 'Grouped by MechVent']['SMD (0,1)'][0]

tests/unit/test_tableone.py::TestTableOne::test_compute_standardized_mean_difference_categorical
  /Users/tompollard/projects/tableone/tests/unit/test_tableone.py:998: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
    smd = t.tableone.loc[k, 'Grouped by MechVent']['SMD (0,1)'][0]

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
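The four `pivot_table` warnings in the log above (for `mean`, `median`, `min` and `max`) share one replacement: pass the aggregation names as strings instead of as callables. The following is a minimal sketch that assumes a hypothetical `cont_data` frame with one continuous column and an integer grouping column. It mirrors the `pd.pivot_table(cont_data, ...)` call named in the warning, but it is not the exact code changed in #164.

```python
import pandas as pd

# Hypothetical data: one continuous variable and a binary grouping column.
cont_data = pd.DataFrame({
    "group": [0, 0, 1, 1, 1],
    "age": [50.0, 60.0, 40.0, 45.0, 80.0],
})

# Deprecated in pandas 2.2: aggfunc=[np.mean, np.median, min, max].
# Passing the names as strings keeps the current behaviour without the warning.
df_cont = pd.pivot_table(
    cont_data,
    columns=["group"],
    aggfunc=["mean", "median", "min", "max"],
)

# With no index, pivot_table puts the variables on the rows and
# (aggfunc, group) pairs on the columns.
print(df_cont)
```

The string names select the same `DataFrameGroupBy` methods that pandas was already dispatching to, which is why the warning describes this change as keeping the current behaviour.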
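The two `pd.unique` warnings from `test_tableone_row_sort_pn` are caused by passing a plain Python list. The sketch below assumes a hypothetical two-level row index shaped like a TableOne output (variable name, level). It shows two ways to get the variable names in first-seen order without the warning: wrap the list in a `pd.Series`, or read the first index level and call `Index.unique()`.

```python
import pandas as pd

# Hypothetical (variable, level) row index; names are illustrative only.
index = pd.MultiIndex.from_tuples([
    ("n", ""),
    ("Age, mean (SD)", ""),
    ("sex, n (%)", "F"),
    ("sex, n (%)", "M"),
])

# Deprecated: pd.unique() on a plain list.
#     rows = pd.unique([x[0] for x in index.values])

# Option 1: wrap the list in a Series before calling pd.unique.
rows = pd.unique(pd.Series([x[0] for x in index.values]))

# Option 2: take the first level directly; Index.unique keeps first-seen order.
rows_alt = index.get_level_values(0).unique()

print(list(rows))
```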
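The dtype warning in `test_string_data_as_continuous_error` appears because a string is written into a float64 column. As the message suggests, casting the column explicitly before the assignment avoids the silent upcast. A minimal sketch, assuming a hypothetical `data_mixed` frame with a single numeric column:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a float64 column that will receive a string.
data_mixed = pd.DataFrame({"mixed numeric data": [1.5, 2.0, np.nan]})

# Deprecated: assigning a str into the float64 column upcasts it implicitly.
#     data_mixed.loc[1, "mixed numeric data"] = "could not measure"
# Cast the column to object first, then assign.
data_mixed["mixed numeric data"] = data_mixed["mixed numeric data"].astype(object)
data_mixed.loc[1, "mixed numeric data"] = "could not measure"

print(data_mixed.dtypes)
```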
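Both ChainedAssignmentError warnings (from `tableone.py` lines 1176 and 1190) come from `df[col].loc[v] = smd`, which first selects a column and then assigns into that intermediate object. The message recommends a single `.loc[row, col]` assignment instead. The sketch below assumes a hypothetical SMD table with one column per group pair. The format string mirrors the `colname.format(p[0], p[1])` pattern in the log, but the table layout and values are illustrative and are not taken from tableone.

```python
import pandas as pd

# Hypothetical SMD table: one row per variable, one column per group pair.
colname = "SMD ({0},{1})"
pairs = [(0, 1)]
df = pd.DataFrame(
    index=["Age", "sex"],
    columns=[colname.format(p[0], p[1]) for p in pairs],
    dtype=float,
)

# Illustrative SMD values keyed by variable name.
smd_by_variable = {"Age": 0.25, "sex": 0.10}

for p in pairs:
    for v, smd in smd_by_variable.items():
        # Deprecated chained assignment; under Copy-on-Write it would not update df:
        #     df[colname.format(p[0], p[1])].loc[v] = smd
        # Single-step assignment that always writes into df itself:
        df.loc[v, colname.format(p[0], p[1])] = smd

print(df)
```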
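The last two warnings come from `...['SMD (0,1)'][0]` in the SMD tests. The selected Series is indexed by string labels, so `[0]` only works through a deprecated positional fallback. Switching to `.iloc[0]`, as the message suggests, makes the positional lookup explicit. This is a sketch under assumed data: a hypothetical table with two-level rows and columns, loosely modelled on the TableOne layout, with made-up values.

```python
import pandas as pd

# Hypothetical table: rows are (variable, level), columns are (grouping, statistic).
rows = pd.MultiIndex.from_tuples([("Age", ""), ("sex", "F"), ("sex", "M")])
cols = pd.MultiIndex.from_tuples([
    ("Grouped by MechVent", "0"),
    ("Grouped by MechVent", "SMD (0,1)"),
])
table = pd.DataFrame(
    [["50.0", "0.25"], ["10", "0.10"], ["12", ""]],
    index=rows,
    columns=cols,
)

k = "sex"
series = table.loc[k, "Grouped by MechVent"]["SMD (0,1)"]  # indexed by "F", "M"

# Deprecated: series[0] falls back to position because 0 is not a label.
# Explicit positional access:
smd = series.iloc[0]
print(smd)
```

Under the future behaviour described in the warning, `series[0]` would be treated as a label lookup, and no row in this Series is labelled 0.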