ouseful-PR / nbval

A py.test plugin to validate Jupyter notebooks

Test for dataframe with unordered rows #14

Open psychemedia opened 1 month ago

psychemedia commented 1 month ago

Some cells return DataFrames whose row order is arbitrary, so comparing the output directly can fail even when the data are the same. Add a test that forces a consistent row order before comparing.

Claude.ai suggests the following generic comparison function:

def compare_sorted_dfs(df1, df2):
    """Return both DataFrames sorted into the same row order, ready for comparison."""
    # Check that the DataFrames have the same columns (in any order)
    if set(df1.columns) != set(df2.columns):
        raise ValueError("DataFrames must have the same columns")

    # Use df1's column order for both frames so the sort keys match
    cols = list(df1.columns)
    df2 = df2[cols]

    # Sort both DataFrames on all columns, then reset the index
    # so the original row labels do not affect the comparison
    df1_sorted = df1.sort_values(by=cols).reset_index(drop=True)
    df2_sorted = df2.sort_values(by=cols).reset_index(drop=True)

    return df1_sorted, df2_sorted
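
The function above only returns the two sorted frames. The test still has to compare them. As a rough sketch of how that last step might look, the following uses `pandas.testing.assert_frame_equal` on the sorted frames. The helper name `assert_frames_equal_unordered` and the sample data are made up for illustration and are not part of nbval.

```python
import pandas as pd
from pandas.testing import assert_frame_equal


def assert_frames_equal_unordered(df1, df2):
    """Assert two DataFrames hold the same rows, ignoring row order.

    Hypothetical helper for illustration; not part of nbval.
    """
    if set(df1.columns) != set(df2.columns):
        raise ValueError("DataFrames must have the same columns")

    # Sort both frames on the same keys, in df1's column order
    cols = list(df1.columns)
    left = df1.sort_values(by=cols).reset_index(drop=True)
    right = df2[cols].sort_values(by=cols).reset_index(drop=True)

    # Raises AssertionError with a readable diff if the rows differ
    assert_frame_equal(left, right)


# Made-up sample data: the same rows in a different order
expected = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
shuffled = expected.iloc[[2, 0, 1]]

assert_frames_equal_unordered(expected, shuffled)
```

Sorting on every column assumes the values in each column can be ordered against each other. Columns with mixed or unorderable types (for example lists or dicts) would need a different canonical form first.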