Open arunjose696 opened 1 month ago
Great start on solving this problem! Is it possible to avoid so many of the test changes?
Most of the test changes disable a few checks, either because they can't be supported without partitions or because the current changes don't yet support IO such as pd.read_csv(). Is there something specific that should be avoided?
Nothing specific, I was just trying to understand context. Thanks!
@arunjose696 please rebase on main
With the introduction of the small query compiler, we need to test the interoperability between DataFrames using different query compilers. For example, performing a binary operation between a DataFrame with the small query compiler and another with the Pandas query compiler. (Note: This feature is not yet included in this PR.)
This will require modifying or adding new tests. In the current tests in the modin/modin/tests/pandas/dataframe folder, we have the following scenarios where two DataFrames interact:
1) Derived DataFrames: In tests where the second DataFrame is created or derived from the first, e.g. test_join_empty, we need to refactor the tests so that the second DataFrame is created separately from the first and with MODIN_NATIVE_DATAFRAME_MODE set.
2) Lambda Functions: In tests where the other DataFrame is created within a lambda function, e.g. test_divmod, we need to refactor the tests to either create the second DataFrame in the test definition itself or provide an additional wrapper for the lambda functions, so that the DataFrame is created with a different query compiler.
3) Separate DataFrames: In tests where two separate DataFrames are used, e.g. test_where, we need to refactor the tests to flip MODIN_NATIVE_DATAFRAME_MODE between None and Native_pandas when creating the first and the second DataFrame. This ensures that both the left and right operands are tested with different query compilers for interoperability. The same flipping would also be required in cases 1 and 2 once the DataFrames are separated.
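A minimal sketch of what the flipping in the three scenarios above could look like. It assumes the mode can be toggled through the MODIN_NATIVE_DATAFRAME_MODE environment variable and that None means "variable unset"; the helper names (dataframe_mode, make_operands, recording_factory) are hypothetical and not part of Modin. In a real test the factory would be the Modin DataFrame constructor, and the mode would probably have to be switched through Modin's own config mechanism rather than os.environ.

```python
import itertools
import os
from contextlib import contextmanager

ENV_VAR = "MODIN_NATIVE_DATAFRAME_MODE"  # name taken from the comment above
# Mode values as written in the comment; None is assumed to mean "unset".
MODES = (None, "Native_pandas")


@contextmanager
def dataframe_mode(mode):
    """Temporarily set (or unset) ENV_VAR and restore the old value on exit."""
    previous = os.environ.get(ENV_VAR)
    try:
        if mode is None:
            os.environ.pop(ENV_VAR, None)
        else:
            os.environ[ENV_VAR] = mode
        yield
    finally:
        if previous is None:
            os.environ.pop(ENV_VAR, None)
        else:
            os.environ[ENV_VAR] = previous


def make_operands(factory, left_mode, right_mode, left_data, right_data):
    """Create each operand separately, each under its own mode."""
    with dataframe_mode(left_mode):
        left = factory(left_data)
    with dataframe_mode(right_mode):
        right = factory(right_data)
    return left, right


# Every left/right combination, including the two mixed ones.
MODE_PAIRS = list(itertools.product(MODES, repeat=2))


def recording_factory(data):
    """Stand-in for a DataFrame constructor: records the active mode."""
    return {"mode": os.environ.get(ENV_VAR), "data": data}


for left_mode, right_mode in MODE_PAIRS:
    left, right = make_operands(
        recording_factory, left_mode, right_mode, {"a": [1]}, {"a": [2]}
    )
    # Each operand was built under the mode requested for it.
    assert left["mode"] == left_mode
    assert right["mode"] == right_mode
```

With pytest, MODE_PAIRS could be fed to a parametrized test so that each existing two-DataFrame test runs once per combination instead of being copied.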
Upon reviewing the modin/modin/tests/pandas/dataframe folder, I found approximately 100 tests in which two DataFrames interact. These tests may need to be refactored in place, or copied to a different directory and updated to specifically test interoperability.
@YarShev @anmyachev @devin-petersohn, could you please provide suggestions on how to approach testing the interoperability?
What do these changes do?
flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
git commit -s
docs/development/architecture.rst is up-to-date