### Describe the problem
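The problem occurs because partitions that contain only float data (every partition except the first, which also holds the `col_name` string) parse their values as floats, whereas pandas reads the whole column as strings. The resulting Modin column therefore mixes strings and floats and no longer matches the pandas result.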
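The mechanism can be illustrated with a minimal, standard-library-only sketch. This is a simplified model and not the actual Modin or pandas parsing code: `infer_partition` and `read_partitioned` are hypothetical names, and the "all cells parse as float" rule is an assumption standing in for real dtype inference.

```python
def infer_partition(cells):
    """Return the cells as floats if every cell parses as a float, else keep strings."""
    try:
        return [float(c) for c in cells]
    except ValueError:
        return list(cells)


def read_partitioned(cells, num_partitions):
    """Infer types independently per partition, then concatenate the results."""
    # Ceiling division; at least one cell per partition.
    size = max(1, -(-len(cells) // num_partitions))
    result = []
    for start in range(0, len(cells), size):
        result.extend(infer_partition(cells[start:start + size]))
    return result


# With header=None, the "col_name" line is data, so the column is not purely numeric.
cells = ["col_name", "3745.401188473625", "9507.143064099162", "7319.939418114051"]

whole = infer_partition(cells)      # whole-file inference: every cell stays a string
split = read_partitioned(cells, 2)  # per-partition inference: the second half becomes floats

print([type(v).__name__ for v in whole])
print([type(v).__name__ for v in split])
```

In this model only the partition that contains `col_name` keeps its values as strings, which mirrors the mixed output reported for `pd.read_csv`.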
### Source code / logs
pandas.read_csv output:
0
0 col_name
1 3745.401188473625
2 9507.143064099162
3 7319.939418114051
4 5986.584841970366
5 1560.1864044243653
UserWarning: Ray execution environment not yet initialized. Initializing...
To remove this warning, run the following python code before doing dataframe operations:
import ray
ray.init()
pd.read_csv output:
0
0 col_name
1 3745.401188473625
2 9507.14
3 7319.94
4 5986.58
5 1560.19
Traceback (most recent call last):
File "test.py", line 272, in
df_equals(df_pandas, df_pd)
File "/modin/modin/pandas/test/utils.py", line 520, in df_equals
assert_frame_equal(
File "/miniconda3/envs/modin/lib/python3.8/site-packages/pandas/_testing.py", line 1611, in assert_frame_equal
assert_series_equal(
File "/miniconda3/envs/modin/lib/python3.8/site-packages/pandas/_testing.py", line 1394, in assert_series_equal
_testing.assert_almost_equal(
File "pandas/_libs/testing.pyx", line 67, in pandas._libs.testing.assert_almost_equal
File "pandas/_libs/testing.pyx", line 182, in pandas._libs.testing.assert_almost_equal
File "/miniconda3/envs/modin/lib/python3.8/site-packages/pandas/_testing.py", line 1036, in raise_assert_detail
raise AssertionError(msg)
AssertionError: DataFrame.iloc[:, 0] (column name="0") are different
### System information
import os

os.environ["MODIN_CPUS"] = "4"
os.environ["MODIN_ENGINE"] = "ray"

import pandas
import modin.pandas as pd
from modin.pandas.test.utils import df_equals
import numpy as np
import csv

filename = "test_float.csv"
float_precision = "round_trip"
data_size = 5
random_state = np.random.RandomState(seed=42)
data = ["col_name"] + random_state.uniform(low=0.0, high=10000.0, size=data_size).astype(str).tolist()
data = "\n".join(data)
kwargs = {"filepath_or_buffer": filename, "header": None}

try:
    with open(filename, "w") as f:
        f.write(data)
    # Assumption: the read-and-compare step is missing from the listing;
    # it is reconstructed here from the traceback (`df_equals(df_pandas, df_pd)`).
    df_pandas = pandas.read_csv(**kwargs)
    df_pd = pd.read_csv(**kwargs)
    df_equals(df_pandas, df_pd)
finally:
    os.remove(filename)
Assertion details:
DataFrame.iloc[:, 0] (column name="0") values are different (66.66667 %)
[index]: [0, 1, 2, 3, 4, 5]
[left]: [col_name, 3745.401188473625, 9507.143064099162, 7319.939418114051, 5986.584841970366, 1560.1864044243653]
[right]: [col_name, 3745.401188473625, 9507.143064099162, 7319.9394181140515, 5986.584841970366, 1560.1864044243653]