Closed. azhuvath closed this issue 5 months ago.
@anmyachev, could you take a look at this?
Hi @azhuvath! Could you provide modin, pandas and pyarrow versions?
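One way to collect the requested versions is sketched below. It uses only the standard library's `importlib.metadata`; the helper name `report_versions` is made up for this example and is not part of Modin or pandas.

```python
from importlib.metadata import PackageNotFoundError, version


def report_versions(packages=("modin", "pandas", "pyarrow")):
    """Map each distribution name to its installed version, or None if absent.

    `report_versions` is a hypothetical helper written for this thread.
    """
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in the current environment.
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in report_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```

Running it inside the same virtual environment as the failing script (here `analytics_env`) should give the versions that matter for the report.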
The current status of read_csv parameters that are unsupported with the pyarrow engine is tracked in https://github.com/pandas-dev/pandas/issues/38872. Upstream pandas does not support the nrows parameter with that engine.
I found an issue in Modin.
The pandas.read_csv method supports the 'c', 'python', and 'pyarrow' engines. It looks like Modin does not work when the 'pyarrow' engine is used.
Intel(R) Extension for Scikit-learn enabled (https://github.com/intel/scikit-learn-intelex)
2024-06-18 05:01:34,908 INFO worker.py:1753 -- Started a local Ray instance.
Traceback (most recent call last):
  File "/home/ad/anomaly_detection.py", line 41, in <module>
    model_fitting()
  File "/home/ad/anomaly_detection.py", line 37, in model_fitting
    raise e
  File "/home/ad/anomaly_detection.py", line 14, in model_fitting
    data_csv = pd.read_csv('./data.csv', engine='pyarrow')
  File "/home/ad/analytics_env/lib/python3.10/site-packages/modin/utils.py", line 511, in wrapped
    return func(*params.args, **params.kwargs)
  File "/home/ad/analytics_env/lib/python3.10/site-packages/modin/logging/logger_decorator.py", line 125, in run_and_log
    return obj(*args, **kwargs)
  File "/home/ad/analytics_env/lib/python3.10/site-packages/modin/pandas/io.py", line 227, in read_csv
    return _read(**kwargs)
  File "/home/ad/analytics_env/lib/python3.10/site-packages/modin/pandas/io.py", line 117, in _read
    pd_obj = FactoryDispatcher.read_csv(**kwargs)
  File "/home/ad/analytics_env/lib/python3.10/site-packages/modin/core/execution/dispatching/factories/dispatcher.py", line 207, in read_csv
    return cls.get_factory()._read_csv(**kwargs)
  File "/home/ad/analytics_env/lib/python3.10/site-packages/modin/core/execution/dispatching/factories/factories.py", line 268, in _read_csv
    return cls.io_cls.read_csv(**kwargs)
  File "/home/ad/analytics_env/lib/python3.10/site-packages/modin/logging/logger_decorator.py", line 125, in run_and_log
    return obj(*args, **kwargs)
  File "/home/ad/analytics_env/lib/python3.10/site-packages/modin/core/io/file_dispatcher.py", line 159, in read
    query_compiler = cls._read(*args, **kwargs)
  File "/home/ad/analytics_env/lib/python3.10/site-packages/modin/logging/logger_decorator.py", line 125, in run_and_log
    return obj(*args, **kwargs)
  File "/home/ad/analytics_env/lib/python3.10/site-packages/modin/core/io/text/text_file_dispatcher.py", line 1068, in _read
    pd_df_metadata = cls.read_callback(
  File "/home/ad/analytics_env/lib/python3.10/site-packages/modin/logging/logger_decorator.py", line 125, in run_and_log
    return obj(*args, **kwargs)
  File "/home/ad/analytics_env/lib/python3.10/site-packages/modin/core/storage_formats/pandas/parsers.py", line 381, in read_callback
    return pandas.read_csv(*args, **kwargs)
  File "/home/ad/analytics_env/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1026, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/home/ad/analytics_env/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 620, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/home/ad/analytics_env/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1607, in __init__
    options = self._get_options_with_defaults(engine)
  File "/home/ad/analytics_env/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1643, in _get_options_with_defaults
    raise ValueError(
ValueError: The 'nrows' option is not supported with the 'pyarrow' engine
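The user's call passes only `engine='pyarrow'`, so the `nrows` option that pandas rejects appears to be added somewhere inside Modin's read path. Until that is resolved, one possible workaround is to fall back to another engine whenever a conflicting option is present. The sketch below is only an illustration of that idea, not Modin's or pandas' actual fix: `with_engine_fallback` and `PYARROW_UNSUPPORTED` are hypothetical names, and the set lists only `nrows` because that is the only option this traceback confirms.

```python
from typing import Any, Dict

# Assumption for this sketch: options the 'pyarrow' engine rejects.
# Only 'nrows' is confirmed by the traceback; extend the set as needed.
PYARROW_UNSUPPORTED = frozenset({"nrows"})


def with_engine_fallback(kwargs: Dict[str, Any], fallback: str = "c") -> Dict[str, Any]:
    """Return a copy of read_csv keyword arguments with a safe engine.

    If engine='pyarrow' is requested together with an option listed in
    PYARROW_UNSUPPORTED, the engine is replaced by `fallback`.
    The input dictionary is not modified.
    """
    out = dict(kwargs)
    if out.get("engine") != "pyarrow":
        return out
    conflicting = [key for key in PYARROW_UNSUPPORTED if out.get(key) is not None]
    if conflicting:
        out["engine"] = fallback
    return out


# Hypothetical usage with plain pandas:
#   import pandas as pd
#   df = pd.read_csv("./data.csv", **with_engine_fallback({"engine": "pyarrow", "nrows": 1}))
```

This does not help when the conflicting option is injected after the user's call, as seems to happen here; in that case the simplest workaround is to omit `engine='pyarrow'` when reading through Modin.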