Bhavya6187 closed this issue 2 years ago
Thanks @Bhavya6187! Ray 2.0 will deprecate the ray.ObjectID name in favor of ray.ObjectRef. This name change is currently supported in the latest stable release of Ray.
@devin-petersohn: I installed the latest version of Modin (0.8.3+41.g5cb3283) from master and am getting a similar error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/shossain/anaconda3/envs/py37/lib/python3.7/site-packages/modin/pandas/io.py", line 134, in read_csv
return _read(**kwargs)
File "/Users/shossain/anaconda3/envs/py37/lib/python3.7/site-packages/modin/pandas/io.py", line 60, in _read
pd_obj = EngineDispatcher.read_csv(**kwargs)
File "/Users/shossain/anaconda3/envs/py37/lib/python3.7/site-packages/modin/data_management/factories/dispatcher.py", line 104, in read_csv
return cls.__engine._read_csv(**kwargs)
File "/Users/shossain/anaconda3/envs/py37/lib/python3.7/site-packages/modin/data_management/factories/factories.py", line 87, in _read_csv
return cls.io_cls.read_csv(**kwargs)
File "/Users/shossain/anaconda3/envs/py37/lib/python3.7/site-packages/modin/engines/base/io/file_dispatcher.py", line 29, in read
query_compiler = cls._read(*args, **kwargs)
File "/Users/shossain/anaconda3/envs/py37/lib/python3.7/site-packages/modin/engines/base/io/text/csv_dispatcher.py", line 30, in _read
return cls.single_worker_read(filepath_or_buffer, **kwargs)
File "/Users/shossain/anaconda3/envs/py37/lib/python3.7/site-packages/modin/backends/pandas/parsers.py", line 87, in single_worker_read
return cls.query_compiler_cls.from_pandas(pandas_frame, cls.frame_cls)
File "/Users/shossain/anaconda3/envs/py37/lib/python3.7/site-packages/modin/backends/pandas/query_compiler.py", line 209, in from_pandas
return cls(data_cls.from_pandas(df))
File "/Users/shossain/anaconda3/envs/py37/lib/python3.7/site-packages/modin/engines/base/frame/data.py", line 2032, in from_pandas
new_frame, new_lengths, new_widths = cls._frame_mgr_cls.from_pandas(df, True)
File "/Users/shossain/anaconda3/envs/py37/lib/python3.7/site-packages/modin/engines/base/frame/partition_manager.py", line 580, in from_pandas
for i in range(0, len(df), row_chunksize)
File "/Users/shossain/anaconda3/envs/py37/lib/python3.7/site-packages/modin/engines/base/frame/partition_manager.py", line 580, in <listcomp>
for i in range(0, len(df), row_chunksize)
File "/Users/shossain/anaconda3/envs/py37/lib/python3.7/site-packages/modin/engines/base/frame/partition_manager.py", line 578, in <listcomp>
for j in range(0, len(df.columns), col_chunksize)
File "/Users/shossain/anaconda3/envs/py37/lib/python3.7/site-packages/modin/engines/ray/pandas_on_ray/frame/partition.py", line 148, in put
return PandasOnRayFramePartition(ray.put(obj), len(obj.index), len(obj.columns))
File "/Users/shossain/anaconda3/envs/py37/lib/python3.7/site-packages/modin/engines/ray/pandas_on_ray/frame/partition.py", line 27, in __init__
assert type(object_id) is ray.ObjectRef
Here is the snippet:
import ray
import ray.util
ray.util.connect('<Host>:<Port>')
import modin
import modin.pandas as pd
columns_names = [
"trip_id", "vendor_id", "pickup_datetime", "dropoff_datetime", "store_and_fwd_flag",
"rate_code_id", "pickup_longitude", "pickup_latitude", "dropoff_longitude", "dropoff_latitude",
"passenger_count", "trip_distance", "fare_amount", "extra", "mta_tax", "tip_amount",
"tolls_amount", "ehail_fee", "improvement_surcharge", "total_amount", "payment_type",
"trip_type", "pickup", "dropoff", "cab_type", "precipitation", "snow_depth", "snowfall",
"max_temperature", "min_temperature", "average_wind_speed", "pickup_nyct2010_gid",
"pickup_ctlabel", "pickup_borocode", "pickup_boroname", "pickup_ct2010",
"pickup_boroct2010", "pickup_cdeligibil", "pickup_ntacode", "pickup_ntaname", "pickup_puma",
"dropoff_nyct2010_gid", "dropoff_ctlabel", "dropoff_borocode", "dropoff_boroname",
"dropoff_ct2010", "dropoff_boroct2010", "dropoff_cdeligibil", "dropoff_ntacode",
"dropoff_ntaname", "dropoff_puma",
]
parse_dates = ["pickup_datetime", "dropoff_datetime"]
df = pd.read_csv('https://modin-datasets.s3.amazonaws.com/trips_data.csv', names=columns_names,
                 header=None, parse_dates=parse_dates)
@shossain thanks for the follow-up. It appears that Ray uses separate object types for the client. I have opened an issue (ray-project/ray#14042) to track it.
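The assertion at the bottom of the traceback, `assert type(object_id) is ray.ObjectRef`, is an exact type check, so it rejects any object whose class is not literally `ray.ObjectRef`. The sketch below reproduces that behaviour with stand-in classes. `ObjectRef` and `ClientObjectRef` here are hypothetical placeholders, and how Ray's client reference class actually relates to `ray.ObjectRef` is not shown in this thread.

```python
# Stand-in classes (hypothetical); they only mimic "two unrelated reference types".
class ObjectRef:
    """Placeholder for the reference type the partition expects."""


class ClientObjectRef:
    """Placeholder for a separate reference type returned by a client."""


def strict_check(object_id):
    # Same shape as the check in the traceback: exact class identity.
    return type(object_id) is ObjectRef


def accepts_any(object_id, allowed=(ObjectRef, ClientObjectRef)):
    # A looser check that accepts an explicit tuple of reference types.
    return isinstance(object_id, allowed)


local_ref = ObjectRef()
client_ref = ClientObjectRef()

print(strict_check(local_ref))   # True
print(strict_check(client_ref))  # False
print(accepts_any(client_ref))   # True
```

Whether loosening the check is the right fix depends on whether the rest of the partition code can handle the client's reference type, which this sketch does not address.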
@devin-petersohn: is there a combination of Ray and Modin versions that is sure to work? I am trying to load some data and run some simple analysis.
The issue is the Client API, which is new. If you create and connect to a Ray cluster without the Client, it will work.
@devin-petersohn: I tried to submit the following script to an existing cluster using ray submit:
import ray
ray.init(address='auto', _redis_password='5241590000000000')
import modin.pandas as pd
columns_names = [
"trip_id", "vendor_id", "pickup_datetime", "dropoff_datetime", "store_and_fwd_flag",
"rate_code_id", "pickup_longitude", "pickup_latitude", "dropoff_longitude", "dropoff_latitude",
"passenger_count", "trip_distance", "fare_amount", "extra", "mta_tax", "tip_amount",
"tolls_amount", "ehail_fee", "improvement_surcharge", "total_amount", "payment_type",
"trip_type", "pickup", "dropoff", "cab_type", "precipitation", "snow_depth", "snowfall",
"max_temperature", "min_temperature", "average_wind_speed", "pickup_nyct2010_gid",
"pickup_ctlabel", "pickup_borocode", "pickup_boroname", "pickup_ct2010",
"pickup_boroct2010", "pickup_cdeligibil", "pickup_ntacode", "pickup_ntaname", "pickup_puma",
"dropoff_nyct2010_gid", "dropoff_ctlabel", "dropoff_borocode", "dropoff_boroname",
"dropoff_ct2010", "dropoff_boroct2010", "dropoff_cdeligibil", "dropoff_ntacode",
"dropoff_ntaname", "dropoff_puma",
]
df = pd.read_csv('https://modin-datasets.s3.amazonaws.com/trips_data.csv', names=columns_names)
def q1(df):
    return df.groupby("cab_type")["cab_type"].count()
print(df) # Works fine
print(q1(df)) # Throws exception
But, I am getting the following exception:
(raylet) [2021-02-22 12:17:09,584 C 4100 4100] pull_manager.cc:100: Check failed: active_object_pull_requests_[obj_id].erase(request_it->first)
(raylet) [2021-02-22 12:17:09,584 E 4100 4100] logging.cc:435: *** Aborted at 1614025029 (unix time) try "date -d @1614025029" if you are using GNU date ***
(raylet) [2021-02-22 12:17:09,584 E 4100 4100] logging.cc:435: PC: @ 0x0 (unknown)
(raylet) [2021-02-22 12:17:09,593 E 4100 4100] logging.cc:435: *** SIGABRT (@0x3e800001004) received by PID 4100 (TID 0x7f3902812800) from PID 4100; stack trace: ***
(raylet) [2021-02-22 12:17:09,595 E 4100 4100] logging.cc:435: @ 0x556fe05a223f google::(anonymous namespace)::FailureSignalHandler()
(raylet) [2021-02-22 12:17:09,596 E 4100 4100] logging.cc:435: @ 0x7f3902d743c0 (unknown)
(raylet) [2021-02-22 12:17:09,596 E 4100 4100] logging.cc:435: @ 0x7f390285d18b gsignal
(raylet) [2021-02-22 12:17:09,596 E 4100 4100] logging.cc:435: @ 0x7f390283c859 abort
(raylet) [2021-02-22 12:17:09,599 E 4100 4100] logging.cc:435: @ 0x556fe0593615 ray::SpdLogMessage::Flush()
(raylet) [2021-02-22 12:17:09,601 E 4100 4100] logging.cc:435: @ 0x556fe059364d ray::RayLog::~RayLog()
(raylet) [2021-02-22 12:17:09,602 E 4100 4100] logging.cc:435: @ 0x556fe028df8d ray::PullManager::DeactivatePullBundleRequest()
(raylet) [2021-02-22 12:17:09,603 E 4100 4100] logging.cc:435: @ 0x556fe0290ed9 ray::PullManager::CancelPull()
(raylet) [2021-02-22 12:17:09,604 E 4100 4100] logging.cc:435: @ 0x556fe027e28a ray::ObjectManager::CancelPull()
(raylet) [2021-02-22 12:17:09,605 E 4100 4100] logging.cc:435: @ 0x556fe01d0b77 ray::raylet::DependencyManager::RemoveTaskDependencies()
(raylet) [2021-02-22 12:17:09,606 E 4100 4100] logging.cc:435: @ 0x556fe023afdd ray::raylet::ClusterTaskManager::DispatchScheduledTasksToWorkers()
(raylet) [2021-02-22 12:17:09,607 E 4100 4100] logging.cc:435: @ 0x556fe0209d2f ray::raylet::NodeManager::HandleWorkerAvailable()
(raylet) [2021-02-22 12:17:09,608 E 4100 4100] logging.cc:435: @ 0x556fe0209e30 ray::raylet::NodeManager::HandleWorkerAvailable()
(raylet) [2021-02-22 12:17:09,608 E 4100 4100] logging.cc:435: @ 0x556fe020a373 ray::raylet::NodeManager::ProcessAnnounceWorkerPortMessage()
(raylet) [2021-02-22 12:17:09,609 E 4100 4100] logging.cc:435: @ 0x556fe0226f1a ray::raylet::NodeManager::ProcessClientMessage()
(raylet) [2021-02-22 12:17:09,610 E 4100 4100] logging.cc:435: @ 0x556fe01852a1 _ZNSt17_Function_handlerIFvSt10shared_ptrIN3ray16ClientConnectionEElRKSt6vectorIhSaIhEEEZNS1_6raylet6Raylet12HandleAcceptERKN5boost6system10error_codeEEUlS3_lS8_E0_E9_M_invokeERKSt9_Any_dataOS3_OlS8_
(raylet) [2021-02-22 12:17:09,614 E 4100 4100] logging.cc:435: @ 0x556fe054da4e ray::ClientConnection::ProcessMessage()
(raylet) [2021-02-22 12:17:09,618 E 4100 4100] logging.cc:435: @ 0x556fe054aaec boost::asio::detail::reactive_socket_recv_op<>::do_complete()
(raylet) [2021-02-22 12:17:09,622 E 4100 4100] logging.cc:435: @ 0x556fe0910e41 boost::asio::detail::scheduler::do_run_one()
(raylet) [2021-02-22 12:17:09,624 E 4100 4100] logging.cc:435: @ 0x556fe09124e9 boost::asio::detail::scheduler::run()
(raylet) [2021-02-22 12:17:09,624 E 4100 4100] logging.cc:435: @ 0x556fe09149d7 boost::asio::io_context::run()
(raylet) [2021-02-22 12:17:09,627 E 4100 4100] logging.cc:435: @ 0x556fe0151572 main
(raylet) [2021-02-22 12:17:09,627 E 4100 4100] logging.cc:435: @ 0x7f390283e0b3 __libc_start_main
(raylet) [2021-02-22 12:17:09,629 E 4100 4100] logging.cc:435: @ 0x556fe0166665 (unknown)
@shossain What version of Ray are you running?
@simon-mo Have you seen this before?
Ray 2.0.0.dev0 installed from here: https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-2.0.0.dev0-cp37-cp37m-manylinux2014_x86_64.whl
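Since the failure depends on which Ray and Modin builds are paired, it can help to report all relevant versions in one go. The following is a minimal sketch using only the standard library. `installed_version` is a hypothetical helper name, and `importlib.metadata` requires Python 3.8 or later, so it would not run as-is in the py37 environment shown in the traceback above.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of ``package``, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Print one line per package; missing packages show up as None.
for name in ("ray", "modin", "pandas"):
    print(f"{name}: {installed_version(name)}")
```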
This is currently an issue with the latest Ray wheels and is being discussed here: https://github.com/ray-project/ray/issues/14279
I am not able to reproduce this on the latest master. This seems to be working now.
System information
modin.__version__: master (0.8.3+22.ge99b629)
Describe the problem
Modin fails to load a CSV from S3 with the Ray client and throws an error.
Source code / logs