Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
What happened + What you expected to happen
[2024-03-04 01:18:44,055 W 156 235] (gcs_server) gcs_task_manager.cc:286: Max number of tasks event (100000) allowed is reached. Old task events will be overwritten. Set RAY_task_events_max_num_task_in_gcs to a higher value to store more.
[2024-03-04 01:18:48,602 C 156 235] (gcs_server) gcs_task_manager.cc:116: Check failed: idx_itr != task_attemptindex.end() Task attempt of task: NIL_ID, attempt_number: 0 should have task events in the buffer but missing.
StackTrace Information
/usr/local/lib/python3.9/dist-packages/ray/core/src/ray/gcs/gcs_server(+0x9a70aa) [0x56089868b0aa] ray::operator<<()
/usr/local/lib/python3.9/dist-packages/ray/core/src/ray/gcs/gcs_server(+0x9a8b82) [0x56089868cb82] ray::SpdLogMessage::Flush()
/usr/local/lib/python3.9/dist-packages/ray/core/src/ray/gcs/gcs_server(+0x9a8e97) [0x56089868ce97] ray::RayLog::~RayLog()
/usr/local/lib/python3.9/dist-packages/ray/core/src/ray/gcs/gcs_server(+0x296e27) [0x560897f7ae27] ray::gcs::GcsTaskManager::GcsTaskManagerStorage::GetTaskEvent()
/usr/local/lib/python3.9/dist-packages/ray/core/src/ray/gcs/gcs_server(+0x296ecf) [0x560897f7aecf] ray::gcs::GcsTaskManager::GcsTaskManagerStorage::MarkTaskAttemptFailed()
/usr/local/lib/python3.9/dist-packages/ray/core/src/ray/gcs/gcs_server(+0x29762e) [0x560897f7b62e] ray::gcs::GcsTaskManager::GcsTaskManagerStorage::MarkTasksFailed()
/usr/local/lib/python3.9/dist-packages/ray/core/src/ray/gcs/gcs_server(+0x29790a) [0x560897f7b90a] boost::asio::detail::wait_handler<>::do_complete()
/usr/local/lib/python3.9/dist-packages/ray/core/src/ray/gcs/gcs_server(+0xa9926b) [0x56089877d26b] boost::asio::detail::scheduler::do_run_one()
/usr/local/lib/python3.9/dist-packages/ray/core/src/ray/gcs/gcs_server(+0xa9a501) [0x56089877e501] boost::asio::detail::scheduler::run()
/usr/local/lib/python3.9/dist-packages/ray/core/src/ray/gcs/gcs_server(+0xa9a770) [0x56089877e770] boost::asio::io_context::run()
/usr/local/lib/python3.9/dist-packages/ray/core/src/ray/gcs/gcs_server(+0x1f8c8e) [0x560897edcc8e] std::thread::_State_impl<>::_M_run()
/usr/local/lib/python3.9/dist-packages/ray/core/src/ray/gcs/gcs_server(+0xafadb0) [0x5608987dedb0] execute_native_thread_routine
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7ea7) [0x7f355d15fea7] start_thread
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7f355cd48a2f] __clone
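The warning at the top of the log suggests setting RAY_task_events_max_num_task_in_gcs to a higher value. Below is a minimal sketch of one way to do that from Python, assuming the head node is launched with the `ray start --head` CLI and that gcs_server picks the variable up from its inherited environment. The helper names `head_node_env` and `start_head_node` are hypothetical, and the value 1000000 is only an illustration. Whether a higher limit avoids the check failure above is not established here; it would only postpone the point at which old task events get overwritten.

```python
import os
import subprocess

# Variable name copied verbatim from the gcs_server warning in the log.
ENV_VAR = "RAY_task_events_max_num_task_in_gcs"


def head_node_env(max_task_events, base_env=None):
    """Return a copy of the environment with the task-event limit set.

    Hypothetical helper: it only prepares the environment and starts nothing.
    """
    if max_task_events <= 0:
        raise ValueError("max_task_events must be positive")
    env = dict(os.environ if base_env is None else base_env)
    env[ENV_VAR] = str(max_task_events)
    return env


def start_head_node(max_task_events):
    """Launch a head node that inherits the raised limit.

    Assumption: the `ray` CLI is on PATH and gcs_server reads the
    variable from the environment of the process that starts it.
    """
    return subprocess.run(
        ["ray", "start", "--head"],
        env=head_node_env(max_task_events),
        check=True,
    )
```

Calling `start_head_node(1000000)` would start the head node with a limit ten times the 100000 reported in the warning; `head_node_env` can be inspected on its own without starting anything.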
Versions / Dependencies
Ray 2.3.1
Reproduction script
Hard to reproduce.
Issue Severity
Medium: It is a significant difficulty but I can work around it.