Whenever the GPU worker starts training, it crashes immediately without reporting an error. All I get is:
2022-07-07 19:05:41,609 WARNING worker.py:1404 -- A worker died or was killed
while executing a task by an unexpected system error. To troubleshoot the
problem, check the logs for the dead worker.
RayTask ID: ffffffffffffffff2aeefb9774b8f9463ffdfd8101000000
Worker ID: 21602417afb8b58af6db10cb511242afac87db0eb5b09f5606320616
Node ID: 51e55bca411e9e811bf5c67089cbf9867f5f8374c2fce4a8370c987c
Worker IP address: *
Worker port: *
Worker PID: 6218
I grepped through all the logs and stdout, but couldn't find any information about what the error was or where it occurred.