Closed — XavierGeerinck closed this issue 1 year ago
This can be closed. The main issue was that I was nesting actors, which Ray does not like. Solving it requires serializing the actors correctly by storing reference IDs to the actors (rather than the handles themselves) in the `__getstate__`, `__setstate__`, and `__deepcopy__` methods. Simply using named actors, a wrapper class (hell yeah abstraction!), and the correct magic-method implementations to re-fetch the actor on restore fixed it.
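For anyone hitting the same problem, here is a minimal, self-contained sketch of the pattern described above. The class names and the `_ACTOR_REGISTRY` dict are hypothetical stand-ins so the example runs without a Ray cluster; in real Ray code the registry lookup would be `ray.get_actor(name)` against a named actor, and the "handle" would be an actual actor handle.

```python
import copy
import pickle

# Hypothetical stand-in for Ray's named-actor registry; in real code
# you would call ray.get_actor(name) instead of reading this dict.
_ACTOR_REGISTRY = {}


class FakeHandle:
    """Simulates an actor handle that must never be pickled directly."""
    def __init__(self, name):
        self.name = name

    def __reduce__(self):
        raise TypeError("actor handles cannot be pickled directly")


def get_actor(name):
    """Stand-in for ray.get_actor(name)."""
    return _ACTOR_REGISTRY[name]


class ActorWrapper:
    """Serialize only the actor's name (its reference ID);
    re-fetch the live handle on restore."""

    def __init__(self, name):
        self.name = name
        _ACTOR_REGISTRY.setdefault(name, FakeHandle(name))
        self.handle = get_actor(name)

    def __getstate__(self):
        state = self.__dict__.copy()
        state.pop("handle")  # drop the unpicklable handle, keep the name
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.handle = get_actor(self.name)  # re-attach the live handle

    def __deepcopy__(self, memo):
        clone = ActorWrapper.__new__(ActorWrapper)
        clone.__setstate__(copy.deepcopy(self.__getstate__(), memo))
        return clone


wrapper = ActorWrapper("trainer")
restored = pickle.loads(pickle.dumps(wrapper))   # round-trips cleanly
assert restored.handle is wrapper.handle          # same registered actor
assert copy.deepcopy(wrapper).handle is wrapper.handle
```

The key design choice is that the wrapper treats the actor name as the canonical state: pickling, unpickling, and deep-copying all go through the same name-based lookup, so nested objects holding the wrapper never try to serialize a raw handle.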
What happened + What you expected to happen
When using Ray to start a training task on a local Ray environment, it starts up and then stops with:
I'd love to dig into more detail to see what is going on, but I am unable to find any logging entries that clarify why this is happening. The only output I receive is a long stack trace, which I included below.
Looking forward to some help on how I could debug this (and which logging I should enable). I'm happy to provide any further details.
When running `ray.init(configure_logging=True, logging_level=logging.DEBUG, log_to_driver=True)` to try to get more logs, I get
Versions / Dependencies
Ray: 2.5.0 Python: 3.8.16 OS: Ubuntu 22.04.1 (also happening on Mac latest OS)
Reproduction script
Issue Severity
High: It blocks me from completing my task.