Open allenyin55 opened 4 years ago
The problem is that when a task is executed locally, it doesn't create a Fiber or mark the current actor as async. The latter is easy to solve, but the former makes the code path pretty messy because it requires core_worker to have a FiberState class only for local mode...
@ijrsvt I think I can make a fix by tonight, so you don't need to work on this.
Okay. I have been digging into this, and it is pretty tricky to fix because there's only one core worker in local mode. That means we cannot overwrite the core_worker state with async actor state. Fixing this requires a decent amount of refactoring (which I don't think is worth the time right now). As you cannot use 0.8.5 until the next release anyway, I will postpone the fix to the next sprint and set the priority to P1.
@rkooo567 I can help with it next sprint as well. @allenyin55 What is your use case for async tasks in local mode?
@ijrsvt He needs to run his integration test in local mode, and the integration test contains an async actor.
@rkooo567 Is this for a new integration test or an existing one? I don't know if there are a ton of use cases where local_mode and async actors will be used together. I'm not sure it fits in the definition of local_mode as emulating serial python?
I guess @allenyin55 can answer the question better. But I believe it was a new one, and he said he needs to use local mode. (btw, it worked when he used 0.8.4, and idk how)
I don't know much about the purpose of local mode, but my impression is that it is most useful when you want to reduce the test load (mostly for unit / integration tests). If so, I believe it should return the same output as non-local mode for every API.
(Also, there could be an easy fix without using Fiber that just came to mind. We can probably talk about this offline if you think we should fix this issue).
I just did a bisection and this regression was introduced in https://github.com/ray-project/ray/pull/7670. We need to either fix it or give a better error message that async actors are not supported in local mode.
We use async actors in local mode for dependency injection during testing. Local mode makes sure that the test code runs in a single process, which allows us to mock certain methods in that process (which get called by Ray tasks).
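To illustrate why the single-process guarantee matters for mocking, here is a rough sketch in plain Python, without Ray. `PriceClient`, `task_body`, and `run_with_mock` are hypothetical names invented for this example; `task_body` merely stands in for code that a Ray task would run. The point is only that a `unittest.mock` patch is applied to objects in the current process, so it takes effect only if the task body runs in that same process.

```python
from unittest.mock import patch


class PriceClient:
    """Hypothetical dependency that a task would normally call."""

    def fetch(self, symbol):
        raise RuntimeError("network access is not available in tests")


def task_body(symbol):
    """Stand-in for the body of a Ray task; here it is a plain function."""
    return PriceClient().fetch(symbol) * 2


def run_with_mock():
    # The patch replaces PriceClient.fetch in this process only. task_body
    # sees the mock because it executes in the same process as the test.
    with patch.object(PriceClient, "fetch", return_value=21):
        return task_body("ABC")
```

If `task_body` ran in a separate worker process instead, the patch made in the test process would not be visible there.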
The fix could be actually pretty simple if we assume these 2 cases for local mode.
In this case, we just need to check whether the function is a coroutine and, if so, run the event loop and the coroutine in the main thread until it is done. @pcmoritz @ijrsvt do you guys think it is a valid premise for local mode?
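A minimal sketch of that idea in plain Python, without Ray. This is not Ray's actual implementation; `run_in_local_mode`, `async_add`, and `sync_add` are hypothetical names used only for illustration. It assumes the caller is not already inside a running event loop.

```python
import asyncio
import inspect


def run_in_local_mode(func, *args, **kwargs):
    """Hypothetical helper: run func serially in the calling thread."""
    if inspect.iscoroutinefunction(func):
        # Async method: drive the coroutine to completion on a fresh
        # event loop, blocking the calling thread until it is done.
        loop = asyncio.new_event_loop()
        try:
            return loop.run_until_complete(func(*args, **kwargs))
        finally:
            loop.close()
    # Regular method: call it directly.
    return func(*args, **kwargs)


async def async_add(a, b):
    await asyncio.sleep(0)  # yield to the loop once, like real async code
    return a + b


def sync_add(a, b):
    return a + b
```

With this shape, `run_in_local_mode(async_add, 1, 2)` and `run_in_local_mode(sync_add, 1, 2)` both return a plain value, which matches the view of local mode as serial Python.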
@rkooo567 I think that is a great idea. It fits with the logic of local mode being serial python. It may be worth renaming it 'serial' mode to make its intended use case more obvious.
Downgrading to P2 since this is not a common use case.
I would argue with this not being a common use-case. Sure, you're not gonna run Ray in Local mode in production - but you may have to do it during development for debugging purposes. And since async actors are not supported in local mode - it means you simply cannot use them in your code. That is unless you're prepared to maintain two versions of your code - one with async actors for production, and one with sync actors for development... who would want to do that?
It would not have been an issue for me if not for debugging requirements. #14005 could be an alternative solution here (that would enable seamless debugging of workers/actors in PyCharm in non-local mode).
I am facing the same problem.
I am testing my actor. Earlier I used ray.init(); unfortunately, this does not record test coverage. To get coverage, I changed to local_mode=True. Now the problem is that my actor's remote function call never returns, as the actor code runs indefinitely :(
This feature would be greatly appreciated. I am trying to set up a WandbCallback for my tune pipeline but cannot make it work in debug (local) mode because of the actor issue.
I wouldn't say that's a rare use case, because you basically can't debug with PyCharm if you have any async actor. Strange that it's not prioritized and is already 4 years old.
What is the problem?
Async actor is not being recognized in local mode. cc: @ijrsvt
Ray version and other system information (Python version, TensorFlow version, OS):
Reproduction
The error I'm getting