xorbitsai / xoscar

Python actor framework for heterogeneous computing.
https://xoscar.dev
Apache License 2.0

BUG: Fix route problem when listen on 0.0.0.0 #101

Closed: frostyplanet closed this pull request 3 weeks ago

frostyplanet commented 1 month ago

Fix a routing problem when two services, both listening on 0.0.0.0:port, call each other.

Two services listen on 0.0.0.0:1234 on different hosts:

xoscar.actor_ref(111.111.111.111:1234) unexpectedly returns a LocalActorRef.

Log output printed inside context.actor_ref(111.111.111.111:1234):

    actor_ref ActorRef(uid=b'supervisor', address='111.111.111.111:1234')
    _call 111.111.111.111:1234
    get client 111.111.111.111:1234
    got LocalActorRef(uid=None, address='0.0.0.0:1234'), actor_weakref=<weakref at 0x75745bcdecf0; to 'CloudSupervisorActor' at 0x75745d3ed260>
    fix_all_zero_ip()
    got LocalActorRef(uid=None, address='111.111.111.111:1234'), actor_weakref=<weakref at 0x75745bcdecf0; to 'CloudSupervisorActor' at 0x75745d3ed260>

When the returned LocalActorRef is used, a method call intended for the remote service is actually sent to the local service.
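The log suggests that the lookup for 111.111.111.111:1234 matched the local pool registered under 0.0.0.0:1234, and fix_all_zero_ip() then rewrote the address so the result looked legitimate. The following is a simplified, hypothetical model of how such misrouting can arise; `BuggyLocalPoolRegistry`, `split_address`, and the "wildcard host matches any host on the same port" rule are assumptions for illustration, not xoscar's actual internals.

```python
from typing import Dict, Optional, Tuple


def split_address(address: str) -> Tuple[str, int]:
    # "host:port" -> ("host", port); bracketed IPv6 literals are out of scope here
    host, _, port = address.rpartition(":")
    return host, int(port)


class BuggyLocalPoolRegistry:
    """Hypothetical model of a local-pool registry; not xoscar's actual code."""

    def __init__(self) -> None:
        # Registered listen address -> name of the local pool
        self._pools: Dict[str, str] = {}

    def register_local_pool(self, address: str, pool: str) -> None:
        # Every listen address is registered, including the wildcard 0.0.0.0
        self._pools[address] = pool

    def get_local_pool(self, address: str) -> Optional[str]:
        if address in self._pools:
            return self._pools[address]
        _, port = split_address(address)
        for registered, pool in self._pools.items():
            reg_host, reg_port = split_address(registered)
            # Assumed bug: a wildcard host is treated as "matches any host",
            # so a remote peer on the same port resolves to the local pool.
            if reg_host == "0.0.0.0" and reg_port == port:
                return pool
        return None


registry = BuggyLocalPoolRegistry()
registry.register_local_pool("0.0.0.0:1234", "local supervisor pool")
# Host A asks for host B's supervisor but gets its own local pool back
print(registry.get_local_pool("111.111.111.111:1234"))  # prints "local supervisor pool"
```

In this model, every call host A makes through that reference loops back into host A, which matches the infinite loop described in the PR.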

The solution is simple: during pool initialization, do not call register_local_pool if the address is all zeros, because 0.0.0.0 is a wildcard address.
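A minimal sketch of the fix, assuming a registry that is essentially a dict keyed by "host:port". `is_wildcard_address` and `LocalPoolRegistry` are hypothetical names, not xoscar's API. The wildcard check relies on the standard library's `ipaddress` module, whose `is_unspecified` property is true for 0.0.0.0 and ::.

```python
import ipaddress
from typing import Dict, Optional


def is_wildcard_address(address: str) -> bool:
    """Return True if the host part of "host:port" is an unspecified (all-zero) IP."""
    host, _, _ = address.rpartition(":")
    host = host.strip("[]")  # tolerate bracketed IPv6 such as "[::]:1234"
    try:
        return ipaddress.ip_address(host).is_unspecified
    except ValueError:
        # Hostnames such as "localhost" are not IP literals, so not wildcards
        return False


class LocalPoolRegistry:
    """Hypothetical registry illustrating the fix; not xoscar's actual code."""

    def __init__(self) -> None:
        self._pools: Dict[str, str] = {}

    def register_local_pool(self, address: str, pool: str) -> None:
        # The fix: a wildcard listen address is not a routable peer address,
        # so it is never registered as a local pool.
        if is_wildcard_address(address):
            return
        self._pools[address] = pool

    def get_local_pool(self, address: str) -> Optional[str]:
        # Exact match only; a miss means the caller goes through a socket client.
        return self._pools.get(address)


registry = LocalPoolRegistry()
registry.register_local_pool("0.0.0.0:1234", "local supervisor pool")
# The remote address no longer resolves to the local pool
assert registry.get_local_pool("111.111.111.111:1234") is None
```

The trade-off, in line with the PR's reasoning that wildcard traffic should simply go over sockets, is that a pool reachable only via 0.0.0.0 presumably no longer gets a local shortcut; that is slower but always routes to the intended host.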

Building on the earlier change in https://github.com/xorbitsai/xoscar/pull/92, we found a problem in the following scenario:

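Two services both listen on 0.0.0.0 on the same port, but on different hosts. When hostA.method is written to call hostB.method, the actual behavior is that hostA.method calls hostA.method, resulting in an infinite loop. This happens because constructing actor_ref(hostB) from hostA returns a LocalActorRef. Investigation showed that this is related to the pool's initialization behavior. The solution: the wildcard address 0.0.0.0 should not be treated as a local address; calls should simply go through normal socket communication.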

codecov[bot] commented 1 month ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 85.10%. Comparing base (d6465c9) to head (bb150cd). Report is 11 commits behind head on main.

```diff
@@            Coverage Diff             @@
##             main     #101      +/-   ##
==========================================
- Coverage   88.97%   85.10%   -3.88%
==========================================
  Files          48       54       +6
  Lines        4038     4559     +521
  Branches      770      834      +64
==========================================
+ Hits         3593     3880     +287
- Misses        358      584     +226
- Partials       87       95       +8
```

| Flag | Coverage Δ | |
|---|---|---|
| unittests | `84.93% <ø> (-3.88%)` | :arrow_down: |

Flags with carried forward coverage won't be shown.


qinxuye commented 4 weeks ago

CI is fixed; could you rebase?

frostyplanet commented 3 weeks ago

@qinxuye All CI checks passed.