xorbitsai / xoscar

Python actor framework for heterogeneous computing.
https://xoscar.dev
Apache License 2.0
89 stars, 21 forks

TST: test_copy_to_file_objects sometimes failed in CI #59

Closed: qinxuye closed this issue 1 year ago

qinxuye commented 1 year ago

Describe the bug

test_copy_to_file_objects sometimes fails in CI with a TypeError: 'float' object cannot be interpreted as an integer (full traceback below).

To Reproduce

__________________________ test_copy_to_file_objects ___________________________

    @pytest.mark.asyncio
    async def test_copy_to_file_objects():
        start_method = (
            os.environ.get("POOL_START_METHOD", "forkserver")
            if sys.platform != "win32"
            else None
        )
        pool = await create_actor_pool(
            "127.0.0.1",
            pool_cls=MainActorPool,
            n_process=2,
            subprocess_start_method=start_method,
        )

        d = tempfile.mkdtemp()
        async with pool:
            ctx = get_context()

            # actor on main pool
            actor_ref1 = await ctx.create_actor(
                FileobjTransferActor,
                uid="test-1",
                address=pool.external_address,
                allocate_strategy=ProcessIndex(1),
            )
            actor_ref2 = await ctx.create_actor(
                FileobjTransferActor,
                uid="test-2",
                address=pool.external_address,
                allocate_strategy=ProcessIndex(2),
            )
            sizes = [10 * 1024**2, 3 * 1024**2, 0.5 * 1024**2, 0.25 * 1024**2]
            names = []
            for _ in range(2 * len(sizes)):
                _, p = tempfile.mkstemp(dir=d)
                names.append(p)

>           await actor_ref1.copy_data(actor_ref2, names[::2], names[1::2], sizes=sizes)

xoscar/backends/test/tests/test_transfer.py:293: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
xoscar/backends/context.py:227: in send
    return self._process_result_message(result)
xoscar/backends/context.py:102: in _process_result_message
    raise message.as_instanceof_cause()
xoscar/backends/pool.py:657: in send
    result = await self._run_coro(message.message_id, coro)
xoscar/backends/pool.py:368: in _run_coro
    return await coro
xoscar/api.py:306: in __on_receive__
    return await super().__on_receive__(message)  # type: ignore
xoscar/core.pyx:527: in __on_receive__
    raise ex
xoscar/core.pyx:497: in xoscar.core._BaseActor.__on_receive__
    async with self._lock:
xoscar/core.pyx:498: in xoscar.core._BaseActor.__on_receive__
    with debug_async_timeout('actor_lock_timeout',
xoscar/core.pyx:503: in xoscar.core._BaseActor.__on_receive__
    result = await result
xoscar/backends/test/tests/test_transfer.py:239: in copy_data
    fobj.write(np.random.bytes(size))
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   TypeError: [address=127.0.0.1:45079, pid=6288] 'float' object cannot be interpreted as an integer
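
The traceback points at `fobj.write(np.random.bytes(size))` receiving a float. In the test's `sizes` list, `0.5 * 1024**2` and `0.25 * 1024**2` are float products, so the last two sizes are floats rather than ints. The sketch below reproduces that error class under stated assumptions: `os.urandom` is used as a stand-in for `np.random.bytes` so the example needs no third-party packages (it also requires an integer length), and the helper name `random_bytes` is hypothetical. Converting the sizes with `int()` is one possible fix, not necessarily the change that was merged upstream, and the sketch does not explain why the failure was only intermittent in CI.

```python
import os

# Sizes copied from the failing test. The last two entries are floats
# because multiplying by 0.5 or 0.25 yields a float in Python.
sizes = [10 * 1024**2, 3 * 1024**2, 0.5 * 1024**2, 0.25 * 1024**2]


def random_bytes(size):
    """Hypothetical stand-in for np.random.bytes; needs an integer length."""
    return os.urandom(size)


# Identify which entries would trigger the TypeError.
float_sizes = [s for s in sizes if isinstance(s, float)]

# Reproduce the failure with one of the float sizes.
try:
    random_bytes(sizes[2])
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# One possible fix (an assumption): coerce every size to int up front.
int_sizes = [int(s) for s in sizes]

# Only the two smaller payloads are generated here to keep the sketch cheap.
payloads = [random_bytes(s) for s in int_sizes[2:]]
```

Writing the sizes as integer expressions in the first place (for example `1024**2 // 2` and `1024**2 // 4`) would avoid the conversion step entirely.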