mars-project / mars

Mars is a tensor-based unified framework for large-scale data computation which scales numpy, pandas, scikit-learn and Python functions.
https://mars-project.readthedocs.io
Apache License 2.0

[BUG] mars.new_ray_session(backend="ray") hangs at exit #3280

Closed by fyrestone 1 year ago

fyrestone commented 1 year ago

Describe the bug A script that creates a session with mars.new_ray_session(backend="ray") hangs at exit. The following script reproduces it:

import mars
import mars.dataframe as md
import numpy as np
import pandas as pd

mars.new_ray_session(backend="ray", default=True)

s = np.random.RandomState(0)
raw = pd.DataFrame(s.rand(100, 4), columns=list("abcd"))
df = md.DataFrame(raw, chunk_size=30)

r = df.describe().execute()
print(r)

Hang stack:

Thread 0x104B10580 (idle)
    wait (threading.py:302)
    wait (threading.py:558)
    start (threading.py:857)
    start (mars/core/entity/executable.py:38)
    put (mars/core/entity/executable.py:71)
    cb (mars/core/entity/executable.py:94)
Thread 0x16D96F000 (idle)
    _run (mars/lib/aio/isolation.py:36)
    run (threading.py:870)
    _bootstrap_inner (threading.py:932)
    _bootstrap (threading.py:890)

The hang at exit occurs because new_ray_session does not stop the isolation after execution. When the tileable object is later garbage-collected, it tries to create a new DecrefThread. Because the interpreter is already exiting at that point, creating the new thread hangs in thread.start().
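The failure mode described above can be sketched in plain Python, without Mars or Ray. This is only an illustration under stated assumptions, not the fix that was applied in Mars: `LazyCleanupWorker` and `Handle` are hypothetical stand-ins for the lazily started DecrefThread and the tileable object. The idea is to mark the worker as closed before the interpreter tears objects down, so that a late `__del__` drops its request instead of starting a thread.

```python
import atexit
import queue
import threading


class LazyCleanupWorker:
    """Hypothetical stand-in for a cleanup thread that is started lazily.

    Locking is omitted for brevity; this sketch assumes put() and stop()
    are called from one thread.
    """

    def __init__(self):
        self._queue = queue.Queue()
        self._thread = None
        self._closed = False
        self.processed = []

    def _run(self):
        while True:
            item = self._queue.get()
            if item is None:  # sentinel: stop the worker loop
                break
            self.processed.append(item)

    def put(self, item):
        """Queue a cleanup request; return False once the worker is closed."""
        if self._closed:
            # Never start a new thread after shutdown has begun.
            return False
        if self._thread is None:
            self._thread = threading.Thread(target=self._run, daemon=True)
            self._thread.start()
        self._queue.put(item)
        return True

    def stop(self):
        """Close the worker and wait for queued requests to drain."""
        self._closed = True
        thread, self._thread = self._thread, None
        if thread is not None:
            self._queue.put(None)
            thread.join()


class Handle:
    """Hypothetical object whose finalizer asks the worker to clean up."""

    def __init__(self, worker, key):
        self._worker = worker
        self.key = key

    def __del__(self):
        self._worker.put(self.key)


worker = LazyCleanupWorker()
# atexit callbacks run before module globals are torn down, so handles
# collected during teardown find the worker already closed.
atexit.register(worker.stop)

h = Handle(worker, "chunk-1")
del h            # finalizer runs, worker thread starts and handles the key
worker.stop()    # explicit shutdown, analogous to stopping the isolation

late = Handle(worker, "chunk-2")
del late         # request is dropped; no thread is created

print(worker.processed)  # ['chunk-1']
```

The explicit `stop()` call plays the role of stopping the isolation when the session ends; the `atexit` registration is a fallback for scripts that never call it.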

To Reproduce To help us reproduce this bug, please provide the information below:

  1. Your Python version
  2. The version of Mars you use
  3. Versions of crucial packages, such as numpy, scipy and pandas
  4. Full stack of the error.
  5. Minimized code to reproduce the error.

Expected behavior The process exits cleanly once the script finishes, without hanging.

Additional context Add any other context about the problem here.