modin-project / modin

Modin: Scale your Pandas workflows by changing a single line of code
http://modin.readthedocs.io
Apache License 2.0

BUG: when Ray is shut down and re-initialized, Modin methods don't work anymore and throw an exception #7378

Open Liquidmasl opened 2 weeks ago

Liquidmasl commented 2 weeks ago

Modin version checks

Reproducible Example

import ray
import modin.pandas as pd

ray.init()

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df.min()

ray.shutdown()
ray.init()

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df.min()

Issue Description

This code leads to the following error:

ValueError: An application is trying to access a Ray object whose owner is unknown(00ffffffffffffffffffffffffffffffffffffff0100000005e1f505). Please make sure that all Ray objects you are trying to access are part of the current Ray session. Note that object IDs generated randomly (ObjectID.from_random()) or out-of-band (ObjectID.from_binary(...)) cannot be passed as a task argument because Ray does not know which task created them. If this was not how your object ID was generated, please file an issue at https://github.com/ray-project/ray/issues/

It seems the method is initialised as an actor in the first Ray session and is not recreated in the second one.

When I do NOT reinitialize Ray in between, RAM slowly fills up until the process dies. Something is leaking, or some references to Ray objects are never dropped. Restarting Ray is the only workaround I have found, but it leads to this issue.

This does not seem to be isolated to .min(); it appears to affect all of the methods (?)

Expected Behavior

The error should not appear. Modin should recreate the necessary actors in a new Ray session.

Error Logs

```python-traceback
Traceback (most recent call last):
  File "C:\Users\MarcelWinklmueller\AppData\Roaming\JetBrains\PyCharm2024.2\scratches\scratch_4.py", line 20, in <module>
    df.min()
  File "C:\Users\MarcelWinklmueller\anaconda3\envs\cirqular_mono_repo\Lib\site-packages\modin\logging\logger_decorator.py", line 144, in run_and_log
    return obj(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\MarcelWinklmueller\anaconda3\envs\cirqular_mono_repo\Lib\site-packages\modin\pandas\base.py", line 2129, in min
    data._query_compiler.min(
  File "C:\Users\MarcelWinklmueller\anaconda3\envs\cirqular_mono_repo\Lib\site-packages\modin\logging\logger_decorator.py", line 144, in run_and_log
    return obj(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\MarcelWinklmueller\anaconda3\envs\cirqular_mono_repo\Lib\site-packages\modin\core\storage_formats\pandas\query_compiler.py", line 901, in min
    return TreeReduce.register(map_func, reduce_func)(self, axis=axis, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\MarcelWinklmueller\anaconda3\envs\cirqular_mono_repo\Lib\site-packages\modin\core\dataframe\algebra\tree_reduce.py", line 74, in caller
    query_compiler._modin_frame.tree_reduce(
  File "C:\Users\MarcelWinklmueller\anaconda3\envs\cirqular_mono_repo\Lib\site-packages\modin\logging\logger_decorator.py", line 144, in run_and_log
    return obj(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\MarcelWinklmueller\anaconda3\envs\cirqular_mono_repo\Lib\site-packages\modin\core\dataframe\pandas\dataframe\utils.py", line 753, in run_f_on_minimally_updated_metadata
    result = f(self, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\MarcelWinklmueller\anaconda3\envs\cirqular_mono_repo\Lib\site-packages\modin\core\dataframe\pandas\dataframe\dataframe.py", line 2218, in tree_reduce
    reduce_parts = self._partition_mgr_cls.map_axis_partitions(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\MarcelWinklmueller\anaconda3\envs\cirqular_mono_repo\Lib\site-packages\modin\core\execution\modin_aqp.py", line 165, in magic
    result_parts = f(*args, **kwargs)
                   ^^^^^^^^^^^^^^^^^^
  File "C:\Users\MarcelWinklmueller\anaconda3\envs\cirqular_mono_repo\Lib\site-packages\modin\logging\logger_decorator.py", line 144, in run_and_log
    return obj(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\MarcelWinklmueller\anaconda3\envs\cirqular_mono_repo\Lib\site-packages\modin\core\dataframe\pandas\partitioning\partition_manager.py", line 813, in map_axis_partitions
    return cls.broadcast_axis_partitions(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\MarcelWinklmueller\anaconda3\envs\cirqular_mono_repo\Lib\site-packages\modin\logging\logger_decorator.py", line 144, in run_and_log
    return obj(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\MarcelWinklmueller\anaconda3\envs\cirqular_mono_repo\Lib\site-packages\modin\core\dataframe\pandas\partitioning\partition_manager.py", line 73, in wait
    result = func(cls, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\MarcelWinklmueller\anaconda3\envs\cirqular_mono_repo\Lib\site-packages\modin\core\dataframe\pandas\partitioning\partition_manager.py", line 591, in broadcast_axis_partitions
    [
  File "C:\Users\MarcelWinklmueller\anaconda3\envs\cirqular_mono_repo\Lib\site-packages\modin\core\dataframe\pandas\partitioning\partition_manager.py", line 592, in <listcomp>
    left_partitions[i].apply(
  File "C:\Users\MarcelWinklmueller\anaconda3\envs\cirqular_mono_repo\Lib\site-packages\modin\logging\logger_decorator.py", line 144, in run_and_log
    return obj(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\MarcelWinklmueller\anaconda3\envs\cirqular_mono_repo\Lib\site-packages\modin\core\dataframe\pandas\partitioning\axis_partition.py", line 288, in apply
    self.deploy_axis_func(
  File "C:\Users\MarcelWinklmueller\anaconda3\envs\cirqular_mono_repo\Lib\site-packages\modin\logging\logger_decorator.py", line 144, in run_and_log
    return obj(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\MarcelWinklmueller\anaconda3\envs\cirqular_mono_repo\Lib\site-packages\modin\core\execution\ray\implementations\pandas_on_ray\partitioning\virtual_partition.py", line 180, in deploy_axis_func
    return _deploy_ray_func.options(
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\MarcelWinklmueller\anaconda3\envs\cirqular_mono_repo\Lib\site-packages\ray\remote_function.py", line 250, in remote
    return func_cls._remote(args=args, kwargs=kwargs, **updated_options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\MarcelWinklmueller\anaconda3\envs\cirqular_mono_repo\Lib\site-packages\ray\_private\auto_init_hook.py", line 21, in auto_init_wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\MarcelWinklmueller\anaconda3\envs\cirqular_mono_repo\Lib\site-packages\ray\util\tracing\tracing_helper.py", line 310, in _invocation_remote_span
    return method(self, args, kwargs, *_args, **_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\MarcelWinklmueller\anaconda3\envs\cirqular_mono_repo\Lib\site-packages\ray\remote_function.py", line 468, in _remote
    return invocation(args, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\MarcelWinklmueller\anaconda3\envs\cirqular_mono_repo\Lib\site-packages\ray\remote_function.py", line 435, in invocation
    object_refs = worker.core_worker.submit_task(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "python\ray\_raylet.pyx", line 4039, in ray._raylet.CoreWorker.submit_task
  File "python\ray\_raylet.pyx", line 4043, in ray._raylet.CoreWorker.submit_task
  File "python\ray\_raylet.pyx", line 835, in ray._raylet.prepare_args_and_increment_put_refs
  File "python\ray\_raylet.pyx", line 826, in ray._raylet.prepare_args_and_increment_put_refs
  File "python\ray\_raylet.pyx", line 866, in ray._raylet.prepare_args_internal
  File "python\ray\includes/common.pxi", line 87, in ray._raylet.check_status
ValueError: An application is trying to access a Ray object whose owner is unknown(00ffffffffffffffffffffffffffffffffffffff0100000005e1f505). Please make sure that all Ray objects you are trying to access are part of the current Ray session. Note that object IDs generated randomly (ObjectID.from_random()) or out-of-band (ObjectID.from_binary(...)) cannot be passed as a task argument because Ray does not know which task created them. If this was not how your object ID was generated, please file an issue at https://github.com/ray-project/ray/issues/
```

Installed Versions

INSTALLED VERSIONS
------------------
commit : c8bbca8e4e00c681370e3736b2f73bb0352408c3
python : 3.11.8.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.22631
machine : AMD64
processor : Intel64 Family 6 Model 186 Stepping 2, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_Austria.1252

Modin dependencies
------------------
modin : 0.31.0
ray : 2.34.0
dask : 2024.7.1
distributed : 2024.7.1

pandas dependencies
-------------------
pandas : 2.2.2
numpy : 1.26.4
pytz : 2024.1
dateutil : 2.9.0.post0
setuptools : 68.2.2
pip : 24.2
Cython : 0.29.37
pytest : 8.2.0
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.3
IPython : 8.23.0
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : 2024.3.1
gcsfs : None
matplotlib : 3.8.2
numba : 0.60.0
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 15.0.2
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.13.0
sqlalchemy : 2.0.29
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2024.1
qtpy : None
pyqt5 : None

Liquidmasl commented 2 weeks ago

That doesn't seem to be exactly it: all the objects have their own unique object ID, as they should, but something still isn't right.

In worker.py:785 the object ID is fetched (I can't debug any deeper) with the owner address being Null.

One step up, in worker.put(), Ray gets the global_worker, so maybe that is wrong somehow...??

YarShev commented 2 weeks ago

Hi @Liquidmasl, try calling reload_modin() between ray.shutdown() and ray.init(). This works for me.

import ray
import modin.pandas as pd
from modin.utils import reload_modin

ray.init()

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df.min()

ray.shutdown()
reload_modin()
ray.init()

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df.min()
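
If the restart happens in several places, the three calls above can be wrapped in one function so the order cannot be mixed up. This is only a sketch: `restart_ray_for_modin` is a hypothetical name and not part of Modin or Ray, and it assumes `reload_modin` is importable from `modin.utils` as in the snippet above.

```python
def restart_ray_for_modin(**ray_init_kwargs):
    """Shut down Ray, reload Modin, and start a fresh Ray session.

    Hypothetical helper (not part of Modin or Ray). The call order follows
    the workaround in this thread: shutdown, then reload_modin, then init.
    """
    # Imported lazily so that defining this helper does not require Ray.
    import ray
    from modin.utils import reload_modin

    ray.shutdown()    # end the old session
    reload_modin()    # let Modin start from a clean state before the new session
    # Start the new session; keyword arguments are passed through to ray.init().
    return ray.init(**ray_init_kwargs)
```

Calling `restart_ray_for_modin()` then replaces the separate `ray.shutdown()`, `reload_modin()`, and `ray.init()` lines. DataFrames created before the restart belong to the old session and should be rebuilt afterwards, as the example above does.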

Liquidmasl commented 1 week ago

> Hi @Liquidmasl, try to call reload_modin() method in between ray.shutdown() and ray.init(). This works for me.

Wow, I did not find this anywhere. Googling it now, I only find this thread and https://github.com/modin-project/modin/releases?q=reload_modin&expanded=true haha

Will try it right now.

Liquidmasl commented 1 week ago

> call reload_modin() method in between ray.shutdown() and ray.init()

Yes, this seems to work, but it also seems to break @multimethod, which is a bummer...

Apparently multimethod is not as extensively used as I thought.