intel-analytics / analytics-zoo

Distributed Tensorflow, Keras and PyTorch on Apache Spark/Flink & Ray
https://analytics-zoo.readthedocs.io/
Apache License 2.0

Autots failed to run with large data size #361

Open dding3 opened 3 years ago

dding3 commented 3 years ago

Tried to use AutoTS to train on a large dataset, https://ai.baidu.com/broad/download?dataset=traffic (traffic_speed_sub-dataset), and got the following exception:

File "search.py", line 62, in recipe=recipe) File "/root/anaconda3/envs/ding37/lib/python3.7/site-packages/zoo/automl/search/ray_tune_search_engine.py", line 158, in compile numpy_format=True File "/root/anaconda3/envs/ding37/lib/python3.7/site-packages/zoo/automl/search/ray_tune_search_engine.py", line 318, in _prepare_train_func input_data_id = ray.put(input_data) File "/root/anaconda3/envs/ding37/lib/python3.7/site-packages/ray/_private/client_mode_hook.py", line 47, in wrapper return func(*args, **kwargs) File "/root/anaconda3/envs/ding37/lib/python3.7/site-packages/ray/worker.py", line 1489, in put object_ref = worker.put_object(value) File "/root/anaconda3/envs/ding37/lib/python3.7/site-packages/ray/worker.py", line 278, in put_object serialized_value, object_ref=object_ref)) File "python/ray/_raylet.pyx", line 994, in ray._raylet.CoreWorker.put_serialized_object File "python/ray/_raylet.pyx", line 918, in ray._raylet.CoreWorker._create_put_buffer File "python/ray/_raylet.pyx", line 145, in ray._raylet.check_status ray.exceptions.ObjectStoreFullError: Failed to put object ffffffffffffffffffffffffffffffffffffffff0100000002000000 in object store because it is full. Object size is 52792928994 bytes. The local object store is full of objects that are still in scope and cannot be evicted. Tip: Use the ray memory command to list active objects in the cluster.

sgwhat commented 1 year ago

Hi! May I ask if this issue has been fixed? 😄