ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0
33.51k stars 5.69k forks

Managing memory during long loops #6717

Open dstamos opened 4 years ago

dstamos commented 4 years ago

Ray version: 0.8.0. I am using a machine with 220GB of RAM running Ubuntu 16.04.4.

I would like to run the following script and keep the memory footprint to a minimum, since all I need is the latest 'new_results'. According to htop, memory usage keeps increasing with each iteration. Why is that? Is there a setting among the ray.init() parameters that would prevent this? Is there another workaround?

I have considered calling ray.shutdown() at the end of each iteration, but this seems to be unstable/unreliable. I might create a separate post about this later.

import os
import time

import numpy as np
import psutil
import ray

if not ray.is_initialized():
    driver_object_store_memory_cap = int(100 * 1024 * 1024)
    object_store_memory_cap = int(1000 * 1024 * 1024)
    memory_cap = int(500 * 1024 * 1024)
    redis_max_memory_cap = int(100 * 1024 * 1024)
    ray.init(memory=memory_cap,
             driver_object_store_memory=driver_object_store_memory_cap,
             object_store_memory=object_store_memory_cap,
             redis_max_memory=redis_max_memory_cap)

@ray.remote
def func():
    return np.random.randn(1000, 500)

for i in range(10000):
    tt = time.time()
    jobs = [func.remote() for idx in range(100)]
    new_results = ray.get(jobs)

    process = psutil.Process(os.getpid())
    print('memory: %d | time: %5.2f: ' % (process.memory_info().rss, time.time() - tt))

boweima920601 commented 4 years ago

I have also found that the memory footprint keeps increasing. My temporary workaround is to pass max_calls=1 in the function decorator, i.e. @ray.remote(max_calls=1), which forces Ray to start a new worker process for each task.

Of course, this incurs some overhead each time Ray starts a new worker process.

rkooo567 commented 4 years ago

Could you try the latest wheel or version 0.8.3 (which will be released by next week)? This might resolve the problem.