ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0
33.44k stars 5.67k forks

Objects being pinned in memory and unable to be evicted #15568

Closed michaelarman closed 3 years ago

michaelarman commented 3 years ago

I have created a custom NSGA2 algorithm and I'm using Ray for my evaluator. I've noticed that the objects I retrieve from Ray tasks are being pinned in memory. I've tried several things, such as copying the returned objects and deleting the reference to the original, running the garbage collector, and deleting all references to Ray objects after they're used, but no luck.

When I run ray memory, I get this output many times, since the GA is iterating: [image: ray memory output]

The function is called in multiple places:

1)

import ray

class MapEvaluator:
    def __init__(...):
        ....

    def evaluate_all(...):
        ....
        # Launch one remote task per chromosome.
        results = []
        for i in range(len(chromosomes)):
            trial_idx = int(n_gen * population_size) + i
            offspring[i].trial_idx = trial_idx
            results.append(get_genes.remote(self.predictor, chromosomes[i], trial_idx, n_gen))

        # Block until every task finishes, then unzip the per-task result tuples.
        chromosomes, evaluation, updates_pgScanCaseRate, cache = zip(*ray.get(results))  # this is line 511 from the above output
        ....
        return offspring  # or return copy.copy(offspring)

Then it does some work and returns an object that references some of the objects retrieved from ray. I've tried making a copy of the object that needs to be returned but it doesn't seem to work.
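One possible reason the copy has no effect, sketched here without Ray: copy.copy on a list is shallow, so the new list still points at the same elements, and those elements still reference whatever they referenced before. The classes Fetched and Individual below are hypothetical stand-ins (for a value returned by ray.get and for one offspring entry), and a weak reference is used only to observe whether the original value is freed. This assumes the pinning comes from lingering Python references, which may not be the only cause here.

```python
import copy
import gc
import weakref


class Fetched:
    """Hypothetical stand-in for a value returned by ray.get()."""

    def __init__(self, data):
        self.data = data


class Individual:
    """Hypothetical stand-in for one offspring entry."""

    def __init__(self, genes):
        self.genes = genes  # keeps a reference to the fetched value


def original_survives(copier):
    """Return (is the original value still alive?, the copied list)."""
    fetched = Fetched([1, 2, 3])
    probe = weakref.ref(fetched)      # observes the object without keeping it alive
    offspring = [Individual(fetched)]
    result = copier(offspring)
    del fetched, offspring            # drop every direct reference to the originals
    gc.collect()
    return probe() is not None, result


shallow_alive, kept_shallow = original_survives(copy.copy)
deep_alive, kept_deep = original_survives(copy.deepcopy)

print(shallow_alive)  # True: the shallow copy still reaches the original value
print(deep_alive)     # False: the deep copy owns independent objects
```

If the same holds for the real offspring objects, returning copy.deepcopy(offspring), or converting the fetched values to plain Python data before storing them, may be what lets the originals go out of scope.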

2&3)

class NSGAII:
    def __init__(...):
        ....

    def iterate(...):
        ....
        offspring = self.evaluate_all(parents)  # calls function below # line 134 from ray memory output

        offspring.extend(self.population)

    def evaluate_all(...):
        for parent in parents:
            offspring.extend(self.evaluator.evaluate_all(parent, n_gen, self.variator, self.breeding, self.population_size))  # line 159 from ray memory output
            # self.evaluator.evaluate_all calls the function from 1) in the evaluator class

The main problem is that I am running the GA sequentially (i.e. doing multiple runs one after the other), e.g.:

for problem in problems:
    run_optimization()

and the objects from the previous run remain pinned. I've read the documentation on memory management, which mentions many ways objects can be pinned but not really any remedies for them.

rkooo567 commented 3 years ago

Is it possible to share the code and the whole ray memory output?