openml-labs / gama

An automated machine learning tool aimed to facilitate AutoML research.
https://openml-labs.github.io/gama/master/
Apache License 2.0

Issue with Pickling Local Object in Custom Gama Classifier Implementation #199

Closed simonprovost closed 1 year ago

simonprovost commented 1 year ago

Hello @PGijsbers,

I hope you are doing well :) I am reaching out regarding the issue we previously discussed about developing a new AutoML system around GAMA as an outer wrapper (Issue #191).

I am excited to share that I have made substantial progress with my implementation. Specifically, I can now create individuals in a sequential, step-wise manner, as we discussed earlier. I have also implemented population creation and individual pipeline fitting with our custom pipeline (i.e., we create a population, pick one individual at random, and call its fit function, which processes that individual's pipeline).

However, I have hit an obstacle when calling the fit function of our custom GAMA classifier. Although I have solved a few related issues along the way, I cannot pin down the error below. It appears to be related to pickling a local object, but I have no idea how to proceed or where to dig. I suspect it has something to do with the multiprocessing machinery, but what can I do? The error encountered:

 File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/queues.py", line 244, in _feed
    obj = _ForkingPickler.dumps(obj)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
AttributeError: Can't pickle local object 'EvaluationLibrary.__init__.<locals>.main_node_str'
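For context on why this error occurs: on macOS, Python 3.8+ defaults to the `spawn` start method for multiprocessing, so every object handed to a worker must be picklable, and pickle serialises functions by their qualified name. A function defined inside `__init__` has `<locals>` in its qualname and cannot be re-imported on the worker side. A toy reproduction (the class and field names are made up to mirror the error message, not GAMA's actual code):

```python
import pickle

class EvaluationLibraryDemo:
    """Toy stand-in for a class that stores a function defined inside __init__."""
    def __init__(self):
        def main_node_str(e):  # local function: lives only in __init__'s scope
            return str(e)
        self.fields = {"main_node": main_node_str}

lib = EvaluationLibraryDemo()
try:
    pickle.dumps(lib)  # this is what the spawn start method does to every submitted task
except AttributeError as err:
    print(err)  # Can't pickle local object 'EvaluationLibraryDemo.__init__.<locals>.main_node_str'
```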

For my current implementation, I am using the Random Search base search method. My goal is to have an end-to-end process working before moving on to building a Bayesian optimisation search algorithm. During the Random Search process, I am able to print the following:

# [...]/gama/search_methods/random_search.py
[...]
    with AsyncEvaluator() as async_:
        for individual in start_candidates:
            print("Evaluating start candidates")
            print(f"Individual: {individual}")
            print(f"operations.evaluate.func's name: {operations.evaluate.func.__name__}")
            async_.submit(operations.evaluate, individual)  # After this line I lose track of where execution goes (see below)
[...]

This results in outputs like the following:

> Evaluating start candidates

> Individual: Individual a471a748-8f09-4788-a7e6-104f1c1f93fb
Pipeline: ExtraTreesClassifier(DummyStep(data, DummyStep.dummy_parameter="some_value"), ExtraTreesClassifier.bootstrap=True, ExtraTreesClassifier.criterion='gini', ExtraTreesClassifier.max_features=0.6000000000000001, min_samples_leaf=17, min_samples_split=3, ExtraTreesClassifier.n_estimators=500)
Fitness: None

> operations.evaluate.func's name: evaluate_individual

However, after this, the aforementioned error arises. Despite embedding print statements within evaluate_individual (gama/genetic_programming/compilers/scikitlearn.py), none of them reach the console, not even the ones placed before anything else is executed in the function, which is the strange behaviour. Since I am unable to uncover the root cause of this issue, I am seeking guidance on how to proceed, if you have any suggestion at all please :)

For your reference, I am using a forked version of GAMA, integrated locally into my project with Poetry. The build process works as expected, and I am running this on a MacBook Pro (M2) with macOS Ventura (stable release).

Any insights or suggestions you could provide would be deeply appreciated 👌 Thank you very much for your time and assistance! Have a lovely evening.

Best wishes,

simonprovost commented 1 year ago

EDIT 1 (6 June, 6:30 p.m.): I have tried various search algorithms to confirm that the issue is not specific to Random Search. AsyncEA and ASHA both hit the same pickling problem: not the identical object, but the identical AttributeError, "Can't pickle local object [X]", with X being the local object in question.

- AttributeError: Can't pickle local object 'AsyncEA.__init__.<locals>.get_parent'
- AttributeError: Can't pickle local object 'AsynchronousSuccessiveHalving.__init__.<locals>.<lambda>'
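Both of those are instances of the same constraint: pickle serialises functions by qualified name, so a lambda or any function nested inside another (qualname containing `<locals>`) cannot be located for re-import. A minimal check (toy code, not GAMA's):

```python
import pickle

def make_get_parent():
    """Factory returning a nested function, mimicking a helper built inside __init__."""
    def get_parent():  # qualname: make_get_parent.<locals>.get_parent
        return None
    return get_parent

for fn in (lambda x: x, make_get_parent()):
    try:
        pickle.dumps(fn)
    except (pickle.PicklingError, AttributeError) as err:
        print(type(err).__name__)  # both attempts fail to pickle
```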
simonprovost commented 1 year ago

EDIT 2 (6 June, 10:30 p.m.): I have finally come around to the idea that multiprocessing does not easily serialise local functions or bound methods. Therefore, all callables affected by this pickle serialise/deserialise issue have been moved out of their respective classes as static methods, and it seems to work. _(Concretely, two of them: main_node_str in gama/utilities/evaluation_library.py and get_pipeline_str in gama/logging/evaluation_logger.py; I also had to create a named function for the pipeline lambda in the self.fields dict instantiation.)_

  1. This feels like a quick patch; I am fairly sure GAMA normally functions without these changes. Did I miss anything? Any thoughts?
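For reference, this is essentially why the static-method move works: pickle stores only a qualified name and re-resolves it on load, so anything addressable as `module.Class.attr` serialises fine. A toy sketch of the pattern (names invented to mirror the affected helpers, not the actual GAMA code):

```python
import pickle

def main_node_str(e):
    """Module-level function: picklable, stored by reference as 'module.main_node_str'."""
    return str(e)

class EvaluationLibraryDemo:
    @staticmethod
    def get_pipeline_str(e):
        """Static method: also picklable; its dotted qualname is resolvable
        with pickle protocol 4+, the default since Python 3.8."""
        return str(e)

    def __init__(self):
        # Store references to named, importable callables instead of
        # closures or lambdas created here.
        self.fields = {
            "main_node": main_node_str,
            "pipeline": EvaluationLibraryDemo.get_pipeline_str,
        }

restored = pickle.loads(pickle.dumps(EvaluationLibraryDemo()))
print(restored.fields["pipeline"]("ok"))  # ok
```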

Cheers

simonprovost commented 1 year ago

EDIT 3 (7 June, 10 p.m.): The EDIT-2 fix appears to be moving towards a working end-to-end process. FYI: the current status is that random search does its job particularly well, evaluating individuals (using our custom scikit-learn pipeline) etc. without any issue anymore, which is very promising! However, the conclusion of the entire process, i.e., reaching the maximum evaluation time, is apparently running into difficulty. Two types of errors occasionally occur.

In my opinion, it again appears to be associated with pickling across processes, serialising and deserialising. The errors listed below occur when the maximum evaluation time has been reached. I tried 60 and 600 seconds, but the result is always the same: the search struggles to terminate and progress to the post-processing phase and the conclusion of the fit function.

Nonetheless, it did reach the post-processing procedure a few times, but we do not know why, how, or by what means. Probably some serendipity in the process scheduling.

Therefore, I would appreciate your guidance in this regard, as it appears I am not that far from a first end-to-end AutoML pipeline using GAMA!

Error logs:

Error-1:

INFO:gama.utilities.generic.async_evaluator:GAMA exceeded memory usage (0, 1).
INFO:gama.utilities.generic.async_evaluator:Terminating 74446 due to memory usage.
DEBUG:gama.utilities.generic.async_evaluator:Signaling 1 subprocesses to stop.
Traceback (most recent call last):
  File "path/to/random_search.py", line 69, in random_search
    future = operations.wait_next(async_)
  File "path/to/operator_set.py", line 65, in wait_next
    future = async_evaluator.wait_next()
  File "path/to/async_evaluator.py", line 240, in wait_next
    self._control_memory_usage()
  File "path/to/async_evaluator.py", line 317, in _control_memory_usage
    self._stop_worker_process(proc)
  File "path/to/async_evaluator.py", line 276, in _stop_worker_process
    process.wait(timeout=60)
  File "path/to/psutil/_psosx.py", line 346, in wrapper
    return fun(self, *args, **kwargs)
  File "path/to/psutil/_psosx.py", line 520, in wait
    return _psposix.wait_pid(self.pid, timeout, self._name)
  File "path/to/psutil/_psposix.py", line 137, in wait_pid
    interval = sleep(interval)
  File "path/to/psutil/_psposix.py", line 115, in sleep
    _sleep(interval)
stopit.utils.TimeoutException

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "path/to/main.py", line 76, in <module>
    main()
  File "path/to/main.py", line 41, in main
    customGAMAClassifier.fit(X_train, y_train)
  File "path/to/GamaCustomClassifier.py", line 244, in fit
    super().fit(x, y, *args, **kwargs)
  File "path/to/gama.py", line 609, in fit
    self._search_phase(warm_start, timeout=fit_time)
  File "path/to/gama.py", line 685, in _search_phase
    self._search_method.search(self._operator_set, start_candidates=pop)
  File "path/to/random_search.py", line 28, in search
    random_search(operations, self.output, start_candidates)
  File "path/to/random_search.py", line 60, in random_search
    with AsyncEvaluator() as async_:
  File "path/to/async_evaluator.py", line 159, in __exit__
    self.clear_queue(self._input)
  File "path/to/async_evaluator.py", line 188, in clear_queue
    q.get(timeout=0.1)
  File "path/to/queues.py", line 122, in get
    return _ForkingPickler.loads(res)
EOFError: Ran out of input

Error-2 (appears to be exactly the same, with more information prior to the EOFError):

INFO:gama.utilities.generic.async_evaluator:GAMA exceeded memory usage (0, 1).
INFO:gama.utilities.generic.async_evaluator:Terminating 75965 due to memory usage.
INFO:gama.utilities.generic.async_evaluator:Starting new evaluations process.
INFO:gama.utilities.generic.async_evaluator:GAMA exceeded memory usage (1, 1).
INFO:gama.utilities.generic.async_evaluator:Terminating 76080 due to memory usage.
INFO:gama.utilities.generic.async_evaluator:Starting new evaluations process.
INFO:gama.utilities.generic.async_evaluator:GAMA exceeded memory usage (2, 1).
INFO:gama.utilities.generic.async_evaluator:Terminating 76083 due to memory usage.
INFO:gama.utilities.generic.async_evaluator:Starting new evaluations process.
INFO:gama.utilities.generic.async_evaluator:GAMA exceeded memory usage (3, 1).
INFO:gama.utilities.generic.async_evaluator:Terminating 76085 due to memory usage.
INFO:gama.utilities.generic.async_evaluator:Starting new evaluations process.
ERROR:root:Stopping daemon due to exception
Traceback (most recent call last):
  File "path/to//async_evaluator.py", line 380, in evaluator_daemon
    future = input_queue.get(block=False)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/queues.py", line 122, in get
    return _ForkingPickler.loads(res)
EOFError: Ran out of input
Traceback (most recent call last):
  File "path/to//async_evaluator.py", line 380, in evaluator_daemon
    future = input_queue.get(block=False)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/queues.py", line 122, in get
    return _ForkingPickler.loads(res)
EOFError: Ran out of input
ERROR:root:Stopping daemon due to exception
Traceback (most recent call last):
  File "path/to//async_evaluator.py", line 380, in evaluator_daemon
    future = input_queue.get(block=False)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/queues.py", line 122, in get
    return _ForkingPickler.loads(res)
_pickle.UnpicklingError: invalid load key, '\x00'.
[... the same "Stopping daemon" EOFError / UnpicklingError traceback pairs repeat many more times ...]
DEBUG:gama.utilities.generic.async_evaluator:Signaling 0 subprocesses to stop.
[... further repeated tracebacks elided ...]
Traceback (most recent call last):
  File "path/to//async_evaluator.py", line 343, in _get_memory_usage
    yield process, process.memory_info()[0] / (2**20)
  File "path/to/psutil/_common.py", line 480, in wrapper
    raise raise_from(err, None)
  File "<string>", line 3, in raise_from
  File "path/to/psutil/_common.py", line 478, in wrapper
    return fun(self)
  File "path/to/psutil/__init__.py", line 1063, in memory_info
    return self._proc.memory_info()
  File "path/to/psutil/_psosx.py", line 346, in wrapper
    return fun(self, *args, **kwargs)
  File "path/to/psutil/_psosx.py", line 446, in memory_info
    rawtuple = self._get_pidtaskinfo()
  File "path/to/psutil/_psosx.py", line 349, in wrapper
    raise ZombieProcess(self.pid, self._name, self._ppid)
psutil.ZombieProcess: PID still exists but it's a zombie (pid=76127)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "path/to/gama/search_methods/random_search.py", line 69, in random_search
    future = operations.wait_next(async_)
  File "path/to/gama/genetic_programming/operator_set.py", line 65, in wait_next
    future = async_evaluator.wait_next()
  File "path/to//async_evaluator.py", line 240, in wait_next
    self._control_memory_usage()
  File "path/to//async_evaluator.py", line 302, in _control_memory_usage
    mem_proc = list(self._get_memory_usage())
  File "path/to//async_evaluator.py", line 347, in _get_memory_usage
    self._start_worker_process()
  File "path/to//async_evaluator.py", line 266, in _start_worker_process
    mp_process.start()
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/context.py", line 288, in _Popen
    return Popen(process_obj)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 62, in _launch
    f.write(fp.getbuffer())
stopit.utils.TimeoutException

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "path/to/main.py", line 76, in <module>
    main()
  File "path/to/main.py", line 41, in main
    customGAMAClassifier.fit(X_train, y_train)
  File "path/to/gama/GamaCustomClassifier.py", line 244, in fit
    super().fit(x, y, *args, **kwargs)
  File "path/to/gama/gama.py", line 609, in fit
    self._search_phase(warm_start, timeout=fit_time)
  File "path/to/gama/gama.py", line 685, in _search_phase
    self._search_method.search(self._operator_set, start_candidates=pop)
  File "path/to/gama/search_methods/random_search.py", line 28, in search
    random_search(operations, self.output, start_candidates)
  File "path/to/gama/search_methods/random_search.py", line 60, in random_search
    with AsyncEvaluator() as async_:
  File "path/to//async_evaluator.py", line 159, in __exit__
    self.clear_queue(self._input)
  File "path/to//async_evaluator.py", line 188, in clear_queue
    q.get(timeout=0.1)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/queues.py", line 122, in get
    return _ForkingPickler.loads(res)
EOFError: Ran out of input
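As an aside on the EOFError / "invalid load key" errors themselves: they are raised while unpickling from the inter-process queue, which is consistent with a worker being force-killed mid-write, leaving a truncated item in the pipe. Not a fix, but a defensive drain loop (a toy sketch under that assumption, not GAMA's actual clear_queue) would swallow the corrupt trailing items instead of crashing:

```python
import multiprocessing as mp
import pickle
import queue

def drain(q):
    """Empty a multiprocessing queue, tolerating the truncated/corrupt items a
    force-killed producer can leave behind (surfacing as EOFError or
    UnpicklingError inside q.get())."""
    drained = 0
    while True:
        try:
            q.get(timeout=0.5)
            drained += 1
        except queue.Empty:
            break  # queue cleanly exhausted
        except (EOFError, pickle.UnpicklingError):
            break  # partial write: nothing more to salvage
    return drained

if __name__ == "__main__":
    q = mp.Queue()
    for item in range(3):
        q.put(item)
    print(drain(q))  # 3
```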
simonprovost commented 1 year ago

EDIT 4 (7 June, 11 p.m.): After multiple attempts, I was able to complete the entire fit procedure after the maximum evaluation time had expired. What is surprising is that the console shows the same errors as in EDIT 3, yet the run still reaches the post-processing procedure and concludes the whole fit process. Perhaps it is a side effect or something else; I do not know. If you know where I could dig further, I would be extremely grateful:


INFO:gama.utilities.generic.async_evaluator:GAMA exceeded memory usage (0, 1).
INFO:gama.utilities.generic.async_evaluator:Terminating 81857 due to memory usage.
An error occurred in function fit: 
TimeoutException - pipeline evaluation timed out.
Pipeline evaluation interrupted by timeout exception- returnning the error in  cascade.
ERROR:root:Stopping daemon due to exception
Traceback (most recent call last):
  File "<poetry_env_path>/lib/python3.10/site-packages/gama/utilities/generic/async_evaluator.py", line 381, in evaluator_daemon
    future = input_queue.get(block=False)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/queues.py", line 122, in get
    return _ForkingPickler.loads(res)
_pickle.UnpicklingError: invalid load key, '\x00'.
Traceback (most recent call last):
  File "<poetry_env_path>/lib/python3.10/site-packages/gama/utilities/generic/async_evaluator.py", line 381, in evaluator_daemon
    future = input_queue.get(block=False)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/queues.py", line 122, in get
    return _ForkingPickler.loads(res)
_pickle.UnpicklingError: invalid load key, '\x00'.

INFO:gama.genetic_programming.operations:Longitudinal random expression creation complete with DecisionTreeClassifier(DummyStep(data), DecisionTreeClassifier.max_depth=7, min_samples_leaf=1, min_samples_split=8)

ERROR:root:Stopping daemon due to exception
Traceback (most recent call last):
  File "<poetry_env_path>/lib/python3.10/site-packages/gama/utilities/generic/async_evaluator.py", line 381, in evaluator_daemon
    future = input_queue.get(block=False)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/queues.py", line 122, in get
    return _ForkingPickler.loads(res)
EOFError: Ran out of input
ERROR:root:Stopping daemon due to exception
Traceback (most recent call last):
  File "<poetry_env_path>/lib/python3.10/site-packages/gama/utilities/generic/async_evaluator.py", line 381, in evaluator_daemon
    future = input_queue.get(block=False)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/queues.py", line 122, in get
    return _ForkingPickler.loads(res)
_pickle.UnpicklingError: invalid load key, '\x00'.

[the EOFError and UnpicklingError tracebacks above repeat several more times]

DEBUG:gama.utilities.generic.async_evaluator:Signaling 1 subprocesses to stop.
INFO:gama.gama:Search phase evaluated 19 individuals.
INFO:gama.utilities.generic.timekeeper:STOP: search RandomSearch after 53.9941s.
INFO:gama.utilities.generic.timekeeper:START: postprocess BestFitPostProcessing
INFO:gama.gama:Best pipeline: <not displaying for the sake of brevity>
INFO:gama.utilities.generic.timekeeper:STOP: postprocess BestFitPostProcessing after 0.0549s.
simonprovost commented 1 year ago

EDIT-5 (8th June, 12 a.m.): The final edit of the day. I discovered by chance that the maximum number of random-search evaluations is configurable. While still investigating why the overall fit does not complete once the maximum evaluation time is reached, I started a run with the maximum number of random-search evaluations set to a low value, hoping it would finish faster and let me investigate more quickly. With max evaluations set low enough to avoid the max-eval-time timeout, the pipeline does run end to end; I can even invoke predict_proba on the GAMA classifier to obtain the classification report, confusion matrix, etc. Which is very reassuring, I will not fib 🤥 However, I would love the process to work without the max-evaluations parameter.

I just wanted to mention that in case it is necessary to comprehend my situation.

By max evaluations I refer to:

def random_search(
    operations: OperatorSet,
    output: List[Individual],
    start_candidates: List[Individual],
    max_evaluations: Optional[int] = None,
) -> List[Individual]:
[...]
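The cap above can be sketched independently of GAMA. The following is my own simplified illustration, not GAMA's actual implementation: `random_search_sketch`, `evaluate_one`, and `time_budget_steps` are hypothetical names, with the step budget standing in for the real time-based stop condition.

```python
import random
from typing import Callable, List, Optional

def random_search_sketch(
    evaluate_one: Callable[[], float],    # draws and scores one random candidate
    max_evaluations: Optional[int] = None,
    time_budget_steps: int = 1000,        # stand-in for the wall-clock budget
) -> List[float]:
    """Simplified sketch of how an evaluation cap can bound random search."""
    output: List[float] = []
    for _ in range(time_budget_steps):
        if max_evaluations is not None and len(output) >= max_evaluations:
            break  # evaluation cap reached: stop before the time budget runs out
        output.append(evaluate_one())
    return output

# With the cap set, the loop stops after exactly five evaluations.
capped = random_search_sketch(lambda: random.random(), max_evaluations=5)
```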
PGijsbers commented 1 year ago

> This feels like a quick patch; I am confident that it should function without these changes; did I miss anything? any thoughts?

To the best of my knowledge, this is how it has to be. Pickle tries to re-import all the functions/classes etc., and these local functions are not importable.
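To illustrate the point, a minimal standalone sketch: pickle stores a function by its importable qualified name, so a module-level function serializes fine while a function defined inside another function cannot be found for re-import (`module_fn` and `make_local` are illustrative names, not GAMA code).

```python
import pickle

def module_fn(x):  # importable as <module>.module_fn, so picklable by reference
    return x + 1

def make_local():
    def local_fn(x):  # qualname make_local.<locals>.local_fn: not importable
        return x + 1
    return local_fn

payload = pickle.dumps(module_fn)  # succeeds: stored as a named reference
local_failed = False
try:
    pickle.dumps(make_local())
except (AttributeError, pickle.PicklingError):  # "Can't pickle local object ..."
    local_failed = True
```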

I am not sure where the EOF errors come from, but:

> INFO:gama.gama:Best pipeline: <not displaying for the sake of brevity>

Are your pipelines very long? The errors did not happen before you made changes, correct? Maybe it is possible to try and isolate changes bit by bit to see when the error gets introduced.

simonprovost commented 1 year ago

Thanks for your response @PGijsbers !

> To the best of my knowledge, this is how it has to be. Pickle tries to re-import all the functions/classes etc., and these local functions are not importable.

Why was that not already done on the branch I diverged from main? Does this issue simply not appear once GAMA is packaged? Interesting; anyway, good to know I did it the right way.

> I am not sure where the EOF errors come from, but: Are your pipelines very long? The errors did not happen before you made changes, correct? Maybe it is possible to try and isolate changes bit by bit to see when the error gets introduced.

Our pipelines are two to three steps long, which I would not consider long. In a traditional system like yours, our two steps are equivalent to one of yours, and our three to your two: we essentially add one extra, lightweight step.

> The errors did not happen before you made changes, correct?

I still need to test whether this occurs on the main branch, which I have not yet done; that was my final test for today, to pinpoint whether the error originates in the GAMA code or elsewhere. Will notify you in an hour.

> Maybe it is possible to try and isolate changes bit by bit to see when the error gets introduced

I genuinely have no idea what or where to change; I think I have exhausted every option. Now that my design works with random search's max evaluations set low enough to finish before the max eval time, there is not much left for me to do on that end. Still, I am quite unsure how max eval time works given that it is multithreaded. Any lead on that, so I can add print statements in the right places and gather more insight, would be greatly appreciated.

In the meantime, do you not find it odd that the procedure completes successfully when random search is capped by max evaluations, but not when it is bounded by max eval time?

Cheers

simonprovost commented 1 year ago

Re,

It appears that creating a fresh (conda) environment, installing GAMA with pip, and executing the example from the official readme is error-free. Therefore, the issue seems to come from somewhere in my implementation, though oddly I got no lead on it via the logs or any of my tonnes of print statements.

Will investigate further, but as a quick thought, I believe that saving/loading my custom estimators is probably the challenging part, although no console logs indicate this; I will look into it. If you have any insight into how the maximum evaluation time works (the code path it takes), I would greatly appreciate it.

Cheers :)

simonprovost commented 1 year ago

@PGijsbers Heya. The issue does not appear to be directly related to GAMA. No need for you to respond further on the EOFError front; I will explain it myself within the day. However, I would appreciate an explanation of why these lambdas are not static on the main branch. Why did I need to move them out myself? Quite curious.

Cheers.

PGijsbers commented 1 year ago

As far as I am aware, they normally do not get pickled. Only when you try to pickle the GAMA object itself would this be an issue. That said, after some more digging I did find that I actually worked on this myself before (sorry for not remembering earlier :/): https://github.com/openml-labs/gama/compare/master...pickle

I don't know why I didn't merge it at the time though, perhaps I too found it a bit of a "hack" and wanted to investigate if there were other ways first.

simonprovost commented 1 year ago

> I don't know why I didn't merge it at the time though, perhaps I too found it a bit of a "hack" and wanted to investigate if there were other ways first.

100% understandable. However, based on my observations across some GitHub issues and Stack Overflow posts, this is Pickle's design, and moving the functions out may be one of your only options if you want to stay with Pickle. Alternatively, keeping the function inside the class and decorating it with @staticmethod would at least be more elegant; whether it works I do not know, but it may be worth trying. That still would not solve the issue with in-line lambdas, though.
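A minimal sketch of the @staticmethod idea above, assuming a hypothetical stand-in class (`EvaluationLibraryDemo` is not GAMA's real `EvaluationLibrary`): a staticmethod defined at class level has an importable qualified name, so an instance holding a reference to it appears to round-trip through pickle on recent Python versions.

```python
import pickle

class EvaluationLibraryDemo:  # hypothetical stand-in for GAMA's EvaluationLibrary
    @staticmethod
    def main_node_str(e):  # importable as EvaluationLibraryDemo.main_node_str
        return str(e)

    def __init__(self):
        # Stored by reference to an importable name, unlike a local function.
        self._lookup_key = EvaluationLibraryDemo.main_node_str

# Round-trip the whole instance through pickle without error.
restored = pickle.loads(pickle.dumps(EvaluationLibraryDemo()))
```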

Regarding in-line lambdas, I also came across cloudpickle, which advertises that it supports pickling for lambda functions along with functions and classes defined interactively in the __main__ module - CloudPickle. Could be worth investigating. However, my design now works as-is (i.e., by moving the necessary functions/lambdas out of their classes), so I will continue like that for now.

In the interim, I will close out this issue in my next comment, for newcomers experiencing the same problems I initially described.

simonprovost commented 1 year ago

Newcomers, this is the answer to the initial question

Thanks to @PGijsbers, we realised that pickling a GAMA object sometimes requires extracting in-line lambdas and locally defined methods out of the class itself so that Pickle can correctly handle saving and loading. This can occur in a variety of situations; the one I initially hit was asking for models to be saved locally on my system. The second was multi-threaded processing, where I believe objects are pickled in one process and unpickled in another. Consequently, the solution is a small modification: move the problematic method out of its class ✅

On the other hand, I reckon the majority of the EOFErrors mentioned above were mainly side effects. In addition, insufficient memory allocation during a brief (30-second) window of system execution may also have been one of the causes. Since resolving the pickling error (extracting the in-line/in-class methods), recompiling GAMA correctly, and increasing the memory (MB) and max total time parameters, I am now able to operate error-free.

If you encounter any further issues in this regard, please reopen the issue and inform us.

Cheers

Examples of functions/lambdas to move out of their class:

Before:

class EvaluationLibrary:
    def __init__(self, ...):
        ...
        def main_node_str(e: Evaluation):  # local to __init__: not picklable
            return str(e.individual.main_node)

        self._lookup_key = main_node_str

After:

def main_node_str(e: Evaluation):  # module-level: picklable by reference
    return str(e.individual.main_node)

class EvaluationLibrary:
    def __init__(self, ...):
        ...
        self._lookup_key = main_node_str
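As a self-contained check of this pattern (a sketch only: `FixedLibrary`, `BrokenLibrary`, and the plain-value argument are stand-ins, not GAMA's real `EvaluationLibrary` or `Evaluation`):

```python
import pickle

def main_node_str(e):  # module-level, as in the "After" version: picklable
    return str(e)

class FixedLibrary:  # stand-in for the refactored class
    def __init__(self):
        self._lookup_key = main_node_str

class BrokenLibrary:  # stand-in for the original, with a local function
    def __init__(self):
        def main_node_str_local(e):  # defined inside __init__: not importable
            return str(e)
        self._lookup_key = main_node_str_local

fixed_roundtrip = pickle.loads(pickle.dumps(FixedLibrary()))  # succeeds
broken_failed = False
try:
    pickle.dumps(BrokenLibrary())
except (AttributeError, pickle.PicklingError):  # Can't pickle local object ...
    broken_failed = True
```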

For more information, see @PGijsbers's last answer and refer to master...pickle