fmohr opened 3 years ago
Thinking more about this, I believe that there is really no solution to this problem except spawning a new process. The problem with new processes, though, is that one needs to reserve a good deal of memory for each of them to avoid problems. This can easily become a total waste of resources.
Probably the best solution is to introduce an option that allows ML-Plan to be run in process mode when there is an anticipated risk of memory overflows.
Then, more generally, it would be nice to add to the process project of AILibs the ability to execute objects that implement both Callable<T extends Serializable>
and Serializable
in a separate process with specific resource limitations. One could then have a general executor for such operations that serializes the object to be executed and launches a new JVM with a generic runner that deserializes the object, calls it, and serializes the result T
into some output file, which can then be deserialized by the original process.
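A minimal sketch of that round trip could look as follows. All names here (`SquareTask`, `ProcessExecutorSketch`, the runner class mentioned in the comments) are hypothetical and not part of AILibs; for self-containment the "child" step runs in-process, with the actual JVM launch only indicated in a comment:

```java
import java.io.*;
import java.nio.file.*;
import java.util.concurrent.Callable;

// Hypothetical task that is both Callable and Serializable, as required
// by the proposed executor.
class SquareTask implements Callable<Integer>, Serializable {
    private static final long serialVersionUID = 1L;
    private final int n;
    SquareTask(int n) { this.n = n; }
    public Integer call() { return n * n; }
}

public class ProcessExecutorSketch {

    // Serialize any Serializable object into a file.
    static void writeObject(Path p, Object o) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(p))) {
            out.writeObject(o);
        }
    }

    // Deserialize an object from a file.
    static Object readObject(Path p) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(p))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Path taskFile = Files.createTempFile("task", ".ser");
        Path resultFile = Files.createTempFile("result", ".ser");

        // Parent side: serialize the task to be executed.
        writeObject(taskFile, new SquareTask(7));

        // A real implementation would now launch a fresh JVM with an explicit
        // memory cap, running a generic runner class, e.g. (hypothetical):
        //   new ProcessBuilder(System.getProperty("java.home") + "/bin/java",
        //       "-Xmx2g", "-cp", System.getProperty("java.class.path"),
        //       "GenericRunner", taskFile.toString(), resultFile.toString())
        //       .inheritIO().start().waitFor();
        // Here, the child's logic is executed in-process to keep the sketch
        // self-contained: deserialize, call, serialize the result.
        @SuppressWarnings("unchecked")
        Callable<? extends Serializable> task =
            (Callable<? extends Serializable>) readObject(taskFile);
        writeObject(resultFile, task.call());

        // Parent side again: read the result back from the output file.
        Integer result = (Integer) readObject(resultFile);
        System.out.println("result = " + result);  // result = 49
    }
}
```

The appeal of this design is that a crash or OOM in the child JVM only kills the child; the parent merely observes a missing or incomplete result file and can react accordingly.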
I am observing this error when running ML-Plan in cluster experiments:
Logs show that this stack trace is immediately followed by an indication of memory overflow:
One dataset where this occurred was the DNA dataset (https://www.openml.org/d/40670) using 24 GB of memory.
The following message directly preceding the exception suggests that the error occurred when training a BayesNet:
The question is really whether this can be avoided without spawning external processes.