Closed: tsachiblauamat closed this issue 3 years ago
Hi tsachiblauamat, it's hard to tell without more details, but I would not be surprised if you are dealing with thread contention. RF training is already multithreaded, and you can adjust the number of threads in the constructor. See https://github.com/tensorflow/decision-forests/issues/39#issuecomment-882519315 for more details.
Let me know if that doesn't answer your question!
Btw tsachiblauamat, what are you trying to run in "func"? The training or the evaluation of a TensorFlow RF?
We've never tried what you are trying, but I know the underlying inference engine can run in a multi-threaded setting -- the TF Serving engine issues parallel calls to the inference engine, and it just works, as far as we know.
But I'm not very familiar with the multiprocessing library. Reading through its docs, on Linux it uses the fork(2) system call by default, and the docs warn, as I would expect: "Note that safely forking a multithreaded process is problematic." I wonder how this plays out with TensorFlow subsystems...
Care to share more details?
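For what it's worth, the multiprocessing library lets you opt out of fork(2) by selecting the "spawn" start method, which boots each worker from a fresh interpreter instead of forking the (possibly multithreaded) parent. A minimal sketch using only the standard library; `square` is a placeholder for the user's "func", not anything from TF-DF:

```python
import multiprocessing as mp

def square(x):
    # Stand-in for the user's "func"; a real workload would train or
    # evaluate a model here instead.
    return x * x

if __name__ == "__main__":
    # On Linux the default start method is fork(2), which is problematic
    # in an already multithreaded process (e.g. after TensorFlow has
    # started its own threads). "spawn" starts each worker from a fresh
    # interpreter instead, at the cost of slower worker startup.
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=2) as pool:
        print(pool.map(square, range(4)))  # prints [0, 1, 4, 9]
```

Whether this avoids the instability with TensorFlow in the workers is untested here; it only removes the unsafe fork of a multithreaded process.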
func just trains and predicts the output. When using multiprocessing it is not stable.
Sometimes it crashes, and sometimes I get this error:
Traceback (most recent call last):
  File "python3.8/multiprocessing/util.py", line 224, in __call__
    res = self._callback(*self._args, **self._kwargs)
  File "python3.8/multiprocessing/pool.py", line 712, in _terminate_pool
    if p.exitcode is None:
  File "python3.8/multiprocessing/process.py", line 232, in exitcode
    return self._popen.poll()
  File "python3.8/multiprocessing/popen_fork.py", line 27, in poll
    pid, sts = os.waitpid(self.pid, flag)
KeyboardInterrupt
and sometimes it's fine.
Odd ... I'm not familiar with how the multiprocessing library works, but it's likely an interaction between it and how TF works.
But I'd suggest an alternative: run the training and the evaluation as completely separate Python programs, and have a "controller" program start them. Serialize results to disk and read them back from the controller program -- easier than dealing with pipes/signals (which seems to be where the multiprocessing library is failing).
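The controller pattern above can be sketched with just the standard library. This is an illustration, not the thread's actual setup: the embedded worker script, the JSON result format, and the `accuracy` field are all placeholders; a real worker would train/evaluate the TF-DF model and write its metrics to the output file.

```python
import json
import os
import subprocess
import sys
import tempfile

# Placeholder worker: in practice this script would train/evaluate a
# model and dump real metrics, instead of a hard-coded result.
WORKER = """
import json, sys
result = {"model_id": int(sys.argv[1]), "accuracy": 0.9}  # placeholder
with open(sys.argv[2], "w") as f:
    json.dump(result, f)
"""

def run_worker(model_id, workdir):
    worker_path = os.path.join(workdir, "worker.py")
    out_path = os.path.join(workdir, "result_%d.json" % model_id)
    with open(worker_path, "w") as f:
        f.write(WORKER)
    # Each run is a completely separate Python process: no fork() of the
    # controller's (possibly multithreaded) interpreter is involved.
    subprocess.run(
        [sys.executable, worker_path, str(model_id), out_path], check=True)
    # Results come back through the filesystem, not pipes/signals.
    with open(out_path) as f:
        return json.load(f)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        results = [run_worker(i, d) for i in range(3)]
    print([r["model_id"] for r in results])  # prints [0, 1, 2]
```

The workers here run sequentially for simplicity; the controller could equally launch several subprocess.Popen instances at once and wait on them.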
Hi,
Note: As Jan mentioned, TF-DF supports multi-threaded training. If you have a single large dataset, this is the best approach.
If you want to train multiple small models in parallel, you should be able to train different models in different threads / processes (e.g. with multiprocessing.Pool).
Multi-processing in Python can be tricky. For simplicity, and if possible, I would use multi-threading instead.
Here is a working example that trains and runs 5 small models in parallel:
!pip install tensorflow_decision_forests -U -q

import tensorflow_decision_forests as tfdf
from multiprocessing.pool import ThreadPool
import numpy as np

print(tfdf.__version__)

def train_model(model_id):
  # Train a small model on a random synthetic dataset and return the
  # mean prediction over that dataset.
  x_train = np.random.uniform(size=(50, 1))
  y_train = x_train[:, 0] >= 0.5
  model = tfdf.keras.GradientBoostedTreesModel(num_trees=10)
  model.fit(x=x_train, y=y_train)
  return np.mean(model.predict(x_train))

# Train 5 models, and print the mean predicted value on the training dataset.
pool = ThreadPool(5)
print(pool.map(train_model, range(5)))
When running a few RFs with multiprocessing (in parallel) it works. But when running a few RFs with multiprocessing after an RF, it gets stuck. I'm running multiprocessing with the multiprocessing class by running the command:
In func I'm running the TensorFlow RF.
Any idea why this is happening?
Thanks, Tsachi