ulissigroup / finetuna

Active Learning for Machine Learning Potentials
MIT License

Ensemble training in parallel needs copying of trainer #2

Closed ruiqic closed 3 years ago

ruiqic commented 3 years ago

An initialized trainer should be copied for each ensemble member.
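A minimal sketch of the copy-per-member pattern the issue asks for, using a hypothetical `Trainer` stand-in (the real object is an `amptorch.trainer.AtomsTrainer`): deep-copy one initialized template so each ensemble member trains on its own independent instance.

```python
import copy

class Trainer:
    """Hypothetical stand-in for an initialized amptorch-style trainer."""
    def __init__(self, config):
        self.config = dict(config)

# One template trainer, deep-copied once per ensemble member so that
# training one member never mutates the state of another.
template = Trainer({"lr": 1e-3})
ensemble = [copy.deepcopy(template) for _ in range(5)]

# The copies are independent: changing one leaves the rest untouched.
ensemble[0].config["lr"] = 1e-4
```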

mattaadams commented 3 years ago
```
Traceback (most recent call last):
  File "offline_al_ensemble_example.py", line 126, in <module>
    learner.learn()
  File "/home/matta/clones/al_mlp/al_mlp/offline_active_learner.py", line 92, in learn
    self.do_train()
  File "/home/matta/clones/al_mlp/al_mlp/preset_learners/ensemble_learner.py", line 63, in do_train
    self.make_ensemble()
  File "/home/matta/clones/al_mlp/al_mlp/preset_learners/ensemble_learner.py", line 111, in make_ensemble
    trained_calcs = pool.map(self.ensemble_train_trainer, self.ensemble_sets)
  File "/home/matta/miniconda3/envs/amptorch/lib/python3.6/multiprocessing/pool.py", line 266, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/matta/miniconda3/envs/amptorch/lib/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
multiprocessing.pool.MaybeEncodingError: Error sending result: '[<amptorch.trainer.AtomsTrainer object at 0x7f283964feb8>]'. Reason: 'TypeError("can't pickle _cffi_backend.__CDataOwn objects",)'
```
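The root cause can be reproduced without amptorch: `multiprocessing.Pool.map` must pickle every worker return value, so returning an object that holds a non-picklable handle raises exactly this kind of error. Below is a hedged sketch using `threading.Lock` as a stand-in for the cffi-backed `_cffi_backend.__CDataOwn` object inside the real trainer.

```python
import pickle
import threading

class TrainerLike:
    """Minimal stand-in for a trainer that holds a non-picklable
    handle, analogous to the cffi CData object in the traceback."""
    def __init__(self):
        self.handle = threading.Lock()  # locks cannot be pickled

obj = TrainerLike()
try:
    pickle.dumps(obj)
    picklable = True
except TypeError:
    # Pool.map hits the same TypeError when sending the trained
    # trainer back to the parent process, surfacing it as a
    # MaybeEncodingError.
    picklable = False
```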
ruiqic commented 3 years ago

Regarding `can't pickle _cffi_backend.__CDataOwn objects`: maybe you didn't build cffi? Try `python setup.py develop` in amptorch.

zulissi commented 3 years ago

These errors come from trying to `copy.deepcopy` an Atoms object with an amptorch trainer attached. Closing this.
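One common way around this class of failure (a hedged sketch, not what finetuna itself does) is to detach the attached calculator/trainer before deep-copying the atoms object and reattach it afterwards, so `deepcopy` never traverses the non-copyable handle. `Atoms` and `Trainer` below are hypothetical stand-ins for the real ASE and amptorch objects.

```python
import copy

class Atoms:
    """Hypothetical stand-in for an ASE Atoms object."""
    def __init__(self):
        self.calc = None  # slot for an attached calculator/trainer

class Trainer:
    """Hypothetical trainer holding a handle that deepcopy may choke on."""
    def __init__(self):
        self.handle = object()

atoms = Atoms()
atoms.calc = Trainer()

# Detach the trainer, deep-copy the bare atoms, then reattach,
# so the copy never includes the problematic trainer state.
calc = atoms.calc
atoms.calc = None
atoms_copy = copy.deepcopy(atoms)
atoms.calc = calc
```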