Open WeCcRy opened 2 weeks ago
Could you try the distributed training tutorial from TensorFlow? https://www.tensorflow.org/guide/keras/distributed_training
I tried but failed. XPINNs seems different from other DL projects.
I have not used TensorFlow with multiple GPUs on a single machine.
However, I did use multiprocessing with TensorFlow on Linux by adding the following:
multiprocessing.set_start_method("spawn", force=True)
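For context, here is a minimal sketch of how that line could fit into a script that starts one process per GPU. This is only an illustration, not code from the XPINNs repository: `configure_gpu`, `train_on_gpu`, and `run_workers` are hypothetical names, the GPU ids `[0, 1]` are an assumption, and the actual TensorFlow model and training code is left as a comment. It relies on the `CUDA_VISIBLE_DEVICES` environment variable to restrict each process to one device.

```python
import multiprocessing
import os


def configure_gpu(gpu_id):
    # Restrict the current process to a single GPU. This should be set
    # before TensorFlow is imported so that CUDA only sees that device.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    return os.environ["CUDA_VISIBLE_DEVICES"]


def train_on_gpu(gpu_id, results):
    visible = configure_gpu(gpu_id)
    # import tensorflow as tf  # import inside the spawned process
    # ... build and train the model for this worker here (hypothetical) ...
    results.put((gpu_id, visible))


def run_workers(gpu_ids):
    # Same call as above: force the "spawn" start method so each child
    # starts from a fresh interpreter instead of a fork of the parent.
    multiprocessing.set_start_method("spawn", force=True)
    results = multiprocessing.Queue()
    workers = [
        multiprocessing.Process(target=train_on_gpu, args=(gpu_id, results))
        for gpu_id in gpu_ids
    ]
    for worker in workers:
        worker.start()
    # Collect one result per worker before joining to avoid blocking on the queue.
    collected = [results.get() for _ in workers]
    for worker in workers:
        worker.join()
    return sorted(collected)


if __name__ == "__main__":
    # Assumed GPU ids; adjust to the devices available on the machine.
    print(run_workers([0, 1]))
```

The `if __name__ == "__main__":` guard matters with "spawn", because each child process re-imports the main module. Whether this layout suits XPINNs depends on how the subdomain networks exchange interface data during training, which this sketch does not cover.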
I do not know if that would help to solve your problem. Note: I have migrated to PyTorch for many reasons. I also recommend trying JAX for PINNs or other work that requires differentiation.
I wonder how to run it on a machine with multiple GPUs to accelerate training?