Open shangw-nvidia opened 3 years ago
Right now multiprocessing only runs on a single node.
However, it's probably possible to extend it to support multiple nodes. Indeed, we're using the multiprocess library from the pathos project to do multiprocessing in datasets, and pathos is made to support parallelism on several nodes. More info about pathos on the pathos repo.
If you're familiar with pathos or if you want to give it a try, it could be a nice addition to the library :)
Curious to hear if anything on that side changed, or if your suggestions for how to do it changed @lhoestq :)
For our use-case, we are entering the regime where trading a few more instances to save a few days would be nice :)
Currently for multi-node setups we're mostly going towards a nice integration with Dask. But I wouldn't exclude exploring pathos more at some point.
Hi,
Currently, multiprocessing can be enabled for the .map() stages on a single node. However, in the case of multi-node training (since more than one node would be available), I'm wondering if it's possible to extend the parallel processing across nodes, instead of only one node running the .map() while the other nodes wait for it to finish?
Thanks!
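
To make the idea concrete, here is a minimal sketch of what "spreading the .map() across nodes" could look like: each node takes a contiguous slice of the rows based on its rank, then uses local multiprocessing on that slice only. This uses only the Python standard library and is not how datasets implements anything. The names shard_bounds, preprocess, and map_on_this_node are hypothetical, and the rows are a plain list standing in for a dataset. With datasets itself, combining Dataset.shard(num_shards=..., index=...) with .map(..., num_proc=...) on each node should be a roughly analogous workaround, assuming every node can read the same data.

```python
from multiprocessing import Pool


def shard_bounds(num_rows, num_nodes, node_rank):
    """Return the [start, end) row range owned by one node (hypothetical helper).

    Rows are split contiguously; the first (num_rows % num_nodes) nodes
    receive one extra row so the shards cover every row exactly once.
    """
    base, extra = divmod(num_rows, num_nodes)
    start = base * node_rank + min(node_rank, extra)
    end = start + base + (1 if node_rank < extra else 0)
    return start, end


def preprocess(text):
    """Stand-in for the function you would pass to .map(): count words."""
    return len(text.split())


def map_on_this_node(rows, num_nodes, node_rank, num_proc=2):
    """Process only this node's shard, using num_proc local worker processes."""
    start, end = shard_bounds(len(rows), num_nodes, node_rank)
    with Pool(num_proc) as pool:
        return pool.map(preprocess, rows[start:end])
```

Each node would call map_on_this_node(rows, num_nodes, node_rank) with its own rank (for example read from the launcher's environment) and write its results separately. Merging the per-node outputs back into one dataset is left out of this sketch.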