tidyverse / multidplyr

A dplyr backend that partitions a data frame over multiple processes
https://multidplyr.tidyverse.org

Assigning "workers" to different cores on Debian K8s #115

Closed isaac-florence closed 3 years ago

isaac-florence commented 3 years ago

I am using multidplyr on a 16-core node in an OpenShift (Kubernetes) cluster. I want to use only 6 cores, so I pass that number as the worker count when setting up the multidplyr cluster.
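For readers following along, the setup described here might look roughly like the sketch below. It assumes the multidplyr package is installed; the variable names `n_workers`, `cluster`, and `pids` are illustrative and do not come from the original post. Collecting each worker's process ID confirms that six separate R sessions were started, which is a different question from which cores they run on.

```r
library(multidplyr)

# Number of worker R sessions to start (6 in the original post).
n_workers <- 6

# new_cluster() launches one background R process per worker.
cluster <- new_cluster(n_workers)

# Ask every worker for its process ID; the result is a list
# with one element per worker.
pids <- cluster_call(cluster, Sys.getpid())

# Count the distinct worker processes that answered.
length(unique(unlist(pids)))
```

The process IDs can then be compared against the node's process list to see where the scheduler has placed each worker.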

However, the utilisation metrics make it clear that only three cores are being used. What is the most robust way to start each "worker" R session on a different core, rather than accepting the default behaviour of two sessions running on each of the three utilised cores?

Should this even be done in multidplyr or should it be done in the image?

Many thanks for your help

hadley commented 3 years ago

This is handled by the operating system — multidplyr starts up the R processes, and then the OS assigns them to cores.
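Since placement is left to the operating system, one thing worth checking in a container is how much CPU the OS is actually prepared to give the R processes. The sketch below is an assumption-laden illustration, not something confirmed in this thread: it compares the core count R reports with a cgroup v2 CPU quota read from `/sys/fs/cgroup/cpu.max`. The helper `read_cpu_quota()` is hypothetical, and the path applies only to cgroup v2 systems; whether a quota explains the three busy cores reported above is unknown.

```r
# Logical cores reported by the machine. This generally does not
# reflect a CPU quota placed on the container.
machine_cores <- parallel::detectCores()

# Hypothetical helper: read a cgroup v2 CPU quota, expressed in cores.
# The file holds "<quota> <period>" in microseconds, or "max <period>"
# when no limit is set. Returns NA if the file does not exist.
read_cpu_quota <- function(path = "/sys/fs/cgroup/cpu.max") {
  if (!file.exists(path)) {
    return(NA_real_)
  }
  fields <- strsplit(readLines(path, n = 1), " ", fixed = TRUE)[[1]]
  if (fields[1] == "max") {
    return(Inf)
  }
  as.numeric(fields[1]) / as.numeric(fields[2])
}

cpu_quota <- read_cpu_quota()

# Show both values side by side for comparison.
print(c(machine_cores = machine_cores, cpu_quota = cpu_quota))
```

If the quota turns out to be smaller than the number of workers, the workers would share that CPU time regardless of how many R sessions were started, and the fix would belong in the pod specification rather than in multidplyr.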

isaac-florence commented 3 years ago

Thank you!