tdudgeon closed this issue 3 years ago.
The memory for the v100.small flavour has now been reconfigured (it is now set to 75 GB), so these GPU VMs should now be re-created.
These were rebuilt on 10 March using names such as pulsar-exec-node-cuda-1.xchem. One day later they are all still running OK.
The /root/update-hosts.sh file on the pulsar-central-manager had to be edited to reflect the new names so that the gpu gender was set correctly.
The v100.small GPU nodes have a problem: they request too much memory, which causes them to fail. STFC are planning to change the configuration to request slightly less memory, which should fix the problem. Once this change has been made we will need to re-create the GPU nodes to pick it up. Until then the GPU nodes will be unstable.