threefoldtech / home

Starting point for the threefoldtech organization
https://threefold.io
Apache License 2.0

Improvements to GPU support #1564

Open scottyeager opened 1 month ago

scottyeager commented 1 month ago

While GPU support is present and working on the network today, the way it is implemented presents serious hurdles for anyone trying to rent and utilize a GPU.

To summarize the problems:

The result is that even if, with great effort, you manage to find a decent GPU on some node, there's a good chance the node can't be reserved because it already has a workload. Indeed, in some recent tests that was exactly the outcome: of the handful of decent cards available, most were unused and could not be used because of existing workloads on their nodes.

Improvements to the UI/UX can be made of course, but that's no good if at the end of the day you can't rent the node to get at the GPU. There are a couple of potential approaches:

  1. Decouple the use of GPU from the renting of a dedicated node. Allow the user to rent the GPU specifically and attach it to a VM of the size they choose
  2. Somehow block deployments from going to the nodes with GPUs, so that they remain available to be reserved for GPU workloads

Both of these approaches have downsides. There may also be technical concerns that I haven't considered. So far, despite searching for and reading every issue I can find on how we brought GPU support live, I have not found a clear description of the reasoning behind the current approach.
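
To make the second approach a bit more concrete, here is a rough sketch of a placement filter that keeps ordinary deployments off GPU nodes. It is only an illustration: the `Node` record, its field names, and the `candidates_for` function are all hypothetical and do not reflect the grid's actual schema or scheduler.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical node record; the fields are illustrative, not the grid's schema.
    node_id: int
    gpus: list[str] = field(default_factory=list)  # GPU model names, empty if none
    workloads: int = 0  # number of active workloads on the node
    rented: bool = False  # True if the node is already reserved


def candidates_for(nodes: list[Node], needs_gpu: bool) -> list[Node]:
    """Return the nodes a new deployment may be placed on."""
    if needs_gpu:
        # A GPU workload needs an idle, unrented node that actually has a GPU.
        return [n for n in nodes if n.gpus and n.workloads == 0 and not n.rented]
    # Ordinary workloads are kept off GPU nodes so those stay reservable.
    return [n for n in nodes if not n.gpus and not n.rented]


nodes = [
    Node(1, gpus=["example-gpu-a"]),               # idle GPU node
    Node(2),                                       # plain node
    Node(3, gpus=["example-gpu-b"], workloads=2),  # GPU node blocked by workloads
]
```

In this toy data, node 3 is the situation described above: it has a GPU, but existing workloads keep it from being reserved. With the filter in place, new non-GPU deployments would only ever land on node 2, so a node like node 1 stays free for GPU renters.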

Mik-TF commented 1 month ago

I think this is a fair assessment of the situation.

Also, we removed the dedicated node section, but that section provided better filters for finding GPUs (by brand, model, etc.). Maybe this could also be considered. We could take the best features of the dedicated page and bring them into the node section.