thrix closed this 1 year ago
From the thread, for ROCm testing I would need either a vega or navi HW. Anything before that is really buggy and not great for testing. I doubt you have MI HW in AWS, but those are based on vega/navi and are designed for ROCm, so those would obviously work too.
How would one look for "a vega or navi HW"? As a GPU noob, I presume it could boil down to something like a "graphic card" name, model names, vendors, something that would be similar to the current CPU requirement specs, https://tmt.readthedocs.io/en/stable/spec/hardware.html#cpu.
00:1e.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 12 [Radeon Pro V520/V540] (rev c3)
A bit of brainstorming: a "vendor" seems to be something to recognize, "Radeon Pro V520/V540" feels like cpu.model-name, and "Navi" is supposed to be a "code name"; the architecture is called something different, so there is probably no point trying to fit it into the CPU's "family" or "family-name".
    gpu:
        # Probably not precise, might end up way too verbose
        model-name: "~ Radeon Pro .+"
        # Sure, this should be cheap to support, even though it's fairly useless on its own
        vendor: AMD
        # "I would need either a vega or navi HW"
        arch: "~ vega|navi"
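A minimal sketch of how a `gpu` block like this might be evaluated against a detected device. It assumes that a leading `~` means "regular expression search" and that a plain value means equality. The names `constraint_matches` and `gpu_matches`, the normalized device dictionaries, and the case-insensitive comparison are all hypothetical and only illustrate the idea; this is not tmt's implementation.

```python
import re


def constraint_matches(constraint: str, value: str) -> bool:
    """Evaluate one constraint: '~ <regex>' is a regex search, anything else is equality."""
    constraint = constraint.strip()
    if constraint.startswith("~"):
        pattern = constraint[1:].strip()
        # Assumption: matching is case-insensitive, so "navi" matches "Navi 12".
        return re.search(pattern, value, flags=re.IGNORECASE) is not None
    return constraint.lower() == value.lower()


def gpu_matches(requirement: dict, device: dict) -> bool:
    """True when every required key exists on the device and its constraint holds."""
    return all(
        key in device and constraint_matches(constraint, device[key])
        for key, constraint in requirement.items()
    )


requirement = {
    "model-name": "~ Radeon Pro .+",
    "vendor": "AMD",
    "arch": "~ vega|navi",
}

# Hypothetical, already normalized device descriptions (not raw lspci output).
navi12 = {"model-name": "Radeon Pro V520/V540", "vendor": "AMD", "arch": "Navi 12"}
quadro = {"model-name": "Quadro NVS 290", "vendor": "NVIDIA", "arch": "G86"}

print(gpu_matches(requirement, navi12))  # True
print(gpu_matches(requirement, quadro))  # False
```

Note that `vendor: AMD` only matches here because the device description is assumed to be normalized; the raw lspci vendor string reads "Advanced Micro Devices, Inc. [AMD/ATI]", so a real implementation would need a mapping or a regex for it.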
It needs to be configurable for instance-type-based providers like AWS or OpenStack. We can already extract CPU info from AWS EC2 describe-instance-types and route model-name: Graviton3 to the right set of instance types, and different instance types may easily share a vendor but not a model-name. Beaker also needs to expose GPU info; we could then create a filter to match it, even if it had to be merged from these distinct keys (I bet it does expose the info, but I don't recall the right XML element for the filter).
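The routing idea for instance-type-based providers could look roughly like the sketch below. The sample data is hand-written in the shape of the `GpuInfo` section of an EC2 describe-instance-types response, not fetched from AWS, and the helper `instance_types_for_gpu` is a hypothetical name introduced only for this illustration.

```python
import re

# Hand-written sample shaped like the "GpuInfo" part of an EC2
# describe-instance-types response; illustrative only, not fetched from AWS.
INSTANCE_TYPES = [
    {
        "InstanceType": "g4ad.xlarge",
        "GpuInfo": {"Gpus": [{"Name": "Radeon Pro V520", "Manufacturer": "AMD", "Count": 1}]},
    },
    {
        "InstanceType": "g4dn.xlarge",
        "GpuInfo": {"Gpus": [{"Name": "T4", "Manufacturer": "NVIDIA", "Count": 1}]},
    },
    {"InstanceType": "t3.micro"},  # no GPU information at all
]


def instance_types_for_gpu(instance_types, vendor=None, model_pattern=None):
    """Return names of instance types with at least one GPU matching the filters."""
    matching = []
    for item in instance_types:
        for gpu in item.get("GpuInfo", {}).get("Gpus", []):
            if vendor is not None and gpu["Manufacturer"].lower() != vendor.lower():
                continue
            if model_pattern is not None and not re.search(model_pattern, gpu["Name"]):
                continue
            matching.append(item["InstanceType"])
            break  # one matching GPU is enough for this instance type
    return matching


print(instance_types_for_gpu(INSTANCE_TYPES, vendor="AMD"))
print(instance_types_for_gpu(INSTANCE_TYPES, model_pattern=r"Radeon Pro .+"))
```

Both calls select only `g4ad.xlarge` from the sample. Filtering on vendor alone would also pick up any other instance type from the same manufacturer, which is why a model-name filter is still useful.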
We could just filter for vendor AMD and a model name containing Radeon, as I doubt much of the older HW is floating around these days. If it's not as easy as grepping lspci, then worst case, I could set up some complex regex for getting the model names that are applicable for the test that I'm doing, i.e. a whitelist of models that would work for the test.
Does that seem feasible?
@Mystro256 @happz so, looking further into mapping this to something that comes out of lshw and lspci:
my localhost
    *-display
        description: VGA compatible controller
        product: Alder Lake-P Integrated Graphics Controller
        vendor: Intel Corporation
        physical id: 2
        bus info: pci@0000:00:02.0
        logical name: /dev/fb0
        version: 0c
        width: 64 bits
        clock: 33MHz
        capabilities: vga_controller bus_master cap_list rom fb
        configuration: depth=32 driver=i915 latency=0 resolution=1920,1200
        resources: iomemory:600-5ff iomemory:400-3ff irq:165 memory:603c000000-603cffffff memory:4000000000-400fffffff ioport:2000(size=64) memory:c0000-dffff memory:4010000000-4016ffffff memory:4020000000-40ffffffff
aws nitro instance
    [root@ip-172-31-28-199 ~]# lshw -C display
    *-display UNCLAIMED
        description: VGA compatible controller
        product: Amazon.com, Inc.
        vendor: Amazon.com, Inc.
        physical id: 3
        bus info: pci@0000:00:03.0
        version: 00
        width: 32 bits
        clock: 33MHz
        capabilities: vga_controller
        configuration: latency=0
        resources: memory:fe400000-fe7fffff memory:c0000-dffff
my desktop
    $ lshw -C display
    *-display
        description: VGA compatible controller
        product: G86 [Quadro NVS 290]
        vendor: NVIDIA Corporation
        physical id: 0
        bus info: pci@0000:01:00.0
        version: a1
        width: 64 bits
        clock: 33MHz
        capabilities: pm msi pciexpress vga_controller bus_master cap_list rom
        configuration: driver=nouveau latency=0
        resources: irq:29 memory:f2000000-f2ffffff memory:e0000000-efffffff memory:f0000000-f1ffffff ioport:1100(size=128)
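To illustrate how the `product` and `vendor` fields in the lshw output above could be turned into GPU properties, here is a small sketch that parses the plain-text output of `lshw -C display`. The function name `parse_lshw_display` and the output key names are hypothetical, and the parser assumes the simple "key: value" layout seen in the samples above.

```python
def parse_lshw_display(text: str) -> list[dict]:
    """Parse plain-text `lshw -C display` output into one dict per display device."""
    devices = []
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("*-display"):
            # A new device section starts, e.g. "*-display" or "*-display UNCLAIMED".
            current = {}
            devices.append(current)
        elif current is not None and ": " in line:
            # Split on the first ": " only, values such as "irq:29" contain colons.
            key, value = line.split(": ", 1)
            current[key] = value
    return devices


# Shortened copy of the desktop sample from the thread.
SAMPLE = """\
$ lshw -C display
  *-display
       description: VGA compatible controller
       product: G86 [Quadro NVS 290]
       vendor: NVIDIA Corporation
       bus info: pci@0000:01:00.0
       configuration: driver=nouveau latency=0
"""

gpu = parse_lshw_display(SAMPLE)[0]
print({"product-name": gpu["product"], "vendor-name": gpu["vendor"]})
```

Lines before the first `*-display` header, such as the shell prompt, are ignored because no device section is open yet.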
So I will go with:
    gpu:
        product-name: "~ Radeon Pro .+"
        vendor-name: AMD
To comply with the naming advice in https://tmt.readthedocs.io/en/latest/spec/hardware.html#names-and-ids
@thrix wouldn't product-name be the same field as device-name from the device specification PR, https://github.com/teemtee/tmt/pull/1759/files#diff-9dd87f09c4ab902df670b30e00fe89d0966a3499985258b1d1f731e52f9fd322R12?
@happz seems like it, well, naming :) I like that it is mapped to what lshw reports, but I have no strong objections to unifying it.
We have a request to be able to test against AWS instances with a GPU:
https://discussion.fedoraproject.org/t/setting-up-fedora-ci-for-rocm/84373/11
To start, it seems it would be enough to ask for HW with a dedicated GPU and maybe to say whether it is NVIDIA or Intel.
Any more ideas are welcome.