Open ajcraig opened 1 month ago
@margo/technical-wg Please review the proposal above.
Thanks!
Here's a proposed set of modifications:
- `kind` should be Pascal case and I don't think we need "specification" in the kind.
- `device` `CPU` and `capacity`
- `role` should be plural since a device can fill multiple roles. I don't think we need "margo" in the name either because the whole thing is Margo.
- `periferal1`, `periferal2`, `interface1`, `interface2`, etc. we can just key it off the name.
- `properties` instead of `description` unless you meant for `description` to be like a short sentence about the item maybe? If so, this is fine but I think we need to allow for a properties dictionary to allow for indicating the unique properties for that device. This could potentially be free form or we may need to define specific properties for certain types of hardware.
apiVersion: device.margo/v1
kind: DeviceCapability
properties:
  id:
  vendor:
  modelNumber:
  serialNumber:
  roles:
  resources:
    cpu:
      architecture:
      model:
      cores:
      frequency:
    capacity:
      memory:
      storage:
    peripherals:
      - name:
        type:
        modelNumber:
        properties:
    interfaces:
      - name:
        type:
        modelNumber:
        properties:
Simple Example
apiVersion: device.margo/v1
kind: DeviceCapability
properties:
  id: northstarida.xtapro.edge
  vendor: Northstar Industrial Applications
  modelNumber: 332ANZE1-N1
  serialNumber: PF45343-AA
  roles:
    - standalone cluster
    - cluster lead
  resources:
    cpu:
      architecture: Intel x64
      model: i9-14900KS
      cores: 24
      frequency: 6.2 GHz
    capacity:
      memory: 64.0 GB
      storage: 2 TB
    peripherals:
      - name: NVIDIA GeForce RTX 4070 Ti SUPER OC Edition Graphics Card
        type: GPU
        modelNumber: TUF-RTX4070TIS-O16G
        properties:
          manufacturer: NVIDIA
          series: NVIDIA GeForce RTX 40 Series
          gpu: GeForce RTX 4070 Ti SUPER
          ram: 16 GB
          clockSpeed: 2640 MHz
    interfaces:
      - name: RTL8125 NIC 2.5G Gigabit LAN Network Card
        type: Ethernet
        modelNumber: RTL8125
        properties:
          maxSpeed: 2.5 Gbps
      - name: WiFi 6E Intel AX411NGW M.2 Cnvio2
        type: Wi-Fi
        modelNumber: AX411NGW
        properties:
          bands: ["2.4 GHz", "5 GHz", "6GHz"]
          maxSpeed: 2.4 Gbps
For `peripherals.type` and `interfaces.type` I think we'll need to make a list of types that should be used. We may not be able to come up with a complete list, but we should try to come up with as many as we can think of so people use the types consistently. If we don't, matching device capabilities against application requirements will be too difficult to automate.
For anything with a unit (GB, TB, GHz, MHz, Gbps, etc.) we'll probably need to define what the unit should be (or at least a consistent naming convention) if the intention is to pair these resources with the application's requirements. If we don't, matching them up will be too difficult to automate.
Do we need to include any information about the container platform it is running (e.g., Docker, Podman, Kubernetes distribution, version, etc.)?
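To make the points above about consistent types and units concrete, here is a minimal Python sketch of what an automated check might look like. The type vocabularies, the unit table, the choice of decimal (SI) multipliers, and the function names `parse_quantity`, `is_known_type`, and `meets_minimum` are all assumptions for illustration; nothing here has been agreed in this thread.

```python
import re

# Hypothetical vocabularies: the thread has not agreed on an actual list of types.
KNOWN_TYPES = {
    "peripherals": {"GPU"},
    "interfaces": {"Ethernet", "Wi-Fi"},
}

# Assumed canonical units, mapped to (dimension, multiplier to the base unit).
# Decimal (SI) multipliers are an assumption; the spec could pick binary ones.
UNITS = {
    "MB": ("bytes", 10**6),
    "GB": ("bytes", 10**9),
    "TB": ("bytes", 10**12),
    "MHz": ("hertz", 10**6),
    "GHz": ("hertz", 10**9),
    "Mbps": ("bits_per_second", 10**6),
    "Gbps": ("bits_per_second", 10**9),
}

# Lower-cased lookup so that "Mhz" and "MHz" resolve to the same unit.
_UNITS_BY_LOWER = {name.lower(): spec for name, spec in UNITS.items()}

# A number, optional whitespace, then an alphabetic unit ("6GHz" also matches).
_QUANTITY = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)\s*$")


def parse_quantity(text):
    """Parse a string such as '2.5 Gbps' into (dimension, value in base units)."""
    match = _QUANTITY.match(text)
    if match is None:
        raise ValueError(f"not a quantity: {text!r}")
    number, unit = match.groups()
    try:
        dimension, factor = _UNITS_BY_LOWER[unit.lower()]
    except KeyError:
        raise ValueError(f"unknown unit: {unit!r}") from None
    return dimension, float(number) * factor


def is_known_type(section, type_name):
    """Check a peripheral or interface type against the assumed vocabulary."""
    return type_name in KNOWN_TYPES.get(section, set())


def meets_minimum(offered, required):
    """True if `offered` is at least `required`; both must share a dimension."""
    offered_dim, offered_value = parse_quantity(offered)
    required_dim, required_value = parse_quantity(required)
    if offered_dim != required_dim:
        raise ValueError(f"cannot compare {offered_dim} with {required_dim}")
    return offered_value >= required_value
```

With this sketch, `meets_minimum("64.0 GB", "16 GB")` is true, while `is_known_type("interfaces", "WiFi")` is false because only the spelling `Wi-Fi` is in the assumed list, which is the kind of mismatch an agreed vocabulary would prevent.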
In the resource section, "cpu" and "capacity" are not at the same semantic level ("cpu" being a device and "capacity" a characteristic). I think they should be separated. Some platforms provide multiple CPUs with different characteristics, or different types of core in the same SoC. For example, some Arm SoCs have A cores and R cores, and some Intel CPUs have P cores and E cores. Also, the frequency can be changed programmatically (sometimes per core), so I would use maxFrequency.
I would propose to:
- make "cpu" its own section, with a list to allow multiple CPUs, but also a list to allow different cores
- replace "frequency" with "maxFrequency"
apiVersion: device.margo/v1
kind: DeviceCapability
properties:
  id:
  vendor:
  modelNumber:
  serialNumber:
  roles:
  resources:
    capacity:
      memory:
      storage:
    cpu:
From: Philip Presson
Sent: Thursday, June 6, 2024 09:43
To: margo/specification
Cc: Julien Duquesnay; Team mention
Subject: Re: [margo/specification] Proposal/Discussion: Describe the device capabilities specification (Issue #9)
In the resource section, "cpu" and "capacity" are not at the same semantic level ("cpu" being a device and "capacity" a characteristic).
Fair point. Maybe we don't need to group memory and storage under capacity:
resources:
  memory:
  storage:
  cpu:
    - model:
      architecture:
      cores:
        - type:
          count:
          maxFrequency:
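As a rough illustration of the list-based layout above, the following Python sketch fills it in for a hybrid CPU and derives the totals a matcher might need. It assumes `maxFrequency` is given per core type, which is one reading of the structure. The 8 + 16 split is consistent with `cores: 24` in the simple example; the E-core frequency and the helper names `total_cores` and `core_counts_by_type` are placeholders introduced here.

```python
# Illustrative data only: the E-core maxFrequency is a placeholder, not vendor data.
resources = {
    "memory": "64.0 GB",
    "storage": "2 TB",
    "cpu": [
        {
            "model": "i9-14900KS",
            "architecture": "Intel x64",
            "cores": [
                {"type": "P", "count": 8, "maxFrequency": "6.2 GHz"},
                {"type": "E", "count": 16, "maxFrequency": "4.5 GHz"},
            ],
        }
    ],
}


def total_cores(resources):
    """Sum the core counts across every CPU and every core type."""
    return sum(
        group["count"]
        for cpu in resources["cpu"]
        for group in cpu["cores"]
    )


def core_counts_by_type(resources):
    """Aggregate core counts per core type across all CPUs."""
    counts = {}
    for cpu in resources["cpu"]:
        for group in cpu["cores"]:
            counts[group["type"]] = counts.get(group["type"], 0) + group["count"]
    return counts
```

Keeping `cores` as a list means a single-type CPU is simply a list with one entry, so consumers do not need a separate code path for hybrid parts.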
Here is a real use case we have for an application requiring a specific piece of hardware:
An application vendor has an application requiring an NVIDIA GPU on the device. The application is designed to work only with NVIDIA GPUs, and the vendor recommends the GPU have at least 16 GB of RAM to run efficiently. For the application to run on the device, the application vendor expects the device to have the NVIDIA device drivers installed, as well as the NVIDIA GPU Operator on Kubernetes or the NVIDIA Container Toolkit for Docker Compose based deployments. The drivers, operator, and toolkit require elevated permissions to install, so the expectation is that the device owner installs and configures them.
@margo/technical-wg I think it would be good if we could provide actual use cases we have where applications require a specific piece of hardware, so we can see what the requirements are instead of trying to guess what they might be. I think this will help us figure out what needs to be in this file. Does anyone else have any real use cases?
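To show what the GPU use case above would ask of the file, here is a small Python sketch that filters a device's peripherals by type, manufacturer, and minimum RAM. It assumes the free-form `manufacturer` and `ram` keys from the simple example and a RAM value always expressed in GB; the function names `ram_in_gb` and `find_peripherals` are hypothetical.

```python
def ram_in_gb(text):
    """Parse a RAM string such as '16 GB'; only GB is handled in this sketch."""
    value, unit = text.split()
    if unit != "GB":
        raise ValueError(f"unsupported unit: {unit!r}")
    return float(value)


def find_peripherals(peripherals, type_name, manufacturer, min_ram_gb):
    """Return the peripherals that match the type, manufacturer, and RAM floor."""
    matches = []
    for peripheral in peripherals:
        props = peripheral.get("properties") or {}
        if peripheral.get("type") != type_name:
            continue
        if props.get("manufacturer") != manufacturer:
            continue
        # A peripheral that does not report RAM cannot satisfy the requirement.
        if "ram" not in props or ram_in_gb(props["ram"]) < min_ram_gb:
            continue
        matches.append(peripheral)
    return matches


# Peripheral entry copied from the simple example earlier in the thread.
peripherals = [
    {
        "name": "NVIDIA GeForce RTX 4070 Ti SUPER OC Edition Graphics Card",
        "type": "GPU",
        "modelNumber": "TUF-RTX4070TIS-O16G",
        "properties": {
            "manufacturer": "NVIDIA",
            "series": "NVIDIA GeForce RTX 40 Series",
            "gpu": "GeForce RTX 4070 Ti SUPER",
            "ram": "16 GB",
            "clockSpeed": "2640 MHz",
        },
    }
]
```

The check only works if `manufacturer` and `ram` are spelled and formatted the same way on every device, which suggests at least some properties may need to be defined per hardware type rather than left free form. The sketch also cannot express the driver, operator, or toolkit prerequisites, because the draft has no field for installed software.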
Few observations.
The following notes were captured during the Device Requirements call. These will be incorporated into a Pull Request to finalize the Device Capabilities file.
Below I have outlined a proposal for the Device Capability Specification used by the Workload Orchestration Software (WOS). This file informs the WOS of the properties, resources, and components of Margo-compliant devices.
The associated workflow / use case for this is detailed below:
Proposed Margo Device Capability Specification
Device Capability Attributes
Top-level Attributes
Properties Attributes
Margo Device Role Attributes
Device Resources
Device Peripherals
Peripheral Attributes
Device Interfaces
Interface Attributes