config-linux: add schemata field to IntelRdt

marquiz commented 9 months ago

Add a new "schemata" field to the Linux IntelRdt configuration. This addresses the complexity of separate schema fields and resolves the issue of supporting currently uncovered RDT features like L2 cache allocation and CDP (Code and Data Prioritization).

The new field is for specifying the complete schemata (all schemas) to be written to the schemata file in Linux resctrl fs. The aim is for simple usage and runtime implementation (by not requiring any parsing/filtering of data or otherwise re-implement parsing or validation of the Linux resctrl interface) and also to support all RDT features now and in the future (i.e. schemas like L2, L2CODE, L2DATA, L3CODE and L3DATA and who knows L4 or something else in the future).

Behavior of existing fields is not changed but it is required that the new schemata field is applied last.

marquiz commented 9 months ago

cc @kolyshkin @ipuustin @xiaochenshen @giuseppe

thaJeztah commented 9 months ago

(disclaimer: I only very briefly glanced over, so apologies for any mistake in interpreting); IIUC, the new field effectively is to replace the other field(s), but those already shipped in a release.

Should we consider deprecating the other fields, or add wording to discourage use of those going forward (only for backward compatibility)?

marquiz commented 9 months ago

Thanks @thaJeztah for the quick feedback

(disclaimer: I only very briefly glanced over, so apologies for any mistake in interpreting); IIUC, the new field effectively is to replace the other field(s), but those already shipped in a release.

Yes, I would say so. This is effectively to make those useless/unneeded, without breaking backwards compatibility. The current model of one field per "schema" is infeasible as we can already see that there are many features (L2 and CDP) not covered by the OSI spec.

Should we consider deprecating the other fields, or add wording to discourage use of those going forward (only for backward compatibility)?

I think we could consider that, at least suggest people use the new field. I dunno how deprecation is reacted to in OCI spec.

marquiz commented 9 months ago

@thaJeztah et al, one more consideration is that I don't believe any runtime does any parsing/filtering of the existing l3Schema or mbSchema fields. So in practice the existing fields work just like the new schema field I suggested. An alternative approach would be to make a backwards-incompatible change in the spec and just state the "you can put anything you like in the {l3,mb}Schema fields and the runtime will write those (sequentially, two separate writes) into resctrl schemata file". However, I didn't do that because the backwards-incompatibility and the somewhat misleading naming.

thaJeztah commented 9 months ago

An alternative approach would be to make a backwards-incompatible change in the spec and just state the "you can put anything you like in the {l3,mb}Schema fields and the runtime will write those

Yeah, not sure that's great, as that would effectively make both an "anything goes" field, but they would give the impression they're only to be used for a specific purpose.

I think the existing fields are a bit of a leaky abstraction already; they give the impression that it's an abstraction, but in the end (but I'm not deeply familiar with this matter) it looks like it's just artificially splitting

So I could see two possible directions;

either go "all the way in", and every possible RDT feature must have a counterpart in the specification
- :+1: runtimes can parse, and validate options
- :+1: runtimes (and the spec) can advertise exactly what RDT options it supports (https://github.com/opencontainers/runtime-spec/pull/1130)
- :point_up: this can improve portability
- :-1: runtimes (and the runtime spec) must be updated for changes in the RDT feature-set (new features, feature deprecation)
make it clear that there's no abstraction at all;
- the schemata field is a 1:1 mapping to the schemata file
- runtimes SHOULD NOT perform any validation of the content (perhaps some restrictions w.r.t. accepted characters and/or length)
- runtimes MUST write the content of the schemata field to the schemata file as-is

For the schemata field, and deprecation of the existing fields (I'd have to look up the correct wording for "deprecated");

mention that l3Schema and memBwSchema SHOULD NOT be used, for reasons other than backward compatibility with older runtimes.
mention that l3Schema and memBwSchema MUST NOT be added to if schemata if set (i.e., if schemata is used, it replaces any use of l3Schema and memBwSchema)
:question: TBD: what would be the best approach to provide a "hybrid" spec for backward compatibility (i.e., an OCI spec that both older (using the l3Schema and memBwSchema) and newer (through schemata) runtimes would accept?
- older runtimes (taking the existing spec into account) would only be able to use L3 (not L2)
- which means that either the spec will result in a different outcome (older runtimes only L3, newer runtimes (from schemata) potentially using both L3 and L2 features)
- so if we want a "hybrid" spec to produce the SAME result, then runtimes MUST validate that concatenating l3Schema and memBwSchema MUST be identical to the content of the schemata field
- if, on the other hand, we are ok with the result being different from older runtimes, we could either just "document" that they may differ, or document that l3Schema and memBwSchema MUST be empty if schemata is non-empty (i.e. can either use new features, or no RDT at all).

For the schemata field itself

We could refer to Intel's documentation as canonical reference for the accepted format (i.e., what's supported being outside of the OCI runtime specification).
However; if the format itself is well-documented and stable (i.e., new features may be added, but all follow the same format), we COULD document the accepted format (accepted characters, overall format).
- I don't think the runtime-spec's go-implementation generally provides utilities for this ("runtime-tools" might)
- Maybe we could provide / document a (suggested) regular expression that captures the format (which runtimes and runtime-tools can implement).
:question: does RDT provide any form of feature detection? And if so, is that something which could be included in the features.md (or would that be beyond scope?)
- This information could be used by "higher level" runtimes (runtimes that generate the OCI spec for a container)

marquiz commented 9 months ago

@thaJeztah appreciate your thoughtful review. My comments in-lined below

Yeah, not sure that's great, as that would effectively make both an "anything goes" field, but they would give the impression they're only to be used for a specific purpose.

I agree

I think the existing fields are a bit of a leaky abstraction already; they give the impression that it's an abstraction, but in the end (but I'm not deeply familiar with this matter) it looks like it's just artificially splitting

Exactly my thinking/feeling. We could possibly use something like

type LinuxIntelRdt struct {
...
    Schemata []LinuxIntelRdtSchema
...
}

type LinuxIntelRdtSchema struct {
    Type   String  // "L2", "L3", "MB" etc
    Schema String  // Depends on the Type, e.g. "0=f;1=f"
}

But I don't think that makes anyone's life any easier.

So I could see two possible directions;

either go "all the way in", and every possible RDT feature must have a counterpart in the specification

👍 runtimes can parse, and validate options

👍 runtimes (and the spec) can advertise exactly what RDT options it supports (Add features.md to formalize the runc features JSON #1130)

☝️ this can improve portability

👎 runtimes (and the runtime spec) must be updated for changes in the RDT feature-set (new features, feature deprecation)

I think this would be a maintenance nightmare. Even with the current very limited validation/filtering specified I think this hasn't been implemented. Plus for a full validation the runtime will need virtually to replicate the validation code in linux kernel. Who knows what the actual format for future "FOOBAR" technology will look like. And e.g. in the case of RDT the min/max width of the bitmask varies between HW etc.

make it clear that there's no abstraction at all;

the schemata field is a 1:1 mapping to the schemata file

runtimes SHOULD NOT perform any validation of the content (perhaps some restrictions w.r.t. accepted characters and/or length)

runtimes MUST write the content of the schemata field to the schemata file as-is

I'm a strong proponent of this approach. Let kernel do the validation because it can do it correctly 😇

For the schemata field, and deprecation of the existing fields (I'd have to look up the correct wording for "deprecated");

mention that l3Schema and memBwSchema SHOULD NOT be used, for reasons other than backward compatibility with older runtimes.

mention that l3Schema and memBwSchema MUST NOT be added to if schemata if set (i.e., if schemata is used, it replaces any use of l3Schema and memBwSchema)

❓ TBD: what would be the best approach to provide a "hybrid" spec for backward compatibility (i.e., an OCI spec that both older (using the l3Schema and memBwSchema) and newer (through schemata) runtimes would accept?

older runtimes (taking the existing spec into account) would only be able to use L3 (not L2)

which means that either the spec will result in a different outcome (older runtimes only L3, newer runtimes (from schemata) potentially using both L3 and L2 features)

so if we want a "hybrid" spec to produce the SAME result, then runtimes MUST validate that concatenating l3Schema and memBwSchema MUST be identical to the content of the schemata field

if, on the other hand, we are ok with the result being different from older runtimes, we could either just "document" that they may differ, or document that l3Schema and memBwSchema MUST be empty if schemata is non-empty (i.e. can either use new features, or no RDT at all).

I'd strongly suggest to keep it simple. Any schema validation will be a tricky business to do right with all the corner cases like L3:0=f;1=f0 and L3:1=f0;0=f will have the same end result. So either -l3Schema and memBwSchema MUST NOT be set to if schemata if set

or then just accept that the end result may be different

In any case the "portability" of the schemata is very poor even between reboots/remount of the resctrlfs as the format and accepted fields in the schemata file depends e.g. on the mount options.

For the schemata field itself

We could refer to Intel's documentation as canonical reference for the accepted format (i.e., what's supported being outside of the OCI runtime specification).

However; if the format itself is well-documented and stable (i.e., new features may be added, but all follow the same format), we COULD document the accepted format (accepted characters, overall format).

I don't think the runtime-spec's go-implementation generally provides utilities for this ("runtime-tools" might)

Maybe we could provide / document a (suggested) regular expression that captures the format (which runtimes and runtime-tools can implement).

❓ does RDT provide any form of feature detection? And if so, is that something which could be included in the features.md (or would that be beyond scope?)

This information could be used by "higher level" runtimes (runtimes that generate the OCI spec for a container)

The documentation is not Intel's but in Linux kernel (https://docs.kernel.org/arch/x86/resctrl.html). But 👍 for adding a reference. I would refer to that directly instead of describing it here. With new kernel versions there may be (backwards-compatible) extensions and new technologies enabled that the OCI spec would then be missing. I think that also applies to any regular expression we tried to add here.

RDT does provide information about the currently active setup (what features are enabled and in which configuration) under /sys/fs/resctrl/info/. That might be something useful to add into features.md but I think that would be a subject of a separate PR – to think about thoroughly.

thaJeztah commented 9 months ago

I think this would be a maintenance nightmare. Even with the current very limited validation/filtering specified I think this hasn't been implemented

:100: I fully agree. My main intent here was to mention options we considered (for future visitors, including "future self"), and rejected (for reasons mentioned). It's good to try to provide an abstraction, but in some case, such as this one, I don't think that's possible, and we just need to acknowledge that some parts of the OCI just are very platform-specific and tightly coupled to kernel/platform features.

I'd strongly suggest to keep it simple. Any schema validation will be a tricky business

Yes, that's what I somewhat expected the case to be; I guess the recommendation would be "use the new option", and only in exceptional cases consider using "both" (if you like schrödingers cat).

The documentation is not Intel's but in Linux kernel (https://docs.kernel.org/arch/x86/resctrl.html). But 👍 for adding a reference.

Heh, yes, let myself go on the "canonical reference" side; definitely "+1" on keeping docs in this spec as minimal as possible, and refer to the kernel reference as source of truth.

I think that also applies to any regular expression we tried to add here.

If it's well-defined and a regex is possible (either in docs and/or the JSON-schema), I guess that would be good to have.

RDT does provide information about the currently active setup (what features are enabled and in which configuration) under /sys/fs/resctrl/info/. That might be something useful to add into features.md but I think that would be a subject of a separate PR – to think about thoroughly.

Yes, that's for sure not a blocker (from my point of view) for this PR.

My main motivation for some of these comments is to explore what amount of validation can reasonably be performed by (higher-level) runtimes before "the spec hits the kernel".

My general rule of thumb is to keep that as minimal as possible for most cases (unless we can absolutely be sure), but a "well-formedness" check (where possible) can help. Having some amount of early error-handling can reduce burden on maintainers (as all OCI / kernel / runtime errors result in a "OCI runtime" error, and those tend to be piled on by users "I have this same issue!" (no you don't 😂))

marquiz commented 9 months ago

If it's well-defined and a regex is possible (either in docs and/or the JSON-schema), I guess that would be good to have.

My feeling is that it's not well-defined enough. The currently supported format for currently supported technologies is described (https://docs.kernel.org/arch/x86/resctrl.html#l3-schemata-file-details-code-and-data-prioritization-disabled) but I wouldn't count on that. Say next kernel version will start supporting id ranges like L2:0-4=ff;5-8=ff00 or the new FOOBAR technology supporting some new format FOOBAR:xyzy;1234 we're screwed

marquiz commented 9 months ago

@thaJeztah I think I'll sleep over this and push an update after the weekend, trying to capture all the review comments

thaJeztah commented 9 months ago

For sure! Mostly exploring here if it's possible (within reason). Having a quick look at examples on that link;

# echo "L3:0=3;1=3\nMB:0=1024;1=500" > /sys/fs/resctrl/p1/schemata
# echo -e "L3:0=f8000;1=fffff\nMB:0=20;1=100" > p0/schemata

Which would translate to;

L3:0=3;1=3
MB:0=1024;1=500

and;

L3:0=f8000;1=fffff
MB:0=20;1=100

Based on that the format "roughly" is a newline-delimited set of features;

<feature>:<[key=value;]...>
<feature>:<[key=value;]...>

first column is the name of the feature, followed by a colon (:)
followed by (one or more / zero or more?) semi-colon (;) separated key=value pairs.

So I guess the blanks to fill in is what are accepted characters for feature, key, and value

Whether <key>=<value> pairs are compulsory (i.e. is <feature>: without any options valid), might be useful and to some extent if =<value> is compulsory (or "key-only" are allowed), but I guess any of that may be going too much into details for the spec itself (we can treat [key=value;] mostly as a opaque element, other than "valid characters")

marquiz commented 9 months ago

Based on that the format "roughly" is a newline-delimited set of features;
<feature>:<[key=value;]...>

Yeah, the currently availabe technologies/features follow this format. My fear is that we could get bitten by future extensions (like L3:1,3,5-10=f) or new stuff, like you speculated <feature>:<key1>[;<key2>...].

One option would be to validate the currently known features and just not validate unknown stuff. Another is to follow the currently ubiquitous format <feature>:<[key=value;]...>. A third variant as you suggested would be to validate <feature>:<blob>[;<blob>] (where blob could be <key>=<value> or smth else)

In any case, we could make the schemata field an array of schemas

Schemata []string

marquiz commented 8 months ago

Small update: based on @giuseppe's comment I removed the requirement for comparing the new schemata field to the content of the schemata file. The behavior of old l3CacheSchema and memBwSchema fields is unchanged.

opencontainers / runtime-spec

config-linux: add schemata field to IntelRdt #1230