intel / idxd-config

Accel-config / libaccel-config
Other
59 stars 35 forks source link

Should provide an --auto option #31

Closed fenrus75 closed 1 year ago

fenrus75 commented 1 year ago

accel-config should provide an --auto option so that no manual configuration by the sysadmin is needed (of course manual configurations should be possible, but without it, a reasonable set of defaults should be used)

aegl commented 1 year ago

What are the “reasonable defaults”? There are a gazillion possibilities. The right thing may depend a lot on what the user plans to do with DSA.

-Tony

From: Arjan van de Ven @.> Sent: Monday, April 3, 2023 2:35 PM To: intel/idxd-config @.> Cc: Subscribed @.***> Subject: [intel/idxd-config] Should provide an --auto option (Issue #31)

accel-config should provide an --auto option so that no manual configuration by the sysadmin is needed (of course manual configurations should be possible, but without it, a reasonable set of defaults should be used)

— Reply to this email directly, view it on GitHubhttps://github.com/intel/idxd-config/issues/31, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ABJRTBMOGWOM3OAWIUSI3CDW7M67PANCNFSM6AAAAAAWR3UG7A. You are receiving this because you are subscribed to this thread.Message ID: @.**@.>>

davejiang commented 1 year ago

accel-config load-config -e should do that currently if there's a default /etc/accel-config.conf file.

ramesh-thomas commented 1 year ago

accel-config load-config -e should do that currently if there's a default /etc/accel-config.conf file

An empty load-config tries to load /etc/accel-config/accel-config.conf if present. I actually did not know this was a feature! I will update the load-config help page with this. Note that the "-e" is only necessary if the configured devices need to be also enabled.

So the question remains, which config file to put in /etc/accel-config and should we do it at install or should the sysadmin put it there?

fenrus75 commented 1 year ago

What are the “reasonable defaults”

I will argue anything is better than not getting any configuration at all.... require the sysadmin to pick "the best' is asking the person to know far too much about this.. the average sysadmin will not do this. A reasonable default will be much better than "nothing". "nothing by default" means "niche-only by default"

ramesh-thomas commented 1 year ago

I have sent out this request to internal users. Maybe we can come up with a superset default configuration file. accel-config can try to load them and silently fail invalid configurations.

Maybe it would be more descriptive to add an option "--default / -d" to the load-config command to do this instead of an empty load-config implying that.

Note that if a valid platform specific "optimal configuration" is possible, then driver should set it up at load and provide a sysfs interface for accel-config to request reloading that.

fenrus75 commented 1 year ago

thanks!

on the spectrum of "worst" to "best"...

worst is "nothing works at all" .... best is "optimal performance for a given workload"

I'm sure we can do better than the (current) worst case. Even if it's not perfect initially

davejiang commented 1 year ago

Isn't the default unix tools behavior is that if you don't provide a parameter, it loads the default config? I feel like -d is overkill.

ramesh-thomas commented 1 year ago

Isn't the default unix tools behavior is that if you don't provide a parameter, it loads the default config? I feel like -d is overkill.

Loading the default is intrusive and should be protected against user accidentally not passing a parameter, unless there is a way to detect a configuration has already been loaded. I don't see any reliable way to detect that. Besides, it is also consistent with other accel-config commands that make changes to the configuration or device state.

mythi commented 1 year ago

What are the “reasonable defaults”

I will argue anything is better than not getting any configuration at all.... require the sysadmin to pick "the best' is asking the person to know far too much about this.. the average sysadmin will not do this. A reasonable default will be much better than "nothing". "nothing by default" means "niche-only by default"

FWIW, we provide a "default" which essentially gives equally fair dedicated workqueues: 1queue/1engine/1group set for each device:

https://github.com/intel/intel-device-plugins-for-kubernetes/blob/0366f5bca58e5f4bc077e5df317235a4f7126b2e/demo/idxd-init.sh#L23-L35

jgasparakis commented 1 year ago

It seems like there are a few "forces" on this topic but not all of them are aligned and not all of them are not in the same place (the accel-config tool). Can we please get together on a quick call to brainstorm and decide the behavior of --auto as @fenrus75 suggested initially? I am sending a calendar event to everyone on this page.

jgasparakis commented 1 year ago

I initially I talked to @ramesh-thomas and it seemed like creating another config (called generic) in contribs folder would be a good idea but then I talked to @aegl and we came up with a slightly better approach. It seems like it's best approach to add a --user-default argument where the tool will create one group with all the available engines and work queues in one group to the user. All such work queues will be shared and accessible to the userspace (hence the --user-default name). If there are no engines or no work queues available the tool should exit without taking any action (maybe just log/message the fact?) Maybe also add the option to the user to name that group the way they want so it is crystal clear that this is their default group?

This approach was taken taking in mind that:

Please let us know if there are any further comments. I will leave the scheduled meeting on our calendars for a couple more days just in case.

ramesh-thomas commented 1 year ago

In a meeting today with @jgasparakis and @aegl, we decided to do it a bit differently than simply loading a default config file. Accel-config will scan all devices in the sysfs for engines and WQs that are not enabled yet. They will be configured into a new group in each device and enabled. The WQs would be configured as user type, shared and the name set to indicate it is a default configuration. I plan to use a template config file with default values for the configured attributes.

I am currently working on an inter-domain POC and would take this up after that is completed. I will be opening a Jira ticket to track.

fyu1 commented 1 year ago

We are working on pre-allocating WQs/engines for kernel usage.

I can extend this kernel WQs allocation to cover both user and kernel WQs.

fenrus75 commented 1 year ago

fyu1: it's not a great idea to put intel internal links into a public github

ramesh-thomas commented 1 year ago

Following patches from @fyu1 addresses this issue. It is in 4.1 stable release.

accel-config: Add doumentation for new command "config-user-default" accel-config: Add user_default_profile.conf accel-config: Add "-n " to specify WQ name for disabling WQs accel-config: Disable default configured WQs and devices accel-config: Add option "-c " to load default configura… accel-config: Add config-user-default command accel-config: Skip configuring ats_disable if the attribute is not pr…