ldmsd and ldms-aggd need real first-class configuration files

morrone commented 4 years ago

I think that ldmsd and ldms-aggd really both need first-class configuration files.

Unless I am mistaken, there is no single unified configuration file for the ldmsd. Some things are configured through the command line options and/or environment variables. For those things, there appears to be no way to put them in a configuration file.

There is a "configuration file" for plugin configuration, but that is really more a script of ldms commands rather than a configuration file.

Here are some of the things that I would expect a first-class configuration file for ldmsd to have (ldms-aggd would be similar):

- A top-level section devoted to general ldmsd daemon options, such as:
  - networking options
  - logging options
  - thread options
  - paths to relevant files and plugins
- Subsections for each plugin

Note that plugin configuration should be configuration, not a list of commands. In particular there should be no "load" or "start" commands in the configuration file. Those are implementation details. The daemon will know to load and start the plugin based on the fact that the plugin is configured in the configuration file.

morrone commented 4 years ago

Also, the daemons should look for the configuration file in a known, standard system location by default. This can be overridden by specifying a different configuration file on the command line.

baallan commented 4 years ago

There is a practical solution to this which is already available-- use the genders-based systemd ldms configuration shipped with the ldmsd rpms. It is free of 'load' and 'start' commands. The standard location is /etc/genders. For overrides, which are always needed in exotic environments or for finicky site policies, the /etc/sysconfig/ldms.d/ldms.%I.conf location works with ldmsd.service and ldmsd@.service.

So assuming you're already aware of it and find it unsatisfactory, what is it you really wanted?

Background on the systemd/genders tooling.

Features:

- It’s a pile of bash and a dribble of C++ – don’t be afraid to inspect it
- Startup process of the ldmsd.service unit:
  - ldmsd-pre-system creates the environment file and configuration file for the ldmsd binary:
    - /etc/sysconfig/ldms.d/ldmsd loads the .conf file and parses the genders file(s).
      - For aggregators, the ldmsctl_args3 binary analyzes the genders file hierarchy to discover the host, transport, and port associations.
    - Calls to the functions in /etc/sysconfig/ldms.d/ldms-functions generate the files to place in /var/run/ldmsd.
  - /usr/bin/ldmsd-wrapper.sh starts the daemon using the generated files.
- The systemd unit files /usr/lib/systemd/system/ldmsd[@].service may need site-specific tuning of resource limits.
- Escape hatches are included to allow for all exotic use cases: skip the genders files entirely if you want to supply your own configuration script.

Design goals:

- Simplicity of deployment (a minimum of admin-written files, written quickly following an example)
- Scalability (to work at computing center scale)
- Do not repeat yourself (to avoid consistency errors)
- Make the common uses easy without making the hard uses impossible

New v4 features that have not yet been addressed or explicitly tested in the genders systemd scripts are:

- Failover
- Munge authentication
- Storage groups

morrone commented 4 years ago

There is a practical solution to this which is already available-- use the genders-based systemd ldms configuration shipped with the ldmsd rpms.

So assuming you're already aware of it and find it unsatisfactory, what is it you really wanted?

Yes, I am completely aware of the genders solution. It is what we are currently using. But that solution is a wild abuse of the genders system. Genders is not intended to be the fine grain configuration tool for applications. And in order to shoe-horn it into genders, we need to do a bunch of translations from normal ldms configuration into an intermediate genders language in our heads, which even in the normal simple configuration case looks like line noise in genders, and still requires eight (at the minimum!) other files to be configured and tracked in our configuration management system.

It was a nice experiment. I understand what you were going for. But I think it is really time to retire that approach and go with a much simpler configuration file approach that I suggest in this ticket.

It’s a pile of bash and a dribble of C++ – don’t be afraid to inspect it

It is just not reasonable to ask our sysadmins to dive into that. The genders entries look like line noise, and the winding set of scripts is a pain to follow. I don't want to be a system administrator, so I need a more conventional configuration approach that I can reasonably hand off to the operations and system administration teams.

Design goals:

Simplicity of deployment (a minimum of admin-written files, written quickly following an example)

I'm sorry, but this goal was entirely missed in the current approach.

Scalability (to work at computing center scale)

We scale the configuration of many other services with similar configuration complexity using much more straightforward approaches than the one devised for ldms.

Do not repeat yourself (to avoid consistency errors)

But there are at least 4 different places to do the same exact thing! Not repeating yourself is an exercise in extreme restraint given this system rather than enforced by design.

Make the common uses easy without making the hard uses impossible

This makes the common use case difficult.

Failover

Munge authentication
Storage groups

All of that seems achievable with a more standard approach to configuration files.

I am happy to iterate on what a good configuration file approach looks like. I hope I'm not being too harsh, but the current genders approach is really not working well for us. We really need to overhaul the configuration approach to bring it more in line with more standard configuration practices.

baallan commented 4 years ago

Please suggest an actual declarative configuration syntax and semantics for the '1st class' configuration agent. The current genders implementation represents the minimum needed for ldms v2/v3. I can picture something as easy as 'ldmsd5 --system=$FILE' from the command line. (where if FILE is in the standard place, you don't even need --system).

But what is in the file? A JSON punctuation of the genders data, where host expressions still figure prominently in the syntax? XML, with configs written using an LLNL-contributed GUI? Are there 'includes' of some sort? How are the escape hatches handled for exotic situations and bug workarounds?

I agree the genders syntax is limiting and the bash scripting is less than entirely pleasant. That's why at a point it drops into c++ to deal with complex queries.

A nearly comprehensive answer to this question would be to attach to this issue the 'new format' config files you propose to replace the L0/L1/L2 genders configurations you currently use for an actual LLNL cluster (manually translated).

morrone commented 4 years ago

I don't particularly care at this stage what the specific configuration file format looks like, but I can give some early guidance.

It needs to be something reasonably human-readable and editable with vi. That pretty much rules out XML, and anything that would reasonably require a separate GUI to edit it. Something INI-style, YAML, or something along those lines would be in line with the desired approach.

Here's a first approximation of what I am thinking. I readily admit that I don't have the full scope of configuration requirements in my head yet, so this will necessarily need to go through some design iteration before it is fully workable. But perhaps it will help to spur more discussion:

[main]
port=444
transport=rdma
authfile=/etc/ldms_auth.conf

[sampler dcgm]
interval=1000000
offset=0
fields=105,115,1000,1001,1005,1006
schema_name=my_favorite_gpu_fields
producer=${hostname}

That might very well be all that is needed for a sampler node, perhaps with more plugin sections. More sections needed for aggregators and storers, of course.
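As a rough illustration of how a daemon could consume the INI-style sketch above, here is a minimal Python parser built on the standard configparser module. It assumes a hypothetical convention, not an existing ldmsd feature: [main] holds daemon options, each "[sampler NAME]" section configures one sampler that is loaded and started simply because its section exists, and ${hostname} is the only substitution variable.

```python
import configparser
import string

EXAMPLE = """
[main]
port=444
transport=rdma
authfile=/etc/ldms_auth.conf

[sampler dcgm]
interval=1000000
offset=0
fields=105,115,1000,1001,1005,1006
schema_name=my_favorite_gpu_fields
producer=${hostname}
"""


def load_config(text, hostname):
    """Split a hypothetical INI-style ldmsd config into daemon options and samplers.

    A sampler is enabled because its "[sampler NAME]" section exists; there
    are no 'load' or 'start' commands in the file.
    """
    parser = configparser.ConfigParser(interpolation=None)  # we expand ${...} ourselves
    parser.read_string(text)
    variables = {"hostname": hostname}
    main, samplers = {}, {}
    for section in parser.sections():
        values = {
            key: string.Template(value).safe_substitute(variables)
            for key, value in parser.items(section)
        }
        kind, _, name = section.partition(" ")
        if kind == "main" and not name:
            main = values
        elif kind == "sampler" and name:
            samplers[name] = values
        else:
            raise ValueError(f"unknown section [{section}]")
    return main, samplers
```

Keeping substitution outside configparser means only variables the daemon defines, such as ${hostname}, are expanded; any other $-text passes through unchanged.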

tom95858 commented 4 years ago

v5 configuration files already support all of this.

tom95858 commented 4 years ago

However the current plan is to change the syntax to JSon. Now is the time for anyone with strong opinions on syntax to weigh in.

oceandlr commented 4 years ago

What were the design tradeoffs that led to you preferring to go with json?

tom95858 commented 4 years ago

"Pros"

Human readable
Able to represent complex relationships
Existing parsers for C, Python, and JavaScript/NodeJS
Suitable as wire protocol

"Cons"

Verbose
Some people groan when JSon is mentioned

We could certainly have a tool that would generate the JSon from a syntax like what @morrone was suggesting.

Aside from the 'syntax', there is still a lot of 'design' around what the various objects are and how they are encoded.

There is also a notion of how configuration is 'activated', i.e. the configuration is defined as a JSon object, but how is configuration state change done? In today's syntax, we have 'verbs' (start, stop, ...) and 'objects' (prdcr_add, smplr_add, ...) intermingled. start, for example, encodes both state change (idle-->start) and configuration (interval, offset). A goal is to split all configuration vs. state so that a complete, idle configuration can be exchanged with a peer as part of load balancing or fail-over, for example.
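The split described above, where configuration objects carry no runtime state and verbs like start/stop only drive a state machine, can be sketched in Python. The object fields, state names, and transition table are hypothetical illustrations, not the v5 object model.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum


class State(Enum):
    IDLE = "idle"
    RUNNING = "running"
    STOPPED = "stopped"


# Hypothetical state machine: the transitions a start/stop verb may perform.
TRANSITIONS = {
    State.IDLE: {State.RUNNING},
    State.RUNNING: {State.STOPPED},
    State.STOPPED: {State.RUNNING},
}


@dataclass(frozen=True)
class SamplerConfig:
    """Pure configuration (hypothetical fields); carries no runtime state."""
    name: str
    plugin: str
    interval_us: int
    offset_us: int = 0


class Sampler:
    def __init__(self, config: SamplerConfig):
        self.config = config
        self.state = State.IDLE  # every object starts idle, regardless of import order

    def transition(self, new_state: State) -> None:
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(
                f"{self.config.name}: {self.state.value} -> {new_state.value} not allowed")
        self.state = new_state


def export_config(samplers):
    """Serialize configuration only, so a peer receives a complete but idle config."""
    return json.dumps([asdict(s.config) for s in samplers], sort_keys=True)


def import_config(text):
    return [Sampler(SamplerConfig(**d)) for d in json.loads(text)]
```

Because export_config never serializes state, a peer that imports the configuration during fail-over or load balancing gets idle objects and decides for itself when to start them.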

baallan commented 4 years ago

JSON is, as Tom noted, suitable for use as a wire protocol representing binary structures. For ldms we need both more and less than JSON. I will convert a production genders file to demo.

morrone commented 4 years ago

json is a data interchange format, so yes of course it works well as wire protocol. But it does not work well for configuration files. In particular, json has no support for comments! Comments should absolutely be a requirement of the configuration file syntax. Further, json wouldn't make it terribly easy to introduce substitution patterns without ugly escaping, or other advanced configuration file capabilities like includes (which @baallan noted as a possibility). We might not need includes on day one, but it would definitely be nice to leave open the possibility.
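The comment problem is easy to demonstrate with Python's standard library: the strict json module rejects a '#' comment outright, while configparser accepts a full-line comment in the equivalent INI-style file. The sample contents are illustrative only.

```python
import configparser
import json

JSON_WITH_COMMENT = """
{
  # sampling period in microseconds
  "interval": 1000000
}
"""

INI_WITH_COMMENT = """
[sampler meminfo]
# sampling period in microseconds
interval = 1000000
"""


def json_accepts(text):
    """True if the strict standard-library JSON parser accepts text."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


def ini_accepts(text):
    """True if configparser accepts text ('#' and ';' start full-line comments)."""
    parser = configparser.ConfigParser()
    try:
        parser.read_string(text)
        return True
    except configparser.Error:
        return False
```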

Where can I go to learn about this v5 configuration format that you speak of, @tom95858?

oceandlr commented 4 years ago

I share Chris's desire for something that supports comments and easy editing. Changing JSON always results in a lot of { and [ debugging for me. It is difficult to do diffs, also.

morrone commented 4 years ago

Another requirement that just came to mind:

When a bad value is encountered (not just a syntax error) in the configuration file, the daemon needs to be able to return an error saying on what line in which file the error occurred.
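One way to meet this requirement is to record the file and line each value came from while parsing, so that later semantic checks (not just syntax checks) can still point back at the source. The minimal parser below handles only a hypothetical '[section]' plus 'key = value' subset; real TOML would need a real parser, and the naive '#' stripping would break values that contain '#'.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Value:
    text: str
    path: str
    line: int


class ConfigError(Exception):
    pass


def parse(path, text):
    """Parse '[section]' headers and 'key = value' lines, remembering each value's origin."""
    config = {}
    section = None
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # naive: '#' always starts a comment
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip()
            config.setdefault(section, {})
            continue
        key, sep, value = line.partition("=")
        if not sep or section is None:
            raise ConfigError(f"{path}:{lineno}: expected 'key = value' inside a [section]")
        config[section][key.strip()] = Value(value.strip(), path, lineno)
    return config


def get_int(config, section, key):
    """A semantic check that still reports the file and line of the offending value."""
    value = config[section][key]
    try:
        return int(value.text)
    except ValueError:
        raise ConfigError(
            f"{value.path}:{value.line}: {section}.{key}: expected an integer, got {value.text!r}"
        ) from None
```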

baallan commented 4 years ago

For use with other tools, we should consider including a requirement that the syntax be trivially mappable to json. I already have a json parser that does line tracking and comment handling. The syntax proposition I have in progress is to accept that for ldms configuration everything is a string and so no double quoting is needed unless we want to protect whitespace or commas. Full example coming, but also somewhat in the spirit of relaxedjson:

cn[1-32] : {
    # compute nodes config
    ldmsd: {
        path : /foo
        samplers : bar,baz
        bar_config : "nproc=36 infile=/proc/weirdplace"
    }
}
admin1: {
    ldmsaggd: {
        clients : cn[1-32]
    }
}

tom95858 commented 4 years ago

I agree with @morrone regarding the lack of comments, and @oceandlr regarding the { and ( and the desire for include. So let's assume that there is a new format. I have a few thoughts:

There is a 'save-running-config' capability. Does it have to save the comments it may have read when loading the configuration file? What about includes? Saving all this config-meta-data is painful. My vote is "no"
I had imagined a 'templating' capability. The user defines a template and then refers to it plus mods for each object based on the template. For example:

[all-producers]: // this is the producer template
    obj-type: producer
    reconnect: 20s
    connection: active
    port: 411

[nid0001]:
    use: all-producers
    host: nid0001

[nid0002]:
    use: all-producers
    host: nid0002

[nid[0003-1234]]: // some kind of generator syntax
    use: all-producers
    host: @myname
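The templating idea, including the generator line, can be sketched as follows. The bracket range syntax, the use: key, and the @myname placeholder come from the example above; treating the width of the lower bound as the zero-padding width is an assumption borrowed from common hostlist conventions. Each generated name becomes one fully specified object, which is also the form 'save-running-config' would emit.

```python
import re

# Hypothetical template table, mirroring [all-producers] above.
TEMPLATES = {
    "all-producers": {"obj-type": "producer", "reconnect": "20s",
                      "connection": "active", "port": "411"},
}

RANGE = re.compile(r"^(?P<prefix>[^\[]*)\[(?P<lo>\d+)-(?P<hi>\d+)\](?P<suffix>.*)$")


def expand(name):
    """Expand one 'prefix[lo-hi]suffix' range; the width of lo sets the zero padding."""
    m = RANGE.match(name)
    if m is None:
        return [name]
    width = len(m["lo"])
    return [f"{m['prefix']}{i:0{width}d}{m['suffix']}"
            for i in range(int(m["lo"]), int(m["hi"]) + 1)]


def instantiate(section, body):
    """Return one fully specified object per generated name."""
    objects = {}
    for name in expand(section):
        obj = dict(TEMPLATES.get(body.get("use"), {}))  # start from the template
        for key, value in body.items():
            if key == "use":
                continue
            obj[key] = name if value == "@myname" else value  # per-object overrides
        objects[name] = obj
    return objects
```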

BTW,

we have used YAML as a config file format in the past and my recollection was that it was widely reviled.
absent a 'standard' encoding, there won't be syntax highlighting available in vi/emacs
'save-running-config' will produce the fully specified object for each one.
There needs to be syntax for moving objects through the state machine. Something like: set nid[0000-1234] state running. In YAML, perhaps there would be an [action] section that is handled differently.

Although there are benefits to the super simple syntax, there are down-sides too: you can't define objects within other objects (nesting) and simple syntax issues like trailing ',' can give you another kind of config debug heartache.

For example:

define updater update_me:
    interval: auto
    producers: a,b,c, d, e, f, g,
    schema: biffle

The configuration parser will be looking for a producer named 'schema'. Your error message will be something intuitive like: line 1234, col 48: producer named 'schema' not found.

followed by: syntax error: expecting keyword, but got ':'

Note that adding comments, include, etc. to our JSon parser is trivial; however, the files would then not be parseable by standards-compliant parsers like Python's. JavaScript, however, is very forgiving.

tom95858 commented 4 years ago

... sorry wiki markdown ate my indents...

morrone commented 4 years ago

I would like to see the parts where different nodes are configured in the same file removed. I completely see what you are going for, @tom95858. I think in broad terms the problem is that you are trying to reinvent cluster configuration management in ldms, only for ldms. That does not play terribly well with a site that employs a configuration management approach for all services that they are managing on a cluster.

Instead, what we want is a file that tells one service what to do, with templating/substitution supported.

I'll give the LLNL example, but other sites use other configuration management approaches that would benefit from this design as well.

We store configuration files in cfengine, and then use genders to describe which roles and services each node provides. On most of our clusters, we would have two classes of ldms-related services: samplers and aggregators. In cfengine, we would want to have two files, perhaps named: ldmsd_sampler.conf and ldmsd_aggregator.conf.

In cfengine we would then have:

alpha[1-1000] ldmssampler
alpha[1001-1004] ldmsaggregator
alpha1001 ldms_aggregate_from=alpha[1-250]
alpha1002 ldms_aggregate_from=alpha[251-500]
alpha1003 ldms_aggregate_from=alpha[501-750]
alpha1004 ldms_aggregate_from=alpha[751-1000]

Or something to that effect. The great thing here is that now the configuration is replicable across many clusters, not just cluster "alpha". A sysadmin can bring up a new cluster just with a few simple rules in genders, while keeping all of the detailed configuration in the configuration files. Most admins won't need to know much about ldms to get it properly running on a new cluster.
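A launch-time helper for this split might look like the Python sketch below: it reads genders-style lines, collects the attributes that apply to the local host, and substitutes them into an aggregator configuration template. The template text, the ${ldms_aggregate_from} placeholder, and the simplified single-range host parsing are hypothetical; a real deployment would query genders through its own library or the nodeattr tool rather than re-parse the file.

```python
import re
import string

GENDERS = """\
alpha[1-1000] ldmssampler
alpha[1001-1004] ldmsaggregator
alpha1001 ldms_aggregate_from=alpha[1-250]
alpha1002 ldms_aggregate_from=alpha[251-500]
alpha1003 ldms_aggregate_from=alpha[501-750]
alpha1004 ldms_aggregate_from=alpha[751-1000]
"""

# Hypothetical aggregator config; only the producer list varies per host.
AGGREGATOR_TEMPLATE = """\
[aggregator]
producers = "${ldms_aggregate_from}"
"""

_RANGE = re.compile(r"^([^\[]*)\[(\d+)-(\d+)\]$")


def expand(hosts):
    """Expand a single 'prefix[lo-hi]' range (simplified hostlist syntax)."""
    m = _RANGE.match(hosts)
    if m is None:
        return [hosts]
    prefix, lo, hi = m.groups()
    return [f"{prefix}{i}" for i in range(int(lo), int(hi) + 1)]


def attributes_for(host, genders_text):
    """Collect 'attr' / 'attr=value' entries from every line whose node set includes host."""
    attrs = {}
    for line in genders_text.splitlines():
        parts = line.split("#", 1)[0].split(None, 1)
        if len(parts) != 2 or host not in expand(parts[0]):
            continue
        for pair in parts[1].split(","):
            key, _, value = pair.strip().partition("=")
            attrs[key] = value
    return attrs


def render_aggregator_config(host, genders_text=GENDERS):
    attrs = attributes_for(host, genders_text)
    if "ldmsaggregator" not in attrs:
        return None  # this host is not an aggregator
    return string.Template(AGGREGATOR_TEMPLATE).substitute(attrs)
```

The same aggregator template then serves every cluster; only the genders lines differ.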

But the admins that do know about ldms and want to change it can edit the appropriate ldms configuration file and have it take effect on next restart everywhere that that class of ldms service runs.

Again, this is just LLNL's cfengine+genders approach to cluster configuration management. But there are a number of other configuration management systems out there. The main point I want to get across is that configuration management should be left to the configuration management system, and the ldms configuration file should just configure one ldms daemon clearly and concisely.

morrone commented 4 years ago

I am going to suggest using TOML as the base configuration language:

https://github.com/toml-lang/toml

baallan commented 4 years ago

@morrone I like the idea of simplifying genders (if needed at all) down to something indicating which ldmsd instance one wants launched (ldmsd collector, ldmsd agg, etc), but there is a devil in the details. The aggregators (and the bits that monitor them) need to be able to discover (without relying on samplers to connect to them for discovery) what they are supposed to be aggregating in terms of (host addr/port/schedule) and what schemas they ought to expect.

A unified file allows for this easily; it's not so obvious (maybe I'm slow) how this is accomplished with one of the 'conventional' image management engines.

Also, as a side note, we don't use cfengine and the like on snl production clusters presently. That might be changing so I'm going to see if we can get one of them to weigh in on this thread.

baallan commented 4 years ago

By the by, I see no reason a change like this couldn't be backported to work with v4/systemd. I doubt it has any implications for the existing command-line language or wire protocols that couldn't be handled with an appropriate preprocessor (much as genders is handled now, or maybe rather more simply).

morrone commented 4 years ago

The aggregators (and the bits that monitor them) need to be able to discover (without relying on samplers to connect to them for discovery) what they are supposed to be aggregating in terms of (host addr/port/schedule) and what schemas they ought to expect.

We would just have a configuration file for the aggregators. It lists what schemas to collect from which nodes on which ports, where to store the data, etc. There is no need for that configuration file to be shared with the sampler nodes' configuration file.

Configuration files will only be shared among the nodes where simple pattern substitution will account for the differences. For instance, aggregator A will monitor nodes foo[1-100], aggregator B will monitor nodes foo[101-200]. That node list can be substituted at launch time from genders or another configuration management approach. I will work on mocking up an example aggregator later today.

baallan commented 4 years ago

The aggregators and samplers need to agree on port assignments, transports, and schemas, which is complicated enough to be easy to mismatch if they are defined separately. But perhaps if we get something sensible together for each node class, a ldms-config-lint could be applied to generate warnings about unmatched ports/transports and unaggregated schema when given a set of node class files.
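The 'ldms-config-lint' idea could be as small as the sketch below, which cross-checks per-node-class files after they have been parsed into simple records. The record fields and warning texts are hypothetical; no such tool is assumed to exist yet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SamplerClass:
    name: str
    port: int
    transport: str
    schemas: frozenset


@dataclass(frozen=True)
class AggregatorClass:
    name: str
    sources: tuple      # names of the sampler classes this aggregator pulls from
    port: int           # port/transport it uses to reach those samplers
    transport: str
    schemas: frozenset  # schemas it expects to collect


def lint(samplers, aggregators):
    """Return warnings about mismatched ports/transports and unaggregated schemas."""
    by_name = {s.name: s for s in samplers}
    collected = {s.name: set() for s in samplers}
    warnings = []
    for agg in aggregators:
        for src in agg.sources:
            s = by_name.get(src)
            if s is None:
                warnings.append(f"{agg.name}: unknown sampler class {src!r}")
                continue
            if (agg.port, agg.transport) != (s.port, s.transport):
                warnings.append(
                    f"{agg.name}: connects to {src} via {agg.transport}:{agg.port}, "
                    f"but {src} listens on {s.transport}:{s.port}")
            for schema in sorted(agg.schemas - s.schemas):
                warnings.append(f"{agg.name}: expects schema {schema!r} not produced by {src}")
            collected[src] |= agg.schemas & s.schemas
    for s in samplers:
        for schema in sorted(s.schemas - collected[s.name]):
            warnings.append(f"{s.name}: schema {schema!r} is not aggregated")
    return warnings
```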

morrone commented 4 years ago

Having to coordinate things like ports, transports, etc. is not complicated. It is the sort of thing that system administrators do all the time for any number of services running on a cluster.

Putting all of the configuration into a single file with sections for specific hosts would reinvent the wheel just for ldms. It would also integrate poorly with existing cluster configuration management practices. That will really just not work for us.

baallan commented 4 years ago

@morrone @tom95858 @gentile I added a proposal page related to this issue of the sort discussed at LDMSCON, where most agreed that something like PEP (python) was perhaps too much process but we need some formalism to help others digest and form opinions. https://github.com/ovis-hpc/ovis/wiki/Proposal-2

@gentile @brandt I added a summary of our desires for a modicum of process at https://github.com/ovis-hpc/ovis/wiki/Proposal-1. Please revise and extend anything I may have missed.

The far bottom right of the index bar on our main page links to the proposal list https://github.com/ovis-hpc/ovis/wiki/OVIS-Change-Proposals

morrone commented 4 years ago

I created TOML versions of two of @baallan's "relaxed json" examples. I think I would suggest more changes in the end, but at least they serve as a general example of what the files could look like in TOML.

I don't have access to add to wiki pages, so here they are:

p2-local.admin.toml-v0.txt p2-agg.admin.toml-v0.txt

Also, I would suggest another requirement for the new config format:

All options should employ full English words where reasonable. For example: instead of "dbg" use "debug", and instead of "xprt" use "transport".

baallan commented 4 years ago

I have added @morrone's examples to the wiki page. I find the double-square-bracket notation (the TOML equivalent of the JSON "x : [ {a},{b} ]", written as "[[x]] a [[x]] b") a bit disconcerting, but it should be easy for a parser to issue useful warnings/errors about extra/missing brackets.

morrone commented 4 years ago

There are already TOML parser libraries available, so basic syntax like that would not be something that we would need to worry about.

morrone commented 4 years ago

Another thing that I think we should embrace is the ".d" configuration directory. Perhaps the default location would be "/etc/ldmsd.d". Red Hat (among others) seems to encourage that approach, and it plays well with configuration management systems.

If we do that, it probably avoids the need for an "include" in the config file language.

It does have some implications for how we would construct the configuration file. For instance, we would probably want to get rid of the list of plugin names, and just rely on the fact that a plugin configuration section exists meaning that we want to use that plugin. In other words, we could drop "plugins" from this:

[samplers]
plugins = [
        "jobid",
        "meminfo",
        "vmstat",
        "procnfs",
        "procstat",
        "procnetdev",
        "sysclassib",
]

A basic use case would be when we have a general set of sampler plugins that we want to run on all nodes, for instance meminfo and vmstat. That could go into one file:

# /etc/ldmsd.d/01-global-samplers.conf
[samplers.meminfo]
with_jobid = 1

[samplers.vmstat]
with_jobid = 1

And then some nodes additionally need the gpu sampler:

# /etc/ldmsd.d/20-gpu-sampler.conf
[samplers.gpu]
with_jobid = 1

And then configuration management decides which nodes see just 01-global-samplers.conf, and which nodes get both 01-global-samplers.conf and 20-gpu-sampler.conf.

Higher level options might either go in the 01-global-samplers.conf, or in a separate file like so:

# /etc/ldmsd.d/00-global.conf
port = 3992

[samplers]
default_interval=1000000
default_offset = 0

It would be up to the administrator's personal preference. People can go as far as they want into splitting up their configurations into different files.

This approach seems to play well with existing standard practices.

The only issue I see so far is the potential loss of the "plugins" list (which wouldn't be as easy to maintain with this approach). But the only reason I have heard so far for needing that list is ordering of some kind. I'm not sure what ordering could be significant there, so maybe that really isn't an issue for 99% of users? And perhaps if there is a real need for the remaining 1%, then we can provide an optional "order = N" parameter under each sampler plugin (other plugin types too) to allow explicit ordering that way.

tom95858 commented 4 years ago

On Wed, Oct 30, 2019 at 2:57 PM Christopher J. Morrone < notifications@github.com> wrote:

Another thing that I think we should embrace is the ".d" configuration directory. Perhaps the default location would be "/etc/ldmsd.d". Red Hat (among others) seems to encourage that approach, and it plays well with configuration management systems.

If we do that, it probably avoids the need for an "include" in the config file language.

It does have some implications for how we would construct the configuration file. For instance, we would probably want to get rid of the list of plugin names, and just rely on the fact that a plugin configuration section exists meaning that we want to use that plugin. In other words, we could drop "plugins" from this:

Top level options such as:

enabled=false

would be nice. If present, this configuration file is "skipped". This seems needed if we're going with the *.d approach.

[samplers]
plugins = [
        "jobid",
        "meminfo",
        "vmstat",
        "procnfs",
        "procstat",
        "procnetdev",
        "sysclassib",
]

A basic use case would be when we have a general set of sampler plugins that we want to run on all nodes, for instance meminfo and vmstat. That could go into one file:

# /etc/ldmsd.d/01-global-samplers.conf
[samplers.meminfo]
with_jobid = 1

[samplers.vmstat]
with_jobid = 1

And then some nodes additionally need the gpu sampler:

# /etc/ldmsd.d/20-gpu-sampler.conf
[samplers.gpu]
with_jobid = 1

And then configuration management decides which nodes see just 01-global-samplers.conf, and which nodes get both 01-global-samplers.conf and 20-gpu-sampler.conf.

Higher level options might either go in the 01-global-samplers.conf, or in a separate file like so:

# /etc/ldmsd.d/00-global.conf
port = 3992

Not a huge fan of the separate file. "Groups" of files for a particular configuration may have different defaults.

[samplers]
default_interval=1000000

I don't think the default_ prefix is necessary. It is obvious from the section in which it is found. It would be nice to support a suffix like 's', 'ms', etc... so that the above becomes:

interval=1s
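Suffix support is straightforward; the Python sketch below converts an interval to microseconds, the unit used by the earlier examples (interval=1000000). The accepted suffixes (us, ms, s) and treating a bare number as microseconds for backward compatibility are assumptions.

```python
import re

# Multipliers to microseconds for the (hypothetical) accepted suffixes.
_UNITS_US = {"us": 1, "ms": 1_000, "s": 1_000_000}
_DURATION = re.compile(r"^\s*(\d+)\s*(us|ms|s)?\s*$")


def parse_interval_us(text):
    """'1s' -> 1000000; a bare number is taken as microseconds."""
    m = _DURATION.match(text)
    if m is None:
        raise ValueError(f"invalid interval {text!r}; expected e.g. 1s, 250ms, 500us")
    value, unit = m.groups()
    return int(value) * _UNITS_US[unit or "us"]
```

Accepting only integers keeps the conversion exact; "1.5s" is rejected rather than rounded.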

default_offset = 0

It would be up to the administrator's personal preference. People can go as far as they want into splitting up their configurations into different files.

This approach seems to play well with existing standard practices.

The only issue I see so far is the potential loss of the "plugins" list (which wouldn't be as easy to maintain with this approach). But the only reason I have heard so far for needing that list is ordering of some kind. I'm not sure what ordering could be significant there, so maybe that really isn't an issue for 99% of users? And perhaps if there is a real need for the remaining 1%, then we can provide an optional "order = N" parameter under each sampler plugin (other plugin types too) to allow explicit ordering that way.

If the ordering matters, then the configuration and/or our design is broken.


-- Thomas Tucker, President, Open Grid Computing, Inc.

baallan commented 4 years ago

While I like the idea that plug-in order does not matter in principle, it's not impossible for plugin libraries to expect other plugin libraries to be loaded (and maybe even configured) first in order to function correctly. We have fishiness in that regard around job samplers.

We could document a requirement for all plugins to load and act independently, with log errors about 'waiting for this other data source to appear' and eventual correct behavior when everything needed has been loaded. At least in the past when you loaded job sampler last and started other samplers before it, they would get misconfigured as having no job data.
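The 'load and act independently' requirement can be illustrated with a sampler that re-resolves its job-data dependency on every sample instead of once at start, logging while it waits. The registry and plugin classes are hypothetical stand-ins, not ldmsd's actual plugin interface or sampler_base.

```python
import logging

log = logging.getLogger("ldmsd-sketch")


class Registry:
    """Hypothetical in-daemon registry of data sources published by plugins."""

    def __init__(self):
        self._sources = {}

    def publish(self, name, getter):
        self._sources[name] = getter

    def lookup(self, name):
        return self._sources.get(name)


class MeminfoSampler:
    """Works whether the job sampler is loaded before or after it."""

    def __init__(self, registry):
        self.registry = registry
        self._warned = False

    def sample(self):
        job = self.registry.lookup("jobid")  # re-resolved on every sample, not cached at start
        if job is None:
            if not self._warned:
                log.warning("meminfo: waiting for the jobid data source to appear")
                self._warned = True
            job_id = 0  # placeholder until job data exists
        else:
            job_id = job()
        return {"job_id": job_id, "mem_free": 123}  # placeholder metric value
```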

baallan commented 4 years ago

Our primary target languages would be C and Python.

A review of the currently maintained implementations for C and python shows that neither apparently supports two of our requirements: comment preservation and origin info producing error messages.

So we will need to create and maintain our own TOML implementations.

If we're doing that, we may as well simplify the problem and create a typeless TOML extension, or alternatively include a 'TOML validation' capability so that the parser can type-check the expected values against what the plugin wants for specific keys and issue errors. This would take a lot of the current avl checking burden away from the plugin writers.
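The validation option might look like the sketch below: each plugin declares the keys it accepts and their types, and the daemon checks a parsed section before the plugin ever sees it. The schema format and the meminfo keys listed are illustrative assumptions, not the plugin's documented option list.

```python
# Hypothetical per-plugin schema: key -> (converter, required)
MEMINFO_SCHEMA = {
    "instance": (str, True),
    "producer": (str, True),
    "component_id": (int, False),
    "with_jobid": (int, False),
}


def validate(plugin, section, schema):
    """Convert and check a parsed section; return (typed values, error messages)."""
    errors = []
    out = {}
    for key, (conv, required) in schema.items():
        if key not in section:
            if required:
                errors.append(f"{plugin}: missing required key {key!r}")
            continue
        raw = section[key]
        try:
            out[key] = conv(raw)
        except (TypeError, ValueError):
            errors.append(f"{plugin}: {key}={raw!r} is not a valid {conv.__name__}")
    for key in sorted(set(section) - set(schema)):
        errors.append(f"{plugin}: unknown key {key!r}")
    return out, errors
```

Reporting unknown keys as errors also catches misspelled option names, which an attribute/value-list lookup in the plugin would silently ignore.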

baallan commented 4 years ago

@morrone an ldms.conf.d scheme is fine when there is exactly one instance of a daemon ever. How would you allow independent definition of multiple system daemon instances on the same node (which is practically but not formally a requirement for aggregator nodes in ldmsd). Do I have to write:

/etc/ldms.conf:
[[instance]]
name=local
confdir=/etc/ldms.d
[[instance]]
name=agg
confdir=/etc/ldms.d/agg
[[instance]]
name=v5beta
confdir=/opt/ovis/5.0/etc/ldms.d

Or something else? The above would make it messy to install and remove alternate deployments.

tom95858 commented 4 years ago

I'm sorry, but if order matters, we're broken. That's the design requirement.

On Thu, Oct 31, 2019, 10:41 AM Benjamin Allan notifications@github.com wrote:

While I like the idea that plug-in order does not matter in principle, it's not impossible for plugin libraries to expect other plugin libraries to be loaded (and maybe even configured) first in order to function correctly. We have fishiness in that regard around job samplers.

We could document a requirement for all plugins to load and act independently, with log errors about 'waiting for this other data source to appear' and eventual correct behavior when everything needed has been loaded. At least in the past when you loaded job sampler last and started other samplers before it, they would get misconfigured as having no job data.


morrone commented 4 years ago

I think if you need multiple instances of the same daemon on the same node, that is where you start employing the systemd @ syntax. So the daemons might be started with "systemctl start ldmsd@1.service", "systemctl start ldmsd@2.service", etc. The unit file would be written to add something to the ldmsd command like "ldmsd --config-dir=/etc/ldmsd@%i.d". The admins then could configure the daemons separately in /etc/ldmsd@1.d and /etc/ldmsd@2.d.

Words are supported in the @ syntax too, not just numbers.

morrone commented 4 years ago

A review of the currently maintained implementations for C and python shows that neither apparently supports two of our requirements: comment preservation and origin info producing error messages.

So we will need to create and maintain our own TOML implementations.

I don't think anyone anywhere preserves comments through to the main program, so I think that requirement is obvious to drop. I introduced the origin info one, and I'm totally willing to drop that in favor of using off-the-shelf code for the parser. So I think that probably puts using the existing parsers back in play.

baallan commented 4 years ago

@morrone so when starting ldmsd@foo.service is ldmsd --confdir=/etc/ldms.d/foo an acceptable per instance conf directory layout assumption?

baallan commented 4 years ago

@tom95858 we seem to have a lot of undocumented design requirements for plug-in contributions.

I reviewed the current v4.3 code for job data and it appears the source for sampler_base (without making an actual test) now correctly handles the job sampler being loaded after another sampler starts.

But this is an artifact of the defensive sampler_base coding, not something that is guaranteed because first we do all the loads, then we do an ordered list of starts. In general we can't guarantee any dependency order because the daemon is in design interactively configured.

baallan commented 4 years ago

@morrone I agree we don't need to preserve comments if we are only reading the file. Somehow, however, it seems someone always comes along later wanting a scriptable command-line utility for reconfiguring the daemon permanently by inserting/removing stuff. Comment preservation is key in that scenario.

We do need tracking of origin of values, especially if we're going to distribute configuration over 8+ files as you disliked in your review of the genders/bash implementation.
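Origin tracking does not have to live inside the parser; it can be recorded as values are merged from several files. The sketch below assumes a deliberately simplified `key = value` line format (not TOML, and `#` is treated as a comment anywhere on a line), and it assumes duplicate keys are an error rather than "later file wins". The key names and file names are illustrative.

```python
# Sketch: merge simple "key = value" files while remembering where each value
# came from, so errors can point at file:line. Simplified format, NOT TOML.
from dataclasses import dataclass


@dataclass(frozen=True)
class Origin:
    path: str
    line: int

    def __str__(self) -> str:
        return f"{self.path}:{self.line}"


class ConfigError(Exception):
    pass


def merge_with_origins(files):
    """files: iterable of (path, text). Returns {key: (value, Origin)}."""
    merged = {}
    for path, text in files:
        for lineno, raw in enumerate(text.splitlines(), start=1):
            line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
            if not line:
                continue
            key, sep, value = line.partition("=")
            if not sep:
                raise ConfigError(f"{path}:{lineno}: expected 'key = value'")
            key, value = key.strip(), value.strip()
            here = Origin(path, lineno)
            if key in merged:
                _, first = merged[key]
                raise ConfigError(f"{here}: '{key}' already set at {first}")
            merged[key] = (value, here)
    return merged
```

With values spread over many files, a message such as `20-gpu.conf:2: 'port' already set at 00-global.conf:2` is what keeps a split configuration debuggable.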

morrone commented 4 years ago

> @morrone so when starting ldmsd@foo.service is ldmsd --confdir=/etc/ldms.d/foo an acceptable per instance conf directory layout assumption?

Sure, I don't see why not. Ultimately that is up to the user/administrator to decide.

> however it seems someone always comes along later wanting a scriptable command line utility for reconfiguring the daemon permanently by inserting/removing stuff

I think it is entirely reasonable to just tell that person no. If one wants to configure the daemon using a configuration file and comments, then just use the configuration file to make changes. If a scripting method to change configuration is available at all (I don't see why it would be necessary for something like this), then embrace script-based configuration and don't expect the configuration file to change. It is just too much to expect that scripted config changes will somehow magically merge back into a configuration file with comments, indent style, and ordering all preserved.

I would probably argue that the scripting configuration shouldn't even exist. But assuming there is a good use case where the configuration can't reasonably be done through a config file, and I just haven't thought of it, I don't think we need that use case to dictate major file format decisions. It is a niche use case; the tail wagging the dog.

tom95858 commented 4 years ago

Please keep in mind that v5 has a load balance group/failover capability. All nodes share the same configuration and partition the work based on the number of nodes in the load balance group. I'm pointing this out again so that we keep in mind that configuration objects and state changes are independent. When creating the configuration objects, import order does not matter. When moving the objects through various states, the order of state changes obviously does matter.
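The v5 partitioning scheme is not described here, so the following is only a hypothetical illustration of the general idea: every member reads the same configuration and deterministically keeps its own share, so no coordination is needed beyond agreeing on the group size and each member's index. Round-robin over a sorted list is an assumed scheme, not the v5 algorithm.

```python
# Hypothetical partition scheme (NOT the v5 algorithm, which is not described
# here): each member sorts the shared producer list and keeps the entries whose
# position modulo the group size equals its own index.
def my_share(producers, group_size, member_index):
    if group_size < 1 or not 0 <= member_index < group_size:
        raise ValueError("member_index must be in [0, group_size)")
    ordered = sorted(producers)  # same order on every member
    return [p for i, p in enumerate(ordered) if i % group_size == member_index]


prods = [f"prdcr{i:02d}" for i in range(6)]
print(my_share(prods, 3, 0))  # ['prdcr00', 'prdcr03']
```

On failover, the survivors would recompute their shares with a smaller `group_size`; the configuration objects themselves do not change, only which member drives them through their states.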

Please parse what follows loosely wrt syntax. I'm not particularly familiar with TOML syntax.

[producer-01]
type = producer
this-and-that = the other

[action.start]
target = prdcr[00-17]
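A target like `prdcr[00-17]` reads like the bracketed hostlist notation used by Slurm. Assuming that is the intent (zero-padded ranges and comma-separated items, a single bracket group per name), expanding it is straightforward. Whether ldmsd would accept exactly this grammar is an assumption.

```python
# Sketch of expanding a bracketed target spec such as "prdcr[00-17]".
# Assumed grammar: one [..] group holding comma-separated N or N-M items;
# zero padding follows the width of each range's start.
import re

_BRACKET = re.compile(r"^(?P<prefix>[^\[\]]*)\[(?P<body>[^\[\]]+)\](?P<suffix>[^\[\]]*)$")


def expand_target(spec):
    m = _BRACKET.match(spec)
    if m is None:
        if "[" in spec or "]" in spec:
            raise ValueError(f"malformed target spec: {spec!r}")
        return [spec]  # plain name, nothing to expand
    names = []
    for part in m["body"].split(","):
        lo, sep, hi = part.partition("-")
        width = len(lo)
        start = int(lo)
        end = int(hi) if sep else start
        if end < start:
            raise ValueError(f"descending range in {spec!r}")
        names.extend(f"{m['prefix']}{n:0{width}d}{m['suffix']}" for n in range(start, end + 1))
    return names


print(expand_target("prdcr[00-03]"))  # ['prdcr00', 'prdcr01', 'prdcr02', 'prdcr03']
```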

Configuration and state changes cannot intermingle; for example, specifying the sample interval in the start section is not a valid design choice.

If multiple action sections appear in different files, the order needs to be explicit.

[action.start.01]
target = prdcr[18-23]

Will happen after action.start. All actions that match the same order tag (the last component) run in random order; in other words, the target spec is not ordered.
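One way to read the ordering rule: gather every `action.start*` section from all files, group targets by order tag, run the untagged batch first, then tagged batches in ascending order, with no order inside a batch. This sketch assumes numeric tags and only handles `start`; the function name is hypothetical.

```python
# Sketch of the ordering rule: batches keyed by order tag, untagged first,
# targets within a batch unordered (modeled as a set). Assumes numeric tags.
from collections import defaultdict

UNTAGGED = -1  # sorts before any numeric tag


def start_schedule(sections):
    """sections: iterable of (section_name, targets), possibly from several files.
    Returns [(tag, set_of_targets), ...] in execution order."""
    batches = defaultdict(set)
    for name, targets in sections:
        parts = name.split(".")
        if parts[:2] != ["action", "start"] or len(parts) > 3:
            continue  # not a start action; other sections are configuration objects
        tag = int(parts[2]) if len(parts) == 3 else UNTAGGED
        batches[tag].update(targets)  # same tag from different files merges
    return [(tag, batches[tag]) for tag in sorted(batches)]
```

Because configuration sections are skipped here, import order only affects which targets share a batch, never the order of state changes, which matches the separation between configuration objects and state.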

With respect to plugins, I would prefer to get rid of the plugins section altogether and just load them when they are referenced. For example:

[store-01]
provider = csv

Configuring the default options for plugins could be done like this:

[provider.config]
provider = csv
options = path:/this/and/that biffle ...

These would be the default options. They could also be overridden in the dependent config object's section, e.g.

[store-01]
provider = csv
provider.options =
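The load-on-reference idea plus per-object overrides can be sketched as a resolution pass over parsed sections. Assumptions: sections arrive as `(name, dict)` pairs, defaults live in `provider.config` sections as in the example, and per-object overrides use an `options` key (the exact override key is not settled above). The `sos` provider name is illustrative.

```python
# Sketch: load only providers that are referenced, and compute each object's
# effective options as provider defaults overlaid with its own overrides.
def resolve_plugins(sections):
    defaults = {}  # provider -> default options
    objects = []   # (section name, body) for sections that reference a provider
    for name, body in sections:
        if name == "provider.config":
            defaults.setdefault(body["provider"], {}).update(body.get("options", {}))
        elif "provider" in body:
            objects.append((name, body))
    to_load = sorted({body["provider"] for _, body in objects})
    effective = {}
    for name, body in objects:
        opts = dict(defaults.get(body["provider"], {}))
        opts.update(body.get("options", {}))  # per-object values win
        effective[name] = opts
    return to_load, effective
```

A provider that only has defaults but is never referenced is never loaded, so there is no separate "load" step to keep in sync with the rest of the file.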

Thanks, Tom

(In reply to Christopher J. Morrone, Thu, Oct 31, 2019 at 12:09 PM.)


-- Thomas Tucker, President, Open Grid Computing, Inc.

morrone commented 4 years ago

> Please keep in mind that v5 has a load balance group/failover capability.

Is there any public information on the design approach for that? It is hard to keep in mind if I don't know what it is.

I wasn't really following most of what you were saying in that last comment. Generally speaking, configuration and state information would be stored separately, but it almost sounds like you are trying to mix state information into the configuration files? But again, I might just not be understanding.

morrone commented 4 years ago

> Top-level options such as enabled=false would be nice. If present, this configuration file is "skipped". This seems needed if we're going with the *.d approach.

Generally people rename the file to something that won't be parsed. Often files ending in "~" are skipped, for example. We might make it so that file names starting with "#" are also skipped, or something to that effect.
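Both skip mechanisms discussed here fit in one small predicate. This is a sketch under stated assumptions: the rename rule covers only the two patterns mentioned (trailing `~`, leading `#`), and the `enabled = false` check looks only at top-level keys before the first `[section]` header using simple line parsing, not a full TOML parser.

```python
# Sketch of the two skip rules from the thread: rename-based, and a
# top-level "enabled = false". Simple line parsing, not a TOML parser.
def should_read(name: str, text: str) -> bool:
    # Rename rule: editor backups ending in "~" and names starting with "#".
    if name.endswith("~") or name.startswith("#"):
        return False
    # Flag rule: a top-level enabled = false disables the whole file.
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            break  # top-level keys end at the first section header
        key, sep, value = line.partition("=")
        if sep and key.strip() == "enabled" and value.strip() == "false":
            return False
    return True
```

The rename rule needs no parsing at all, which is one argument for it: a disabled file can be left syntactically broken without affecting the daemon.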

baallan commented 4 years ago

Do we need to do something less traditional and more complicated than what ld.so.conf does, i.e. "include ld.so.conf.d/*.conf"? We can pick a different suffix than .conf (.toml?) if need be.

morrone commented 4 years ago

Reading only files matching *.conf or similar would certainly work.
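Reading a drop-in directory in the ld.so.conf.d style usually means: keep names matching the suffix and read them in lexical order, so numeric prefixes like `00-`, `01-`, `20-` control precedence. A minimal sketch, assuming `.conf` as the suffix and a hypothetical `/etc/ldmsd.d` directory; note that `fnmatch` does not treat leading dots specially the way shell globbing does.

```python
# Sketch: select *.conf drop-ins and order them lexically, so numeric
# prefixes decide read order. Directory and suffix are assumptions.
import os
from fnmatch import fnmatchcase


def select_config_files(names, pattern="*.conf"):
    """Pure helper: filter names by pattern and sort them lexically."""
    return sorted(n for n in names if fnmatchcase(n, pattern))


def config_files_in(directory="/etc/ldmsd.d", pattern="*.conf"):
    """Return full paths of matching files in read order."""
    return [os.path.join(directory, n)
            for n in select_config_files(os.listdir(directory), pattern)]
```

Backup files such as `20-gpu-sampler.conf~` already fail the `*.conf` match, so the suffix rule also covers the usual editor-backup case.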

ovis-hpc / ovis

ldmsd and ldms-aggd need real first-class configuration files #67

/etc/ldmsd.d/01-global-samplers.conf

/etc/ldmsd.d/20-gpu-sampler.conf

/etc/ldmsd.d/00-global.conf