Open morrone opened 5 years ago
The sampler config line does not take interval or offset, only the start line. If you have a sampler that wants (I'm curious to know why) interval and offset, you could specify in your implementation of config that they must be supplied or your sampler is misconfigured. There would be nothing in place to enforce that the user applies the same arguments to both config and start.
I don't see any reason in v5 we couldn't expand the api to allow int/off in config and require int/off set either in config or start. Or alternatively, I don't see any reason a plugin or plugin instance in v5 can't look up its properties in the scheduler(s) -- that would be a nice general feature.
> The sampler config line does not take interval or offset, only the start line. If you have a sampler that wants (I'm curious to know why) interval and offset, you could specify in your implementation of config that they must be supplied or your sampler is misconfigured. There would be nothing in place to enforce that the user applies the same arguments to both config and start.
Yes, that is a correct restatement of the issue that I am raising.
As an example, if one were writing a sampler that used DCGM, one would need to enable a watcher on GPU fields and give that watcher a sampling interval. One would desire the DCGM interval and ldms interval to be the same.
> I don't see any reason in v5 we couldn't expand the api to allow int/off in config and require int/off set either in config or start. Or alternatively, I don't see any reason a plugin or plugin instance in v5 can't look up its properties in the scheduler(s) -- that would be a nice general feature.
Great.
does dcgm's watcher really have the same semantics as ldms wrt interval/offset? otherwise one is drifting independently of the other and you may over/under sample due to the inconsistency, I would think.
You work with what you've got.
Does ldms's polling interval drift freely? In other words, does it just do "sleep(interval)" rather than waking at fixed wall clock intervals?
The sample wake-up time is normally computed from interval, offset, and the current wallclock in https://github.com/ovis-hpc/ovis/blob/master/ldms/src/ldmsd/ldmsd.c :601. The result goes into a thread sleep request, as I understand it, so a sample is collected at the scheduled time plus a small variance for kernel wakeup precision. So our interval doesn't drift unless the real-time clock itself does (e.g., you get what you paid for on systems like certain Crays, which might be 'managing' the RTC in failure scenarios).
The interval is timed, i.e., there is a target wakeup time for the next sample based on either when the sampler was started ("asynchronous" case, offset not specified) or an offset from a target second ("synchronous" case, offset = xxx). Thus if you do not specify an offset, the samplers will not drift individually, but over a whole system the sampling times will be distributed over time in an unspecified way.
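To make the synchronous/asynchronous distinction concrete, here is a minimal sketch of how a target wakeup time could be computed. This is not the code at ldmsd.c:601; the function names and the choice of microseconds since the epoch are assumptions made only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helpers, NOT the ldmsd implementation.
 * All times are microseconds; now_us is wall-clock time since the epoch.
 */

/* "Synchronous" case: wake on the next (k * interval_us + offset_us)
 * boundary strictly after now_us, for some integer k. */
int64_t next_sync_wakeup_us(int64_t now_us, int64_t interval_us,
                            int64_t offset_us)
{
    assert(interval_us > 0);
    assert(now_us >= 0);
    /* Fold the offset into [0, interval_us), so negative or oversized
     * offsets still land on a valid boundary. */
    int64_t off = ((offset_us % interval_us) + interval_us) % interval_us;
    /* Start of the interval that contains "now". */
    int64_t base = (now_us / interval_us) * interval_us;
    int64_t target = base + off;
    /* If this interval's target has already passed, use the next one. */
    if (target <= now_us)
        target += interval_us;
    return target;
}

/* "Asynchronous" case: boundaries are counted from the time the sampler
 * was started, so different nodes end up with unrelated phases. */
int64_t next_async_wakeup_us(int64_t start_us, int64_t now_us,
                             int64_t interval_us)
{
    assert(interval_us > 0);
    assert(now_us >= start_us);
    int64_t elapsed_intervals = (now_us - start_us) / interval_us;
    return start_us + (elapsed_intervals + 1) * interval_us;
}
```

In both cases the target is derived from a fixed reference rather than from "previous wakeup + interval", which is why a late wakeup shows up as jitter on one sample and does not accumulate as drift.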
Thanks
From: Christopher J. Morrone notifications@github.com Sent: Thursday, October 3, 2019 12:11 PM To: ovis-hpc/ovis Cc: Subscribed Subject: [EXTERNAL] Re: [ovis-hpc/ovis] Tell samplers the sampling interval (#64)
OK, so if the other guy also avoids drift, then drift really isn't an issue. There is certainly jitter that could result in issues that might be handled sanely by a sampler author if enough information is available. But if not? Well, you make do with what you've got and document what can go wrong.
But to get back on topic: making the info that ldmsd knows about sampling values available to the samplers would be very handy in some situations.
This should be an api extension that could be back-compatible with 4.3.
I don't believe that this has been implemented.
@morrone, @valleydlr we don't have 'drift' other than what results from the fact that multiple sampler.sample calls are made on the same thread. Please move this to 'discussion', but I will close this for now because we are using items to track milestone completion.
Drift was really a distraction from the main point of this ticket: samplers still do not (to the best of my knowledge) have any standard method to determine the sampling period. Only ldmsd itself knows that, and it isn't sharing that info with its plugins. That needs to be addressed.
@morrone, do the samplers need to know their sample intervals? I don't think so, and they don't control when their sample() function is called. This confusion is 15 years old and is the result of brutishly stuffing these values into a data structure that was shared with the plugin when sampler plugins were the only kind. Moving this to 2.5.1
> don't think so, and they don't control when their sample()
I don't think there is any confusion here. Yes, some do need to know what the sample period is, even if they cannot change it. dcgm is one example. I don't see any good reason to keep this secret. And some things need it.
I do agree that it is the wrong way to go to randomly throw things into a data structure that is shared with the plugin. A better way would be to have an accessor function that a sampler plugin can call. The function would take a parameter that is the handle of a sampler instance, and it would return the current sampling interval.
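A rough sketch of what such an accessor might look like follows. Everything here is hypothetical: `struct sampler_instance` stands in for whatever private state ldmsd keeps per sampler, and `sampler_interval_get` is an invented name, not an existing ldmsd call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for ldmsd's private per-sampler scheduling
 * state. A plugin would only ever hold a handle to it, never read the
 * fields directly. */
struct sampler_instance {
    const char *name;
    int64_t interval_us; /* sampling period in microseconds */
    int64_t offset_us;
    int started;         /* nonzero once "start" has been issued */
};

/*
 * Hypothetical accessor. Returns 0 and stores the current sampling
 * interval in *interval_us, or returns -1 if the handle is invalid or
 * the sampler has not been started, in which case no interval exists.
 */
int sampler_interval_get(const struct sampler_instance *inst,
                         int64_t *interval_us)
{
    assert(interval_us != NULL);
    if (inst == NULL || !inst->started)
        return -1;
    *interval_us = inst->interval_us;
    return 0;
}
```

Returning a status code with an out-parameter, rather than the interval itself, lets a plugin tell "not started yet" apart from a real value; a plugin calling this from config() would see -1 and could defer its setup.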
We might also need a callback that the sampler can register to be notified when the sampling period is changed.
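The notification idea could be sketched like this. All of the names (`sampler_sched`, `sampler_sched_subscribe`, `sampler_sched_set_interval`, `remember_interval`) are made up for illustration, and the sketch supports a single subscriber only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback type: invoked when the sampling period changes. */
typedef void (*interval_changed_fn)(int64_t old_us, int64_t new_us,
                                    void *ctx);

/* Hypothetical daemon-side scheduling record for one sampler. */
struct sampler_sched {
    int64_t interval_us;
    interval_changed_fn on_change; /* NULL if nobody subscribed */
    void *on_change_ctx;
};

/* A sampler registers interest in interval changes. */
void sampler_sched_subscribe(struct sampler_sched *s,
                             interval_changed_fn fn, void *ctx)
{
    assert(s != NULL);
    s->on_change = fn;
    s->on_change_ctx = ctx;
}

/* The daemon changes the period and notifies the subscriber, but only
 * when the value actually changed. */
void sampler_sched_set_interval(struct sampler_sched *s, int64_t new_us)
{
    assert(s != NULL && new_us > 0);
    int64_t old_us = s->interval_us;
    s->interval_us = new_us;
    if (s->on_change != NULL && old_us != new_us)
        s->on_change(old_us, new_us, s->on_change_ctx);
}

/* Example subscriber: records the new period in the int64_t that ctx
 * points to, e.g. so a GPU watcher could be reconfigured to match. */
void remember_interval(int64_t old_us, int64_t new_us, void *ctx)
{
    (void)old_us;
    *(int64_t *)ctx = new_us;
}
```

A sampler that mirrors the ldmsd period into another library would call `sampler_sched_subscribe` once during setup and adjust its own watcher inside the callback.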
Right now ldmsd is told what sampling frequency to use as an option to the "start" command. Some samplers would derive value from knowing the interval value. Right now (as far as I can tell), we would need the user to specify matching interval values on both the config and start lines.
One way that would be nice to handle this would be for ldmsd to look at the "interval" setting from the config command instead of using an interval setting on the start command. Then both ldmsd and the sampler would know the same value.
If that is too much of a change, adding an additional "start" call to the ldmsd sampler API would also provide a way to let the sampler know that it has been engaged, and pass start-time values.
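As a sketch of that last option, an optional start hook could be added in a way that leaves existing plugins untouched. The struct layout and every name below are hypothetical and do not reflect the real ldmsd sampler interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sampler plugin interface; NOT the real ldmsd struct. */
struct sampler_ops {
    int (*config)(void *ctx, const char *args);
    /* Optional hook: called once when the "start" command is accepted,
     * carrying the values given on the start line. May be NULL. */
    int (*start)(void *ctx, int64_t interval_us, int64_t offset_us);
    int (*sample)(void *ctx);
};

/* Daemon side: pass the start-time values to plugins that ask for them.
 * Plugins without a start hook are left alone and keep working. */
int daemon_start_sampler(const struct sampler_ops *ops, void *ctx,
                         int64_t interval_us, int64_t offset_us)
{
    assert(ops != NULL && interval_us > 0);
    if (ops->start == NULL)
        return 0;
    return ops->start(ctx, interval_us, offset_us);
}

/* Example plugin state for a sampler that must mirror the ldmsd period
 * into another library's watcher (the DCGM-style case). */
struct gpu_sampler {
    int64_t watch_interval_us;
};

/* Example start hook: remember the period so the external watcher can
 * be created with the same value. */
int gpu_sampler_start(void *ctx, int64_t interval_us, int64_t offset_us)
{
    struct gpu_sampler *g = ctx;
    (void)offset_us;
    g->watch_interval_us = interval_us;
    return 0;
}
```

Because the hook is optional, the user would supply the interval only once, on the start line, and the sampler would receive the same value that ldmsd schedules with.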