Closed tomershafir closed 1 year ago
I don't necessarily agree on the skewing, but this may be useful for other reasons.
We have a ProfilingScheduler interface and a default ContinuousProfilingScheduler implementation. If you want this scheduler only for your own services, I highly recommend implementing it on your side (not in the library).
If you want this scheduler to be in the library: there is another community-submitted scheduler, SamplingProfilingScheduler, which is disabled by default. The change should go there, and it should preserve the capabilities that scheduler currently has. Bear in mind that it is experimental: we do not commit to supporting it, and it may be changed, removed, or broken in the future.
Indeed, it may also be useful for managing multi-event aspects such as overhead.
Btw, I think SamplingProfilingScheduler should definitely be supported; I think it is the most useful one in real production environments.
@korniltsev why not merge the 2 schedulers?
why not merge the 2 schedulers?
cc @Rperry2174
Continuous profiling is inherently sampling-based, so even without sleep() you miss events. Also, if you configure a no-sleep policy to match a sleep-based one by spreading the same sampling load over a larger time window to manage overhead, the no-sleep option is less likely to catch steady-state paths than the sleep-based one.
You should decide whether the feature is valuable. If so, I think you should aim for eventual consistency rather than block an SDK. I also don't see such strict consistency in the OSS wild.
Concurrent multi-event collection can make the events interfere with each other and skew the results. For example, in a bad case where an app in steady state allocates and computes in lockstep relative to the sampling config, threads performing TLAB sampling may be captured by the CPU profiler, skewing the flame graph.
A solution may be a fixed wall-time window mechanism. Then, for example, on a CPU-bound app you can sample CPU for 1m, then sample TLAB for 1m, then sleep for 3m. I may be able to help if you think it's useful.
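To make the window idea concrete, here is a minimal sketch of the schedule arithmetic only, assuming a repeating cycle of fixed-length windows. The names `WindowSchedule`, `Mode`, `then` and `modeAt` are hypothetical and are not part of the library. The sketch does not touch ProfilingScheduler or any profiler API; it only answers "which window are we in after this much wall time?".

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: a repeating cycle of fixed wall-time windows. */
final class WindowSchedule {
    /** What would be collected during a window; illustrative names only. */
    enum Mode { CPU, ALLOC, SLEEP }

    private static final class Window {
        final Mode mode;
        final long millis;
        Window(Mode mode, long millis) { this.mode = mode; this.millis = millis; }
    }

    private final List<Window> windows = new ArrayList<>();
    private long cycleMillis = 0;

    /** Appends a window to the cycle; windows run in insertion order. */
    WindowSchedule then(Mode mode, Duration length) {
        long millis = length.toMillis();
        if (millis <= 0) {
            throw new IllegalArgumentException("window must last at least 1 ms");
        }
        windows.add(new Window(mode, millis));
        cycleMillis += millis;
        return this;
    }

    /** Returns the mode of the window that contains the given elapsed wall time. */
    Mode modeAt(Duration elapsed) {
        if (windows.isEmpty()) {
            throw new IllegalStateException("no windows configured");
        }
        // Wrap around so the cycle repeats indefinitely.
        long offset = Math.floorMod(elapsed.toMillis(), cycleMillis);
        for (Window w : windows) {
            if (offset < w.millis) {
                return w.mode;
            }
            offset -= w.millis;
        }
        throw new AssertionError("unreachable: offset is always inside the cycle");
    }
}
```

With the 1m CPU, 1m TLAB, 3m sleep example, the cycle is 5 minutes long: `new WindowSchedule().then(Mode.CPU, Duration.ofMinutes(1)).then(Mode.ALLOC, Duration.ofMinutes(1)).then(Mode.SLEEP, Duration.ofMinutes(3))`. A scheduler built on this would still have to start and stop the actual event collection at each window boundary, which is left out here.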