wavesplatform / Waves

⛓️ Reference Waves Blockchain Node (client) implementation on Scala
https://wavesplatform.com/
MIT License
1.17k stars 418 forks source link

kamon.metric.Timer warns about negative values due to NTP time sync #3480

Open deemru opened 3 years ago

deemru commented 3 years ago

We still (on the newest node 1.3.4) get errors like these: WARN [ecution-context-global-XX] kamon.metric.Timer - Failed to record value [-70000000] on [span.processing-time,{operation=block-appender,error=false}] because the value is outside of the configured range. The recorded value was adjusted to the highest trackable value [3600000000000]. You might need to change your dynamic range configuration for this metric

This is not just annoying to see (was reported many times for years) but this also means the code executes and spends CPU cycles on non active functionality:

kamon.enable = no
kamon.modules.influxdb-span-reporter.enabled = no
metrics.enable = no

Because metrics calls (as far as i understand) are unconditional (for example just search for measureSuccessful)

Please, for god sake, make these metrics completely turned off by metrics.enable = no and i mean off = check config once and don`t waste CPU cycles anymore on that.

phearnot commented 3 years ago

Hi @deemru, thanks for reaching out. The root cause of an issue at hand is not actually the metrics subsystem, it's NTP. Note that the value which did not fit in the range is negative: it's because the time has been adjusted during block application. This is most likely due to the fact that the pool.ntp.org name has been resolved to an address of another server, which is not quite in sync with the one used during the preceding time sync attempt. To work around this issue, you can specify an exact address of your preferred NTP server in the waves.ntp-server parameter.

Unfortunately, as stated in kamon-io/Kamon#585, Kamon is built in such way that it's not supposed to be turned off. Metrics collection (even in hot code paths) does not have measurable impact on the node's overall performance (please share your benchmark data if your experience differs from what we've measured). What may impact performance or resource consumption are reporters which deliver aggregated metrics to storage backends like InfluxDB. Those are disabled by default.

If you prefer not to see metrics-related messages in your logs, you can add the following line to your custom logback.xml:

<logger name="kamon" level="OFF"/>
deemru commented 3 years ago

Can you please explain to me how this code which should do many many things to complete is better than if you just add before it simple condition like if( metrics.enabled ) do stuff?

https://github.com/wavesplatform/Waves/blob/7f83b021c461324a6476bde88e72937dce3193b9/node/src/main/scala/com/wavesplatform/state/appender/MicroblockAppender.scala#L28:

phearnot commented 3 years ago

As I've said, «doing many things to complete» does not take that long, so there's no point in optimizing this piece, because these optimizations will not yield measurable block application speedup.

deemru commented 3 years ago

For me your approach to resolve this issue is look like:

phearnot commented 3 years ago

I'm sorry, I can't see how enabling/disabling miner or REST API is related to the original issue. If you are concerned with API or miner performance impact on import speed or ordinary block application, please consider opening another issue.

We will consider tweaking measureSuccessful to ignore negative values so that there won't be any confusing warnings in the logs, in the meantime please use a workaround.

deemru commented 3 years ago

Metrics metrics.enable = no as another enable-like features must be completely disabled its functionality because it is set to enable = no not because it fast or not.

In our world of metrics, telemetry etc. An user should be sure he can opt-out of these things.

phearnot commented 3 years ago

No reporters are enabled by default, so nothing is shared with third party, unless explicitly configured. I believe this fits the definition of opt-in.

I'm locking this issue, because the root cause has been identified and action plan has been determined. Please feel free to open a separate issue if you feel so inclined.