lidel opened 1 year ago
I was unable to reproduce this with the vanilla Docker image for 0.20.0, but I can reproduce it every time with my own repo by asking for a CID that has no providers, or by adding `127.0.0.1 cid.contact` to /etc/hosts so the lookup always fails:
An error has occurred while serving metrics:
2 error(s) occurred:
* collected metric "routing_http_client_latency" { label:<name:"code" value:"0" > label:<name:"error" value:"Net" > label:<name:"host" value:"cid.contact" > label:<name:"operation" value:"FindProviders" > histogram:<sample_count:36 sample_sum:0 bucket:<cumulative_count:36 upper_bound:1 > bucket:<cumulative_count:36 upper_bound:2 > bucket:<cumulative_count:36 upper_bound:5 > bucket:<cumulative_count:36 upper_bound:10 > bucket:<cumulative_count:36 upper_bound:20 > bucket:<cumulative_count:36 upper_bound:50 > bucket:<cumulative_count:36 upper_bound:100 > bucket:<cumulative_count:36 upper_bound:200 > bucket:<cumulative_count:36 upper_bound:500 > bucket:<cumulative_count:36 upper_bound:1000 > bucket:<cumulative_count:36 upper_bound:2000 > bucket:<cumulative_count:36 upper_bound:5000 > bucket:<cumulative_count:36 upper_bound:10000 > bucket:<cumulative_count:36 upper_bound:20000 > > } was collected before with the same name and label values
* collected metric "routing_http_client_latency" { label:<name:"code" value:"0" > label:<name:"error" value:"Net" > label:<name:"host" value:"cid.contact" > label:<name:"operation" value:"FindProviders" > histogram:<sample_count:36 sample_sum:0 bucket:<cumulative_count:36 upper_bound:1 > bucket:<cumulative_count:36 upper_bound:2 > bucket:<cumulative_count:36 upper_bound:5 > bucket:<cumulative_count:36 upper_bound:10 > bucket:<cumulative_count:36 upper_bound:20 > bucket:<cumulative_count:36 upper_bound:50 > bucket:<cumulative_count:36 upper_bound:100 > bucket:<cumulative_count:36 upper_bound:200 > bucket:<cumulative_count:36 upper_bound:500 > bucket:<cumulative_count:36 upper_bound:1000 > bucket:<cumulative_count:36 upper_bound:2000 > bucket:<cumulative_count:36 upper_bound:5000 > bucket:<cumulative_count:36 upper_bound:10000 > bucket:<cumulative_count:36 upper_bound:20000 > > } was collected before with the same name and label values
I'll keep poking at this; I want to confirm whether or not this impacts the ipfs.io bifrost-infra.
A bifrost-infra box running 0.20 did not experience this, which was a good sign.
I've focused on the differences in config, and the problem only occurs if the RPC API is exposed on more than one address. Each listener we expose re-registers the metrics, which creates the surface for this bug.
TLDR: the trigger is multiple listeners under Addresses.API.
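For context on why this only blows up at scrape time rather than at startup: the sketch below is my own illustration (not Kubo's actual wiring). If the second registration goes through an "unchecked" collector, i.e. one whose Describe sends no descriptors, client_golang accepts it at Register time and only reports the duplicate series when the registry is gathered, which matches the error above.

```go
package main

import (
	"fmt"

	"github.com/prometheus/client_golang/prometheus"
)

// uncheckedDup re-emits a histogram it does not own. Because Describe
// sends no descriptors, client_golang treats it as an "unchecked"
// collector: registration succeeds, and the duplicate series is only
// detected when the registry is scraped.
type uncheckedDup struct{ h prometheus.Histogram }

func (c uncheckedDup) Describe(chan<- *prometheus.Desc)    {}
func (c uncheckedDup) Collect(ch chan<- prometheus.Metric) { c.h.Collect(ch) }

func main() {
	reg := prometheus.NewRegistry()
	h := prometheus.NewHistogram(prometheus.HistogramOpts{
		Name: "routing_http_client_latency",
	})
	reg.MustRegister(h)
	reg.MustRegister(uncheckedDup{h}) // a second "listener" wiring up the same metric

	if _, err := reg.Gather(); err != nil {
		// Prints: collected metric "routing_http_client_latency" ...
		// was collected before with the same name and label values
		fmt.Println(err)
	}
}
```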
Hello,
We are getting this error on our two nodes as well (running Kubo 0.24.0), but we don't have a multi-listener setup:
  "Addresses": {
    "API": "/ip4/127.0.0.1/tcp/5001"
  }
When I restart Kubo, there are a few seconds during which the metrics are printed, and then this error shows up.
Anything I can provide to help debug this?
Same situation with Kubo 0.25 (I have not yet tried 0.26, since I'm waiting for it to land in the unstable branch of NixOS).
This can be reproduced in v0.29.0 by:
1. ipfs init
2. edit ~/.ipfs/config so the RPC API has two listeners:
   "Addresses": {
     "API": ["/ip4/127.0.0.1/tcp/5001", "/ip6/::1/tcp/5001"],
     ...
   }
3. ipfs daemon
4. echo hello | ipfs add
5. open http://127.0.0.1:5001/debug/metrics/prometheus
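For a scripted repro, the same change as step 2 can be applied with Kubo's config command instead of editing the file by hand:

  ipfs config --json Addresses.API '["/ip4/127.0.0.1/tcp/5001", "/ip6/::1/tcp/5001"]'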
http://127.0.0.1:5001/debug/metrics/prometheus ends up in a broken state after multiple days of uptime:
Seems we have a bug in boxo/routing/http/client; needs analysis.
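One common guard for this class of bug, sketched below as an assumption rather than the fix boxo actually ships, is to tolerate prometheus.AlreadyRegisteredError and reuse the collector from the first registration, so a second API listener shares the first listener's metrics instead of emitting a duplicate series:

```go
package metricsutil

import (
	"errors"

	"github.com/prometheus/client_golang/prometheus"
)

// registerOrReuse is a hypothetical helper: it registers c, and if an
// identical collector is already registered it returns the existing one,
// so callers share a single set of metrics across listeners.
func registerOrReuse(reg prometheus.Registerer, c prometheus.Collector) (prometheus.Collector, error) {
	if err := reg.Register(c); err != nil {
		var are prometheus.AlreadyRegisteredError
		if errors.As(err, &are) {
			return are.ExistingCollector, nil // reuse the first registration
		}
		return nil, err
	}
	return c, nil
}
```

Note this guard only helps for checked collectors; an unchecked collector passes Register and has to be deduplicated at construction time instead.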