Closed pschulten closed 1 year ago
hi, the config field for reload is `enable_runtime_reload`:

```yaml
server:
  http_listen_port: 9080
  grpc_listen_port: 0
  enable_runtime_reload: true
```

See `enable_runtime_reload` in https://grafana.com/docs/loki/latest/clients/promtail/configuration/
thanks, that works. sorry :(
**Describe the bug**
Started promtail version=2.9.0, branch=HEAD, revision=2feb64f69 in a VM. The config.yaml is:
```yaml
server:
  http_listen_port: 9080
  grpc_listen_port: 0
  enable_runtime_reload: true

clients:
  - url: http://loki.cpaas.wxchina.com:30669/loki/api/v1/push

positions:
  filename: ./positions.yaml

target_config:
  sync_period: 10s

scrape_configs:
  - job_name: alert-log
    static_configs:
      - targets:
          - localhost
        labels:
          node: liuxu-node
          job: alert-log
          app: alert-app
          type: alert
          __path__: /media/liuxu/data/component/promtail/alert.log
```
Then modify config.yaml and save:
```yaml
server:
  http_listen_port: 9080
  grpc_listen_port: 0
  enable_runtime_reload: true

clients:
  - url: http://loki.cpaas.wxchina.com:30669/loki/api/v1/push

positions:
  filename: ./positions.yaml

target_config:
  sync_period: 10s

scrape_configs:
  - job_name: alert-log
    static_configs:
      - targets:
          - localhost
        labels:
          node: liuxu-node
          job: alert-log
          app: alert-app-liuxu # modify the app label from alert-app to alert-app-liuxu
          type: alert
          __path__: /media/liuxu/data/component/promtail/alert.log
```
Execute curl to reload:

```shell
curl http://localhost:9080/reload
```

The error output (I edited config.yaml with vim in the VM and saved it):
```
panic: duplicate metrics collector registration attempted

goroutine 209 [running]:
github.com/prometheus/client_golang/prometheus.(*Registry).MustRegister(0x37fd4eb?, {0xc000560e40?, 0x1, 0xb?})
	/drone/src/vendor/github.com/prometheus/client_golang/prometheus/registry.go:405 +0x85
github.com/grafana/loki/clients/pkg/promtail/wal.NewWatcherMetrics({0x40e50d0, 0xc0000c6a00})
	/drone/src/clients/pkg/promtail/wal/watcher_metrics.go:73 +0xaac
github.com/grafana/loki/clients/pkg/promtail/client.NewManager(0x0?, {0x40c8600, 0xc000136e10}, {0x40c3880000000000, 0x2710, 0x0, 0x1, 0x0, 0x0, 0x0}, ...)
	/drone/src/clients/pkg/promtail/client/manager.go:61 +0x8f
github.com/grafana/loki/clients/pkg/promtail.(*Promtail).reloadConfig(0xc000c1f2c0, 0xc0005f6000)
	/drone/src/clients/pkg/promtail/promtail.go:170 +0x8cb
github.com/grafana/loki/clients/pkg/promtail.(*Promtail).reload(0xc000c1f2c0)
	/drone/src/clients/pkg/promtail/promtail.go:286 +0xaf
github.com/grafana/loki/clients/pkg/promtail.(*Promtail).watchConfig(0xc000c1f2c0)
	/drone/src/clients/pkg/promtail/promtail.go:271 +0x3e9
created by github.com/grafana/loki/clients/pkg/promtail.(*Promtail).Run
	/drone/src/clients/pkg/promtail/promtail.go:214 +0xcc
```
Is there a problem with my workflow?
When I use promtail 2.7.0 the reload works fine!
The version detail information: version=HEAD-1b627d8, branch=HEAD, revision=1b627d880

- version=2.8.2, branch=HEAD, revision=9f809eda7 is OK
- version=2.9.2, branch=HEAD, revision=a17308db6 is NOT OK
- version=2.8.6, branch=HEAD, revision=990ac685e is OK
I see that the 2.9.x version's watcher_metrics.go is:
```go
func NewWatcherMetrics(reg prometheus.Registerer) *WatcherMetrics {
	m := &WatcherMetrics{
		recordsRead: prometheus.NewCounterVec(
			prometheus.CounterOpts{
				Namespace: "loki",
				Subsystem: "wal_watcher",
				Name:      "records_read_total",
				Help:      "Number of records read by the WAL watcher from the WAL.",
			},
			[]string{"id"},
		),
		recordDecodeFails: prometheus.NewCounterVec(
			prometheus.CounterOpts{
				Namespace: "loki",
				Subsystem: "wal_watcher",
				Name:      "record_decode_failures_total",
				Help:      "Number of records read by the WAL watcher that resulted in an error when decoding.",
			},
			[]string{"id"},
		),
		droppedWriteNotifications: prometheus.NewCounterVec(
			prometheus.CounterOpts{
				Namespace: "loki",
				Subsystem: "wal_watcher",
				Name:      "dropped_write_notifications_total",
				Help:      "Number of dropped write notifications due to having one already buffered.",
			},
			[]string{"id"},
		),
		segmentRead: prometheus.NewCounterVec(
			prometheus.CounterOpts{
				Namespace: "loki",
				Subsystem: "wal_watcher",
				Name:      "segment_read_total",
				Help:      "Number of segment reads triggered by the backup timer firing.",
			},
			[]string{"id", "reason"},
		),
		currentSegment: prometheus.NewGaugeVec(
			prometheus.GaugeOpts{
				Namespace: "loki",
				Subsystem: "wal_watcher",
				Name:      "current_segment",
				Help:      "Current segment the WAL watcher is reading records from.",
			},
			[]string{"id"},
		),
		watchersRunning: prometheus.NewGaugeVec(
			prometheus.GaugeOpts{
				Namespace: "loki",
				Subsystem: "wal_watcher",
				Name:      "running",
				Help:      "Number of WAL watchers running.",
			},
			nil,
		),
	}

	if reg != nil {
		reg.MustRegister(m.recordsRead)
		reg.MustRegister(m.recordDecodeFails)
		reg.MustRegister(m.droppedWriteNotifications)
		reg.MustRegister(m.segmentRead)
		reg.MustRegister(m.currentSegment)
		reg.MustRegister(m.watchersRunning)
	}

	return m
}
```
but the main branch code is:
```go
func NewWatcherMetrics(reg prometheus.Registerer) *WatcherMetrics {
	m := &WatcherMetrics{
		recordsRead: prometheus.NewCounterVec(
			prometheus.CounterOpts{
				Namespace: "loki",
				Subsystem: "wal_watcher",
				Name:      "records_read_total",
				Help:      "Number of records read by the WAL watcher from the WAL.",
			},
			[]string{"id"},
		),
		recordDecodeFails: prometheus.NewCounterVec(
			prometheus.CounterOpts{
				Namespace: "loki",
				Subsystem: "wal_watcher",
				Name:      "record_decode_failures_total",
				Help:      "Number of records read by the WAL watcher that resulted in an error when decoding.",
			},
			[]string{"id"},
		),
		droppedWriteNotifications: prometheus.NewCounterVec(
			prometheus.CounterOpts{
				Namespace: "loki",
				Subsystem: "wal_watcher",
				Name:      "dropped_write_notifications_total",
				Help:      "Number of dropped write notifications due to having one already buffered.",
			},
			[]string{"id"},
		),
		segmentRead: prometheus.NewCounterVec(
			prometheus.CounterOpts{
				Namespace: "loki",
				Subsystem: "wal_watcher",
				Name:      "segment_read_total",
				Help:      "Number of segment reads triggered by the backup timer firing.",
			},
			[]string{"id", "reason"},
		),
		currentSegment: prometheus.NewGaugeVec(
			prometheus.GaugeOpts{
				Namespace: "loki",
				Subsystem: "wal_watcher",
				Name:      "current_segment",
				Help:      "Current segment the WAL watcher is reading records from.",
			},
			[]string{"id"},
		),
		watchersRunning: prometheus.NewGaugeVec(
			prometheus.GaugeOpts{
				Namespace: "loki",
				Subsystem: "wal_watcher",
				Name:      "running",
				Help:      "Number of WAL watchers running.",
			},
			nil,
		),
	}

	// Collectors will be re-registered to registry if it's got reloaded
	// Reuse the old collectors instead of panicking out.
	if reg != nil {
		if err := reg.Register(m.recordsRead); err != nil {
			are := &prometheus.AlreadyRegisteredError{}
			if errors.As(err, are) {
				m.recordsRead = are.ExistingCollector.(*prometheus.CounterVec)
			}
		}
		if err := reg.Register(m.recordDecodeFails); err != nil {
			are := &prometheus.AlreadyRegisteredError{}
			if errors.As(err, are) {
				m.recordDecodeFails = are.ExistingCollector.(*prometheus.CounterVec)
			}
		}
		if err := reg.Register(m.droppedWriteNotifications); err != nil {
			are := &prometheus.AlreadyRegisteredError{}
			if errors.As(err, are) {
				m.droppedWriteNotifications = are.ExistingCollector.(*prometheus.CounterVec)
			}
		}
		if err := reg.Register(m.segmentRead); err != nil {
			are := &prometheus.AlreadyRegisteredError{}
			if errors.As(err, are) {
				m.segmentRead = are.ExistingCollector.(*prometheus.CounterVec)
			}
		}
		if err := reg.Register(m.currentSegment); err != nil {
			are := &prometheus.AlreadyRegisteredError{}
			if errors.As(err, are) {
				m.currentSegment = are.ExistingCollector.(*prometheus.GaugeVec)
			}
		}
		if err := reg.Register(m.watchersRunning); err != nil {
			are := &prometheus.AlreadyRegisteredError{}
			if errors.As(err, are) {
				m.watchersRunning = are.ExistingCollector.(*prometheus.GaugeVec)
			}
		}
	}

	return m
}
```
How can the v2.9.x reload error be resolved? Should I wait for a new version to be released, i.e. for the main branch fix to be merged into a v2.9.3 release?
@liguozhong Help me!
@hainenber Hi, you resolved the reload metrics registration panic. When will the fix be merged and released in a new version?
**Describe the bug**
With #7247 added in 2.7.0 (https://github.com/grafana/loki/blob/d3111bcaa7749ca53902c45f4856e28e2980c79d/CHANGELOG.md?plain=1#LL99C3-L99C52), config reload should be possible, but it fails.
**To Reproduce**
Steps to reproduce the behavior:

```yaml
positions:
  filename: /tmp/positions.yaml

clients:

scrape_configs:
```
**Expected behavior**
The above config is parsed correctly.
**Environment:**

**Screenshots, Promtail config, or terminal output**
Maybe related: https://github.com/grafana/loki/issues/6388