pipe-cd / pipecd

The One CD for All {applications, platforms, operations}
https://pipecd.dev
Apache License 2.0

Piped restarts due to a failure when reporting stats to the Control Plane #4786

Closed khanhtc1202 closed 8 months ago

khanhtc1202 commented 9 months ago

What happened:

failed to report stats    {"error": "rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4207624 vs. 4194304)"}
github.com/pipe-cd/pipecd/pkg/app/piped/statsreporter.(*reporter).report
    /home/runner/work/pipecd/pipecd/pkg/app/piped/statsreporter/reporter.go:98
github.com/pipe-cd/pipecd/pkg/app/piped/statsreporter.(*reporter).Run
    /home/runner/work/pipecd/pipecd/pkg/app/piped/statsreporter/reporter.go:70
github.com/pipe-cd/pipecd/pkg/app/piped/cmd/piped.(*piped).run.func5
    /home/runner/work/pipecd/pipecd/pkg/app/piped/cmd/piped/piped.go:270
golang.org/x/sync/errgroup.(*Group).Go.func1
    /home/runner/go/pkg/mod/golang.org/x/sync@v0.1.0/errgroup/errgroup.go:75

What you expected to happen:

How to reproduce it:

Environment:

t-kikuc commented 8 months ago

"rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4207624 vs. 4194304)"

Cause: The MaxRecvMsgSize of the Control Plane's gRPC server (4194304 bytes, i.e. 4 MiB) is too small for the stats-report message from piped (4207624 bytes).
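To make the numbers in the error concrete, here is a minimal sketch of the kind of size check that rejects the message. `checkRecvSize` and `defaultMaxRecvMsgSize` are hypothetical names used for illustration only; this is not gRPC's actual implementation, it just reuses the values and error text from the log above.

```go
package main

import "fmt"

// defaultMaxRecvMsgSize mirrors the 4 MiB value (4194304 bytes) seen in the log.
const defaultMaxRecvMsgSize = 4 * 1024 * 1024

// checkRecvSize is a hypothetical helper, not gRPC's real code. It rejects
// any message whose size exceeds max, using the error text from the log.
func checkRecvSize(size, max int) error {
	if size > max {
		return fmt.Errorf("received message larger than max (%d vs. %d)", size, max)
	}
	return nil
}

func main() {
	// The failing report was 4207624 bytes, which is 13320 bytes over the limit.
	fmt.Println(checkRecvSize(4207624, defaultMaxRecvMsgSize))
	fmt.Println(4207624 - defaultMaxRecvMsgSize)
}
```

The message is only slightly over the limit, which suggests the payload grew gradually rather than jumping in size.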

t-kikuc commented 8 months ago

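Possible solutions:

A) Increase the gRPC message size limit on the Control Plane.
B) Split the stats data into multiple messages so that each stays under the limit.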

If time allows, B is the ideal long-term option, as the protobuf documentation says:

Protocol Buffers are not designed to handle large messages. As a general rule of thumb, if you are dealing in messages larger than a megabyte each, it may be time to consider an alternate strategy. https://protobuf.dev/programming-guides/techniques/#large-data

t-kikuc commented 8 months ago

B) Split the stats data across multiple reports to stay under gRPC's size limit.

That is, use gRPC client streaming.
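As a rough illustration of option B, the sketch below cuts a payload into pieces that each fit under a per-message limit, which is what a client-streaming sender would do before sending each piece. `splitChunks` is a hypothetical helper and the 1 MiB chunk size is an assumption (borrowed from the protobuf rule of thumb quoted above); the actual streaming RPC is not shown.

```go
package main

import "fmt"

// splitChunks cuts data into consecutive pieces of at most maxSize bytes.
// Hypothetical helper for illustration; it is not part of PipeCD or gRPC.
func splitChunks(data []byte, maxSize int) [][]byte {
	if maxSize <= 0 {
		return nil
	}
	var chunks [][]byte
	for len(data) > 0 {
		n := maxSize
		if len(data) < n {
			n = len(data)
		}
		chunks = append(chunks, data[:n])
		data = data[n:]
	}
	return chunks
}

func main() {
	// A payload with the same size as the rejected report.
	payload := make([]byte, 4207624)
	// 1 MiB per streamed message is an assumed value.
	chunks := splitChunks(payload, 1<<20)
	fmt.Println(len(chunks))
}
```

A real implementation would likely split on metric boundaries rather than raw byte offsets, so that each chunk can be decoded on its own.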

khanhtc1202 commented 8 months ago

@t-kikuc I understand that the response size is bigger than the current limit (possibly the gRPC default, left unchanged). However, just because we're dealing with large responses does not mean we should immediately change the gRPC call type from unary to streaming.

Could you share more about the investigation result that led to choosing option B in this case? If the response is bigger than the current limit but is unlikely to keep growing, then just increasing the limit is the better choice, IMO.

t-kikuc commented 8 months ago

@khanhtc1202 I also think 'just increasing the limit' is simpler and better, so I'm implementing that.

I'm just worried that the same error will happen again if the message size grows beyond the new limit. Operators of the Control Plane would then need to restart it with a larger limit again (and they don't know what a safe limit is).

However, for now I think we should not convert to client streaming, to avoid a complex implementation and breaking changes.

khanhtc1202 commented 8 months ago

Yes, that's why I said we need to know what that big response actually contains. Is it growing over time, or does it stay fairly constant in size? Ideally, you could share that here to explain why this issue occurred. If there's nothing special and the response size is fairly constant over time, then I want to go with option A, but of course with a well-understood reason shared under this issue 🙏

khanhtc1202 commented 8 months ago

The point is that, to understand whether the new limit can be exceeded or not, we need to know exactly what the message contains 👀

t-kikuc commented 8 months ago

I investigated the real metrics of piped.

Conclusion: The number of deployment_status records increases each time a new deployment starts.
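To put rough numbers on that growth: the sample output in this comment has seven deployment_status series per deployment, one per status value. The sketch below computes the series count and estimates how many deployments fit under a byte limit. The function names are hypothetical, and `bytesPerSeries` is an assumed average size, not a measured one.

```go
package main

import "fmt"

// statusesPerDeployment is the number of deployment_status series created per
// deployment: one per status value (CANCELLED, FAILURE, PENDING, PLANNED,
// ROLLING_BACK, RUNNING, SUCCESS).
const statusesPerDeployment = 7

// seriesCount returns how many deployment_status series exist after the
// given number of deployments, assuming none are ever deleted.
func seriesCount(deployments int) int {
	return deployments * statusesPerDeployment
}

// deploymentsUntilLimit estimates how many deployments fit under limit bytes.
// bytesPerSeries is an assumed average encoded size, not a measured value.
func deploymentsUntilLimit(limit, bytesPerSeries int) int {
	if bytesPerSeries <= 0 {
		return 0
	}
	return limit / (bytesPerSeries * statusesPerDeployment)
}

func main() {
	// Three deployments, as in the sample output in this comment.
	fmt.Println(seriesCount(3))
	// 400 bytes per series is a placeholder assumption.
	fmt.Println(deploymentsUntilLimit(4194304, 400))
}
```

Whatever the real per-series size is, the count grows linearly with the number of deployments, so any fixed limit is eventually reached on a long-running piped.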

what's included in statsreporter's metrics

According to https://pipecd.dev/docs-v0.46.x/user-guide/metrics/#piped-agent-metrics, the statsreporter sends 7 types of metrics:

  1. cloudprovider_kubernetes_tool_calls_total
  2. deployment_status
  3. livestatestore_kubernetes_api_requests_total
  4. livestatestore_kubernetes_resource_events_total
  5. plan_preview_command_handled_total
  6. plan_preview_command_handling_seconds
  7. plan_preview_command_received_total

real data

The text below is real metrics output from a piped running locally (http://localhost:9085/metrics). (The piped ID is masked.)

# HELP deployment_status The current status of deployment. 1 for current status, 0 for others.
# TYPE deployment_status gauge
deployment_status{application_id="4dad17bb-ce09-4302-8ce6-36a3c29b9e15",application_kind="ECS",application_name="273-plan-preview",deployment="2342bae5-b625-49b1-8792-1945b1e704a2",launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",platform_provider="ecs-dev",project="pipecd",status="DEPLOYMENT_CANCELLED"} 0
deployment_status{application_id="4dad17bb-ce09-4302-8ce6-36a3c29b9e15",application_kind="ECS",application_name="273-plan-preview",deployment="2342bae5-b625-49b1-8792-1945b1e704a2",launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",platform_provider="ecs-dev",project="pipecd",status="DEPLOYMENT_FAILURE"} 0
deployment_status{application_id="4dad17bb-ce09-4302-8ce6-36a3c29b9e15",application_kind="ECS",application_name="273-plan-preview",deployment="2342bae5-b625-49b1-8792-1945b1e704a2",launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",platform_provider="ecs-dev",project="pipecd",status="DEPLOYMENT_PENDING"} 0
deployment_status{application_id="4dad17bb-ce09-4302-8ce6-36a3c29b9e15",application_kind="ECS",application_name="273-plan-preview",deployment="2342bae5-b625-49b1-8792-1945b1e704a2",launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",platform_provider="ecs-dev",project="pipecd",status="DEPLOYMENT_PLANNED"} 0
deployment_status{application_id="4dad17bb-ce09-4302-8ce6-36a3c29b9e15",application_kind="ECS",application_name="273-plan-preview",deployment="2342bae5-b625-49b1-8792-1945b1e704a2",launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",platform_provider="ecs-dev",project="pipecd",status="DEPLOYMENT_ROLLING_BACK"} 0
deployment_status{application_id="4dad17bb-ce09-4302-8ce6-36a3c29b9e15",application_kind="ECS",application_name="273-plan-preview",deployment="2342bae5-b625-49b1-8792-1945b1e704a2",launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",platform_provider="ecs-dev",project="pipecd",status="DEPLOYMENT_RUNNING"} 0
deployment_status{application_id="4dad17bb-ce09-4302-8ce6-36a3c29b9e15",application_kind="ECS",application_name="273-plan-preview",deployment="2342bae5-b625-49b1-8792-1945b1e704a2",launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",platform_provider="ecs-dev",project="pipecd",status="DEPLOYMENT_SUCCESS"} 1
deployment_status{application_id="4dad17bb-ce09-4302-8ce6-36a3c29b9e15",application_kind="ECS",application_name="273-plan-preview",deployment="32271bc7-8266-445a-a35a-59c506c56b25",launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",platform_provider="ecs-dev",project="pipecd",status="DEPLOYMENT_CANCELLED"} 0
deployment_status{application_id="4dad17bb-ce09-4302-8ce6-36a3c29b9e15",application_kind="ECS",application_name="273-plan-preview",deployment="32271bc7-8266-445a-a35a-59c506c56b25",launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",platform_provider="ecs-dev",project="pipecd",status="DEPLOYMENT_FAILURE"} 0
deployment_status{application_id="4dad17bb-ce09-4302-8ce6-36a3c29b9e15",application_kind="ECS",application_name="273-plan-preview",deployment="32271bc7-8266-445a-a35a-59c506c56b25",launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",platform_provider="ecs-dev",project="pipecd",status="DEPLOYMENT_PENDING"} 1
deployment_status{application_id="4dad17bb-ce09-4302-8ce6-36a3c29b9e15",application_kind="ECS",application_name="273-plan-preview",deployment="32271bc7-8266-445a-a35a-59c506c56b25",launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",platform_provider="ecs-dev",project="pipecd",status="DEPLOYMENT_PLANNED"} 0
deployment_status{application_id="4dad17bb-ce09-4302-8ce6-36a3c29b9e15",application_kind="ECS",application_name="273-plan-preview",deployment="32271bc7-8266-445a-a35a-59c506c56b25",launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",platform_provider="ecs-dev",project="pipecd",status="DEPLOYMENT_ROLLING_BACK"} 0
deployment_status{application_id="4dad17bb-ce09-4302-8ce6-36a3c29b9e15",application_kind="ECS",application_name="273-plan-preview",deployment="32271bc7-8266-445a-a35a-59c506c56b25",launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",platform_provider="ecs-dev",project="pipecd",status="DEPLOYMENT_RUNNING"} 0
deployment_status{application_id="4dad17bb-ce09-4302-8ce6-36a3c29b9e15",application_kind="ECS",application_name="273-plan-preview",deployment="32271bc7-8266-445a-a35a-59c506c56b25",launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",platform_provider="ecs-dev",project="pipecd",status="DEPLOYMENT_SUCCESS"} 0
deployment_status{application_id="5c796adc-ac7a-4b28-9e71-f1163103d3cd",application_kind="ECS",application_name="275-secret-management",deployment="6e94600c-9ca8-4979-a72c-b1860b17a04b",launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",platform_provider="ecs-dev",project="pipecd",status="DEPLOYMENT_CANCELLED"} 0
deployment_status{application_id="5c796adc-ac7a-4b28-9e71-f1163103d3cd",application_kind="ECS",application_name="275-secret-management",deployment="6e94600c-9ca8-4979-a72c-b1860b17a04b",launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",platform_provider="ecs-dev",project="pipecd",status="DEPLOYMENT_FAILURE"} 1
deployment_status{application_id="5c796adc-ac7a-4b28-9e71-f1163103d3cd",application_kind="ECS",application_name="275-secret-management",deployment="6e94600c-9ca8-4979-a72c-b1860b17a04b",launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",platform_provider="ecs-dev",project="pipecd",status="DEPLOYMENT_PENDING"} 0
deployment_status{application_id="5c796adc-ac7a-4b28-9e71-f1163103d3cd",application_kind="ECS",application_name="275-secret-management",deployment="6e94600c-9ca8-4979-a72c-b1860b17a04b",launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",platform_provider="ecs-dev",project="pipecd",status="DEPLOYMENT_PLANNED"} 0
deployment_status{application_id="5c796adc-ac7a-4b28-9e71-f1163103d3cd",application_kind="ECS",application_name="275-secret-management",deployment="6e94600c-9ca8-4979-a72c-b1860b17a04b",launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",platform_provider="ecs-dev",project="pipecd",status="DEPLOYMENT_ROLLING_BACK"} 0
deployment_status{application_id="5c796adc-ac7a-4b28-9e71-f1163103d3cd",application_kind="ECS",application_name="275-secret-management",deployment="6e94600c-9ca8-4979-a72c-b1860b17a04b",launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",platform_provider="ecs-dev",project="pipecd",status="DEPLOYMENT_RUNNING"} 0
deployment_status{application_id="5c796adc-ac7a-4b28-9e71-f1163103d3cd",application_kind="ECS",application_name="275-secret-management",deployment="6e94600c-9ca8-4979-a72c-b1860b17a04b",launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",platform_provider="ecs-dev",project="pipecd",status="DEPLOYMENT_SUCCESS"} 0
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",project="pipecd",quantile="0"} 4.9416e-05
go_gc_duration_seconds{launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",project="pipecd",quantile="0.25"} 0.000144083
go_gc_duration_seconds{launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",project="pipecd",quantile="0.5"} 0.000173708
go_gc_duration_seconds{launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",project="pipecd",quantile="0.75"} 0.000259125
go_gc_duration_seconds{launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",project="pipecd",quantile="1"} 0.0005105
go_gc_duration_seconds_sum{launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",project="pipecd"} 0.002285667
go_gc_duration_seconds_count{launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",project="pipecd"} 11
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines{launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",project="pipecd"} 41
# HELP go_info Information about the Go environment.
# TYPE go_info gauge
go_info{launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",project="pipecd",version="go1.21.3"} 1
# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes{launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",project="pipecd"} 1.075928e+07
# HELP go_memstats_alloc_bytes_total Total number of bytes allocated, even if freed.
# TYPE go_memstats_alloc_bytes_total counter
go_memstats_alloc_bytes_total{launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",project="pipecd"} 5.4327328e+07
# HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table.
# TYPE go_memstats_buck_hash_sys_bytes gauge
go_memstats_buck_hash_sys_bytes{launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",project="pipecd"} 1.473532e+06
# HELP go_memstats_frees_total Total number of frees.
# TYPE go_memstats_frees_total counter
go_memstats_frees_total{launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",project="pipecd"} 427160
# HELP go_memstats_gc_cpu_fraction The fraction of this program's available CPU time used by the GC since the program started.
# TYPE go_memstats_gc_cpu_fraction gauge
go_memstats_gc_cpu_fraction{launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",project="pipecd"} 1.1767788253792074e-05
# HELP go_memstats_gc_sys_bytes Number of bytes used for garbage collection system metadata.
# TYPE go_memstats_gc_sys_bytes gauge
go_memstats_gc_sys_bytes{launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",project="pipecd"} 4.982864e+06
# HELP go_memstats_heap_alloc_bytes Number of heap bytes allocated and still in use.
# TYPE go_memstats_heap_alloc_bytes gauge
go_memstats_heap_alloc_bytes{launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",project="pipecd"} 1.075928e+07
# HELP go_memstats_heap_idle_bytes Number of heap bytes waiting to be used.
# TYPE go_memstats_heap_idle_bytes gauge
go_memstats_heap_idle_bytes{launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",project="pipecd"} 1.0018816e+07
# HELP go_memstats_heap_inuse_bytes Number of heap bytes that are in use.
# TYPE go_memstats_heap_inuse_bytes gauge
go_memstats_heap_inuse_bytes{launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",project="pipecd"} 1.3934592e+07
# HELP go_memstats_heap_objects Number of allocated objects.
# TYPE go_memstats_heap_objects gauge
go_memstats_heap_objects{launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",project="pipecd"} 63059
# HELP go_memstats_heap_released_bytes Number of heap bytes released to OS.
# TYPE go_memstats_heap_released_bytes gauge
go_memstats_heap_released_bytes{launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",project="pipecd"} 2.768896e+06
# HELP go_memstats_heap_sys_bytes Number of heap bytes obtained from system.
# TYPE go_memstats_heap_sys_bytes gauge
go_memstats_heap_sys_bytes{launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",project="pipecd"} 2.3953408e+07
# HELP go_memstats_last_gc_time_seconds Number of seconds since 1970 of last garbage collection.
# TYPE go_memstats_last_gc_time_seconds gauge
go_memstats_last_gc_time_seconds{launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",project="pipecd"} 1.712053982471614e+09
# HELP go_memstats_lookups_total Total number of pointer lookups.
# TYPE go_memstats_lookups_total counter
go_memstats_lookups_total{launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",project="pipecd"} 0
# HELP go_memstats_mallocs_total Total number of mallocs.
# TYPE go_memstats_mallocs_total counter
go_memstats_mallocs_total{launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",project="pipecd"} 490219
# HELP go_memstats_mcache_inuse_bytes Number of bytes in use by mcache structures.
# TYPE go_memstats_mcache_inuse_bytes gauge
go_memstats_mcache_inuse_bytes{launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",project="pipecd"} 14400
# HELP go_memstats_mcache_sys_bytes Number of bytes used for mcache structures obtained from system.
# TYPE go_memstats_mcache_sys_bytes gauge
go_memstats_mcache_sys_bytes{launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",project="pipecd"} 15600
# HELP go_memstats_mspan_inuse_bytes Number of bytes in use by mspan structures.
# TYPE go_memstats_mspan_inuse_bytes gauge
go_memstats_mspan_inuse_bytes{launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",project="pipecd"} 295176
# HELP go_memstats_mspan_sys_bytes Number of bytes used for mspan structures obtained from system.
# TYPE go_memstats_mspan_sys_bytes gauge
go_memstats_mspan_sys_bytes{launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",project="pipecd"} 358512
# HELP go_memstats_next_gc_bytes Number of heap bytes when next garbage collection will take place.
# TYPE go_memstats_next_gc_bytes gauge
go_memstats_next_gc_bytes{launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",project="pipecd"} 1.8034312e+07
# HELP go_memstats_other_sys_bytes Number of bytes used for other system allocations.
# TYPE go_memstats_other_sys_bytes gauge
go_memstats_other_sys_bytes{launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",project="pipecd"} 1.985372e+06
# HELP go_memstats_stack_inuse_bytes Number of bytes in use by the stack allocator.
# TYPE go_memstats_stack_inuse_bytes gauge
go_memstats_stack_inuse_bytes{launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",project="pipecd"} 1.212416e+06
# HELP go_memstats_stack_sys_bytes Number of bytes obtained from system for stack allocator.
# TYPE go_memstats_stack_sys_bytes gauge
go_memstats_stack_sys_bytes{launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",project="pipecd"} 1.212416e+06
# HELP go_memstats_sys_bytes Number of bytes obtained from system.
# TYPE go_memstats_sys_bytes gauge
go_memstats_sys_bytes{launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",project="pipecd"} 3.3981704e+07
# HELP go_threads Number of OS threads created.
# TYPE go_threads gauge
go_threads{launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",project="pipecd"} 18
# HELP plan_preview_command_received_total Total number of plan-preview commands received at piped.
# TYPE plan_preview_command_received_total counter
plan_preview_command_received_total{launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",project="pipecd"} 0

The point is that deployment_status grows by 7 lines each time a new deployment starts, and the old records remain even after a new deployment has started. To make this clearer, I split the deployment_status section by application and deployment:


# application_id="4dad17bb-ce09-4302-8ce6-36a3c29b9e15", deployment="2342bae5-b625-49b1-8792-1945b1e704a2"
deployment_status{application_id="4dad17bb-ce09-4302-8ce6-36a3c29b9e15",application_kind="ECS",application_name="273-plan-preview",deployment="2342bae5-b625-49b1-8792-1945b1e704a2",launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",platform_provider="ecs-dev",project="pipecd",status="DEPLOYMENT_CANCELLED"} 0
deployment_status{application_id="4dad17bb-ce09-4302-8ce6-36a3c29b9e15",application_kind="ECS",application_name="273-plan-preview",deployment="2342bae5-b625-49b1-8792-1945b1e704a2",launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",platform_provider="ecs-dev",project="pipecd",status="DEPLOYMENT_FAILURE"} 0
deployment_status{application_id="4dad17bb-ce09-4302-8ce6-36a3c29b9e15",application_kind="ECS",application_name="273-plan-preview",deployment="2342bae5-b625-49b1-8792-1945b1e704a2",launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",platform_provider="ecs-dev",project="pipecd",status="DEPLOYMENT_PENDING"} 0
deployment_status{application_id="4dad17bb-ce09-4302-8ce6-36a3c29b9e15",application_kind="ECS",application_name="273-plan-preview",deployment="2342bae5-b625-49b1-8792-1945b1e704a2",launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",platform_provider="ecs-dev",project="pipecd",status="DEPLOYMENT_PLANNED"} 0
deployment_status{application_id="4dad17bb-ce09-4302-8ce6-36a3c29b9e15",application_kind="ECS",application_name="273-plan-preview",deployment="2342bae5-b625-49b1-8792-1945b1e704a2",launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",platform_provider="ecs-dev",project="pipecd",status="DEPLOYMENT_ROLLING_BACK"} 0
deployment_status{application_id="4dad17bb-ce09-4302-8ce6-36a3c29b9e15",application_kind="ECS",application_name="273-plan-preview",deployment="2342bae5-b625-49b1-8792-1945b1e704a2",launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",platform_provider="ecs-dev",project="pipecd",status="DEPLOYMENT_RUNNING"} 0
deployment_status{application_id="4dad17bb-ce09-4302-8ce6-36a3c29b9e15",application_kind="ECS",application_name="273-plan-preview",deployment="2342bae5-b625-49b1-8792-1945b1e704a2",launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",platform_provider="ecs-dev",project="pipecd",status="DEPLOYMENT_SUCCESS"} 1

# application_id="4dad17bb-ce09-4302-8ce6-36a3c29b9e15", deployment="32271bc7-8266-445a-a35a-59c506c56b25"
deployment_status{application_id="4dad17bb-ce09-4302-8ce6-36a3c29b9e15",application_kind="ECS",application_name="273-plan-preview",deployment="32271bc7-8266-445a-a35a-59c506c56b25",launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",platform_provider="ecs-dev",project="pipecd",status="DEPLOYMENT_CANCELLED"} 0
deployment_status{application_id="4dad17bb-ce09-4302-8ce6-36a3c29b9e15",application_kind="ECS",application_name="273-plan-preview",deployment="32271bc7-8266-445a-a35a-59c506c56b25",launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",platform_provider="ecs-dev",project="pipecd",status="DEPLOYMENT_FAILURE"} 0
deployment_status{application_id="4dad17bb-ce09-4302-8ce6-36a3c29b9e15",application_kind="ECS",application_name="273-plan-preview",deployment="32271bc7-8266-445a-a35a-59c506c56b25",launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",platform_provider="ecs-dev",project="pipecd",status="DEPLOYMENT_PENDING"} 1
deployment_status{application_id="4dad17bb-ce09-4302-8ce6-36a3c29b9e15",application_kind="ECS",application_name="273-plan-preview",deployment="32271bc7-8266-445a-a35a-59c506c56b25",launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",platform_provider="ecs-dev",project="pipecd",status="DEPLOYMENT_PLANNED"} 0
deployment_status{application_id="4dad17bb-ce09-4302-8ce6-36a3c29b9e15",application_kind="ECS",application_name="273-plan-preview",deployment="32271bc7-8266-445a-a35a-59c506c56b25",launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",platform_provider="ecs-dev",project="pipecd",status="DEPLOYMENT_ROLLING_BACK"} 0
deployment_status{application_id="4dad17bb-ce09-4302-8ce6-36a3c29b9e15",application_kind="ECS",application_name="273-plan-preview",deployment="32271bc7-8266-445a-a35a-59c506c56b25",launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",platform_provider="ecs-dev",project="pipecd",status="DEPLOYMENT_RUNNING"} 0
deployment_status{application_id="4dad17bb-ce09-4302-8ce6-36a3c29b9e15",application_kind="ECS",application_name="273-plan-preview",deployment="32271bc7-8266-445a-a35a-59c506c56b25",launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",platform_provider="ecs-dev",project="pipecd",status="DEPLOYMENT_SUCCESS"} 0

# application_id="5c796adc-ac7a-4b28-9e71-f1163103d3cd", deployment="6e94600c-9ca8-4979-a72c-b1860b17a04b"
deployment_status{application_id="5c796adc-ac7a-4b28-9e71-f1163103d3cd",application_kind="ECS",application_name="275-secret-management",deployment="6e94600c-9ca8-4979-a72c-b1860b17a04b",launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",platform_provider="ecs-dev",project="pipecd",status="DEPLOYMENT_CANCELLED"} 0
deployment_status{application_id="5c796adc-ac7a-4b28-9e71-f1163103d3cd",application_kind="ECS",application_name="275-secret-management",deployment="6e94600c-9ca8-4979-a72c-b1860b17a04b",launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",platform_provider="ecs-dev",project="pipecd",status="DEPLOYMENT_FAILURE"} 1
deployment_status{application_id="5c796adc-ac7a-4b28-9e71-f1163103d3cd",application_kind="ECS",application_name="275-secret-management",deployment="6e94600c-9ca8-4979-a72c-b1860b17a04b",launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",platform_provider="ecs-dev",project="pipecd",status="DEPLOYMENT_PENDING"} 0
deployment_status{application_id="5c796adc-ac7a-4b28-9e71-f1163103d3cd",application_kind="ECS",application_name="275-secret-management",deployment="6e94600c-9ca8-4979-a72c-b1860b17a04b",launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",platform_provider="ecs-dev",project="pipecd",status="DEPLOYMENT_PLANNED"} 0
deployment_status{application_id="5c796adc-ac7a-4b28-9e71-f1163103d3cd",application_kind="ECS",application_name="275-secret-management",deployment="6e94600c-9ca8-4979-a72c-b1860b17a04b",launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",platform_provider="ecs-dev",project="pipecd",status="DEPLOYMENT_ROLLING_BACK"} 0
deployment_status{application_id="5c796adc-ac7a-4b28-9e71-f1163103d3cd",application_kind="ECS",application_name="275-secret-management",deployment="6e94600c-9ca8-4979-a72c-b1860b17a04b",launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",platform_provider="ecs-dev",project="pipecd",status="DEPLOYMENT_RUNNING"} 0
deployment_status{application_id="5c796adc-ac7a-4b28-9e71-f1163103d3cd",application_kind="ECS",application_name="275-secret-management",deployment="6e94600c-9ca8-4979-a72c-b1860b17a04b",launcher_version="",pipecd_component="piped",piped="a605f2ff-95af-4298-8c58-xxxxxxxxxxxx",piped_version="unspecified",platform_provider="ecs-dev",project="pipecd",status="DEPLOYMENT_SUCCESS"} 0
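
The same grouping can be done programmatically. The sketch below counts deployment_status series per deployment ID from Prometheus text-format output. `countByDeployment` is a hypothetical helper written for this thread, and the sample lines are shortened stand-ins for the real ones above.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// deploymentLabel extracts the value of the deployment="..." label.
var deploymentLabel = regexp.MustCompile(`[{,]deployment="([^"]*)"`)

// countByDeployment counts deployment_status series per deployment ID in
// Prometheus text-format output. Illustrative helper, not PipeCD code.
func countByDeployment(metrics string) map[string]int {
	counts := make(map[string]int)
	sc := bufio.NewScanner(strings.NewReader(metrics))
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, "deployment_status{") {
			continue // skip comments and other metric families
		}
		if m := deploymentLabel.FindStringSubmatch(line); m != nil {
			counts[m[1]]++
		}
	}
	return counts
}

func main() {
	// Shortened sample lines; real lines carry more labels.
	sample := `# TYPE deployment_status gauge
deployment_status{application_id="app-1",deployment="d-1",status="DEPLOYMENT_RUNNING"} 0
deployment_status{application_id="app-1",deployment="d-1",status="DEPLOYMENT_SUCCESS"} 1
deployment_status{application_id="app-1",deployment="d-2",status="DEPLOYMENT_PENDING"} 1
go_goroutines{project="pipecd"} 41
`
	for id, n := range countByDeployment(sample) {
		fmt.Println(id, n)
	}
}
```

Run against the full /metrics output, every deployment ID should map to 7, and the number of keys grows with each new deployment.
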
t-kikuc commented 8 months ago

Also, old deployment_status records are cleared when piped is restarted.

That's why restarting the piped solved the problem...

t-kikuc commented 8 months ago

This is why I implemented the fix in https://github.com/pipe-cd/pipecd/pull/4857.

There were 4 candidates, and I picked option 4 as the solution.

  1. Set a TTL (e.g. 24h) for deployment_status in some way -> [Impossible/Difficult] I could not find such an option in Prometheus, and it would be tough to implement a TTL ourselves.

  2. Delete old deployment_status after the deployment finishes -> [Impossible] We must report deployment_status at least once before deleting it.

  3. Delete old deployment_status when a new deployment for the same app starts -> [Impossible] It is difficult to get the previous deployment ID to delete when the new one starts.

  4. Delete all deployment_status right after reporting stats -> [Possible & Easy] Only the deployment_status of in-progress deployments remains, and piped can still report each deployment_status at least once.
    We don't need to get the previous deployment ID. The Prometheus client library has a function for this kind of reset: func (m *MetricVec) Reset()
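
A minimal sketch of option 4, using a small in-memory stand-in instead of the real Prometheus client so that it runs with the standard library only. `statusVec` and `ReportAndReset` are hypothetical names; the actual change in the PR may be structured differently.

```go
package main

import (
	"fmt"
	"sync"
)

// statusVec is a minimal stand-in for a Prometheus GaugeVec keyed by
// "deployment/status". It is illustrative only, not the real client library.
type statusVec struct {
	mu     sync.Mutex
	values map[string]float64
}

func newStatusVec() *statusVec {
	return &statusVec{values: make(map[string]float64)}
}

// Set records the gauge value for one deployment/status pair.
func (v *statusVec) Set(deployment, status string, value float64) {
	v.mu.Lock()
	defer v.mu.Unlock()
	v.values[deployment+"/"+status] = value
}

// ReportAndReset hands back all current series and starts over with an empty
// set, so each series is reported at least once and never accumulates.
func (v *statusVec) ReportAndReset() map[string]float64 {
	v.mu.Lock()
	defer v.mu.Unlock()
	snapshot := v.values
	v.values = make(map[string]float64)
	return snapshot
}

func main() {
	vec := newStatusVec()
	vec.Set("d-1", "DEPLOYMENT_SUCCESS", 1)
	vec.Set("d-2", "DEPLOYMENT_PENDING", 1)

	first := vec.ReportAndReset()  // both series are sent once
	second := vec.ReportAndReset() // nothing is left over afterwards
	fmt.Println(len(first), len(second))
}
```

Taking the snapshot and clearing under one lock means a status set between the two steps cannot be lost. With the real library, gathering and Reset() are separate calls, so that window is something to keep in mind.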