atilsensalduz closed 11 months ago
Hi @atilsensalduz 👋
The cloudprovider_aws_api_request_duration_seconds_bucket metric is useful for measuring the latency of AWS API calls made by the driver, such as AttachVolume. Here, latency means the time it takes for AWS to acknowledge the request. It does not account for the full lifecycle of the operation: after the request is ack'd, it still takes some amount of time for the volume to transition to attached, and so on. See https://docs.aws.amazon.com/AWSEC2/latest/APIReference/query-api-troubleshooting.html#eventual-consistency for more details.
To accurately measure the time it takes to create a volume, you'll want to look at csi_sidecar_operations_seconds_sum, for example:
csi_sidecar_operations_seconds_bucket{driver_name="ebs.csi.aws.com",grpc_status_code="OK",method_name="/csi.v1.Controller/CreateVolume",migrated="false",le="0.1"} 0
csi_sidecar_operations_seconds_bucket{driver_name="ebs.csi.aws.com",grpc_status_code="OK",method_name="/csi.v1.Controller/CreateVolume",migrated="false",le="0.25"} 0
csi_sidecar_operations_seconds_bucket{driver_name="ebs.csi.aws.com",grpc_status_code="OK",method_name="/csi.v1.Controller/CreateVolume",migrated="false",le="0.5"} 0
csi_sidecar_operations_seconds_bucket{driver_name="ebs.csi.aws.com",grpc_status_code="OK",method_name="/csi.v1.Controller/CreateVolume",migrated="false",le="1"} 0
csi_sidecar_operations_seconds_bucket{driver_name="ebs.csi.aws.com",grpc_status_code="OK",method_name="/csi.v1.Controller/CreateVolume",migrated="false",le="2.5"} 0
csi_sidecar_operations_seconds_bucket{driver_name="ebs.csi.aws.com",grpc_status_code="OK",method_name="/csi.v1.Controller/CreateVolume",migrated="false",le="5"} 1
csi_sidecar_operations_seconds_bucket{driver_name="ebs.csi.aws.com",grpc_status_code="OK",method_name="/csi.v1.Controller/CreateVolume",migrated="false",le="10"} 1
csi_sidecar_operations_seconds_bucket{driver_name="ebs.csi.aws.com",grpc_status_code="OK",method_name="/csi.v1.Controller/CreateVolume",migrated="false",le="15"} 1
csi_sidecar_operations_seconds_bucket{driver_name="ebs.csi.aws.com",grpc_status_code="OK",method_name="/csi.v1.Controller/CreateVolume",migrated="false",le="25"} 1
csi_sidecar_operations_seconds_bucket{driver_name="ebs.csi.aws.com",grpc_status_code="OK",method_name="/csi.v1.Controller/CreateVolume",migrated="false",le="50"} 1
csi_sidecar_operations_seconds_bucket{driver_name="ebs.csi.aws.com",grpc_status_code="OK",method_name="/csi.v1.Controller/CreateVolume",migrated="false",le="120"} 1
csi_sidecar_operations_seconds_bucket{driver_name="ebs.csi.aws.com",grpc_status_code="OK",method_name="/csi.v1.Controller/CreateVolume",migrated="false",le="300"} 1
csi_sidecar_operations_seconds_bucket{driver_name="ebs.csi.aws.com",grpc_status_code="OK",method_name="/csi.v1.Controller/CreateVolume",migrated="false",le="600"} 1
csi_sidecar_operations_seconds_bucket{driver_name="ebs.csi.aws.com",grpc_status_code="OK",method_name="/csi.v1.Controller/CreateVolume",migrated="false",le="+Inf"} 1
csi_sidecar_operations_seconds_sum{driver_name="ebs.csi.aws.com",grpc_status_code="OK",method_name="/csi.v1.Controller/CreateVolume",migrated="false"} 3.303456169
csi_sidecar_operations_seconds_count{driver_name="ebs.csi.aws.com",grpc_status_code="OK",method_name="/csi.v1.Controller/CreateVolume",migrated="false"} 1
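As a rough illustration of how to read the sample above, the average CreateVolume duration is the `_sum` value divided by the `_count` value. The sketch below parses the two relevant lines with a simple regular expression; the helper name `mean_duration` is hypothetical, and the parser assumes the plain label layout shown in the sample rather than the full Prometheus exposition format.

```python
import re

# The _sum and _count lines copied from the sample above.
SAMPLE = '''\
csi_sidecar_operations_seconds_sum{driver_name="ebs.csi.aws.com",grpc_status_code="OK",method_name="/csi.v1.Controller/CreateVolume",migrated="false"} 3.303456169
csi_sidecar_operations_seconds_count{driver_name="ebs.csi.aws.com",grpc_status_code="OK",method_name="/csi.v1.Controller/CreateVolume",migrated="false"} 1
'''

# metric_name{labels} value
LINE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)$'
)


def mean_duration(text, method):
    """Return sum/count in seconds for one gRPC method, or None if no samples.

    Hypothetical helper: it only understands lines shaped like the sample.
    """
    total = 0.0
    count = 0.0
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m or f'method_name="{method}"' not in m.group("labels"):
            continue
        if m.group("name") == "csi_sidecar_operations_seconds_sum":
            total += float(m.group("value"))
        elif m.group("name") == "csi_sidecar_operations_seconds_count":
            count += float(m.group("value"))
    return total / count if count else None


print(mean_duration(SAMPLE, "/csi.v1.Controller/CreateVolume"))
```

With a single recorded operation, the average is simply that operation's duration, which is why the `le="5"` bucket is the first one with a non-zero count.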
To enable this metric, --http-endpoint needs to be defined for the external provisioner sidecar. Currently, you can do that via the additionalArgs Helm parameter: https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/09f742f7a545ea4d1d5fef333715e25cd2064c0d/charts/aws-ebs-csi-driver/values.yaml#L25
Wow, that's fantastic! Thanks a lot @torredil! I'm currently managing the EBS CSI driver as an EKS add-on using Terraform. Could you please review the following configuration? Let me know if there are any corrections needed:
{
  "sidecars": {
    "provisioner": {
      "additionalArgs": [
        "--http-endpoint=0.0.0.0:8080"
      ]
    }
  }
}
Could you please review the following configuration? Let me know if there are any corrections needed
You got it mate, no corrections needed 👍
As a quick sanity check, you should be able to see the relevant metrics by going through this exercise:
1. Port-forward to the controller pod whose csi-provisioner holds the lease:
$ export ebs_csi_controller=$(kubectl get pods -n kube-system -o custom-columns=NAME:.metadata.name | grep ebs-csi-controller | while read podname; do if kubectl logs $podname -n kube-system -c csi-provisioner | grep -q "successfully acquired lease"; then echo $podname; fi; done) && kubectl port-forward $ebs_csi_controller 8080:8080 -n kube-system
Forwarding from 127.0.0.1:8080 -> 8080
Forwarding from [::1]:8080 -> 8080
Handling connection for 8080
Handling connection for 8080
2. Grab the metrics:
$ curl 0.0.0.0:8080/metrics | grep "CreateVolume"
% Total % Received % Xferd Average Speed Time Time Time Current
csi_sidecar_operations_seconds_bucket{driver_name="ebs.csi.aws.com",grpc_status_code="OK",method_name="/csi.v1.Controller/CreateVolume",migrated="false",le="0.1"} 0
csi_sidecar_operations_seconds_bucket{driver_name="ebs.csi.aws.com",grpc_status_code="OK",method_name="/csi.v1.Controller/CreateVolume",migrated="false",le="0.25"} 0
csi_sidecar_operations_seconds_bucket{driver_name="ebs.csi.aws.com",grpc_status_code="OK",method_name="/csi.v1.Controller/CreateVolume",migrated="false",le="0.5"} 0
csi_sidecar_operations_seconds_bucket{driver_name="ebs.csi.aws.com",grpc_status_code="OK",method_name="/csi.v1.Controller/CreateVolume",migrated="false",le="1"} 0
csi_sidecar_operations_seconds_bucket{driver_name="ebs.csi.aws.com",grpc_status_code="OK",method_name="/csi.v1.Controller/CreateVolume",migrated="false",le="2.5"} 0
csi_sidecar_operations_seconds_bucket{driver_name="ebs.csi.aws.com",grpc_status_code="OK",method_name="/csi.v1.Controller/CreateVolume",migrated="false",le="5"} 2
csi_sidecar_operations_seconds_bucket{driver_name="ebs.csi.aws.com",grpc_status_code="OK",method_name="/csi.v1.Controller/CreateVolume",migrated="false",le="10"} 2
csi_sidecar_operations_seconds_bucket{driver_name="ebs.csi.aws.com",grpc_status_code="OK",method_name="/csi.v1.Controller/CreateVolume",migrated="false",le="15"} 2
csi_sidecar_operations_seconds_bucket{driver_name="ebs.csi.aws.com",grpc_status_code="OK",method_name="/csi.v1.Controller/CreateVolume",migrated="false",le="25"} 2
csi_sidecar_operations_seconds_bucket{driver_name="ebs.csi.aws.com",grpc_status_code="OK",method_name="/csi.v1.Controller/CreateVolume",migrated="false",le="50"} 2
csi_sidecar_operations_seconds_bucket{driver_name="ebs.csi.aws.com",grpc_status_code="OK",method_name="/csi.v1.Controller/CreateVolume",migrated="false",le="120"} 2
csi_sidecar_operations_seconds_bucket{driver_name="ebs.csi.aws.com",grpc_status_code="OK",method_name="/csi.v1.Controller/CreateVolume",migrated="false",le="300"} 2
csi_sidecar_operations_seconds_bucket{driver_name="ebs.csi.aws.com",grpc_status_code="OK",method_name="/csi.v1.Controller/CreateVolume",migrated="false",le="600"} 2
csi_sidecar_operations_seconds_bucket{driver_name="ebs.csi.aws.com",grpc_status_code="OK",method_name="/csi.v1.Controller/CreateVolume",migrated="false",le="+Inf"} 2
csi_sidecar_operations_seconds_sum{driver_name="ebs.csi.aws.com",grpc_status_code="OK",method_name="/csi.v1.Controller/CreateVolume",migrated="false"} 8.772878436
csi_sidecar_operations_seconds_count{driver_name="ebs.csi.aws.com",grpc_status_code="OK",method_name="/csi.v1.Controller/CreateVolume",migrated="false"} 2
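Since the original goal was to alert when volume creation exceeds 5 minutes, one way to read this output is to compare the `le="300"` bucket with the `le="+Inf"` bucket: the difference is the number of CreateVolume calls that took longer than 300 seconds. The sketch below does that on a subset of the scraped lines. The helper name `slow_operations` is hypothetical, the threshold must match an existing bucket boundary, and the parser assumes the label layout shown in the output.

```python
import re

# Subset of the scrape above: only the buckets needed for the comparison.
SAMPLE = '''\
csi_sidecar_operations_seconds_bucket{driver_name="ebs.csi.aws.com",grpc_status_code="OK",method_name="/csi.v1.Controller/CreateVolume",migrated="false",le="120"} 2
csi_sidecar_operations_seconds_bucket{driver_name="ebs.csi.aws.com",grpc_status_code="OK",method_name="/csi.v1.Controller/CreateVolume",migrated="false",le="300"} 2
csi_sidecar_operations_seconds_bucket{driver_name="ebs.csi.aws.com",grpc_status_code="OK",method_name="/csi.v1.Controller/CreateVolume",migrated="false",le="600"} 2
csi_sidecar_operations_seconds_bucket{driver_name="ebs.csi.aws.com",grpc_status_code="OK",method_name="/csi.v1.Controller/CreateVolume",migrated="false",le="+Inf"} 2
'''
METHOD = "/csi.v1.Controller/CreateVolume"

BUCKET_RE = re.compile(
    r'^csi_sidecar_operations_seconds_bucket\{(?P<labels>.*)\}\s+(?P<value>\S+)$'
)
LE_RE = re.compile(r'le="([^"]+)"')


def slow_operations(text, method, threshold_seconds):
    """Count operations slower than threshold_seconds for one gRPC method.

    Hypothetical helper. Buckets are cumulative, so the count above a
    boundary is the +Inf bucket minus the bucket at that boundary.
    """
    buckets = {}
    for line in text.splitlines():
        m = BUCKET_RE.match(line.strip())
        if not m or f'method_name="{method}"' not in m.group("labels"):
            continue
        le = LE_RE.search(m.group("labels"))
        if le:
            # float("+Inf") parses to infinity, matching the last bucket.
            buckets[float(le.group(1))] = float(m.group("value"))
    threshold = float(threshold_seconds)
    if threshold not in buckets or float("inf") not in buckets:
        raise ValueError("threshold must match an existing bucket boundary")
    return int(buckets[float("inf")] - buckets[threshold])


print(slow_operations(SAMPLE, METHOD, 300))
```

These counters are cumulative for the lifetime of the sidecar process, so an alert would normally look at how the difference grows over a time window rather than at its absolute value.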
Hey @torredil
Just wanted to drop a quick note to say thanks for your awesome help with the issue.
Really appreciate your quick response and expertise. You rock! 🚀
Cheers,
I am currently exploring options for monitoring PVC creation times in the ebs-csi-driver, with the goal of setting up alerts if the process exceeds a certain duration, such as 5 minutes.
Upon reviewing the available metrics, I noticed the existence of the cloudprovider_aws_api_request_duration_seconds_bucket metric, and I'm wondering if this metric can be utilized to measure the time taken for PVC creation. Could you please provide more details on this metric and clarify if it can be used for tracking PVC creation times?
Additionally, I'm open to alternative metrics or approaches that you would recommend for monitoring PVC creation times, as well as any other metrics that are useful for tracking the health of the ebs-csi-driver and of the infrastructure it uses to manage PVs.