noi-techpark / infrastructure-v2

Opendatahub Infrastructure v2 Repository

Bring down networking-related costs #55

Open clezag opened 5 months ago

clezag commented 5 months ago

Browsing the AWS cost center, I noticed that we spent an unreasonable chunk of change on "EC2-other" charges, which on closer inspection turned out to be almost exclusively networking-related:

image

Note that this is almost half our current AWS bill and we aren't really doing anything heavy yet.

It looks like it was likely a temporary increase in usage, judging from this image: image

@christian-roggia @Luscha do you have an idea how to mitigate this? Could it be caused by repeated image pulls during error states (image pull policy Always)? Or maybe by the filebeat/elasticsearch logs during heavy error logging? Or do we have to rethink our networking setup?

christian-roggia commented 5 months ago

Almost always when we are looking at spikes in network-related costs one of the following is the root cause:

Image pulling should be fine, especially considering that Kubernetes internally applies an exponential back-off on failure and reuses locally downloaded images, unless Always is used as the image pull policy instead of IfNotPresent.
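One caveat worth checking: when imagePullPolicy is omitted, Kubernetes defaults it to Always for images tagged `:latest` or with no tag at all, and to IfNotPresent otherwise. A minimal sketch of how one could flag containers that end up pulling on every start is below. It works on a pod spec already loaded as a plain dict; the function names are my own and nothing here talks to the cluster.

```python
from typing import List, Optional


def effective_pull_policy(image: str, policy: Optional[str] = None) -> str:
    """Return the pull policy Kubernetes applies to a container.

    Mirrors the documented defaulting rules for an omitted imagePullPolicy.
    """
    if policy:
        return policy  # an explicit policy always wins
    if "@" in image:
        return "IfNotPresent"  # image pinned by digest
    # Only look at the last path segment so a registry port
    # (e.g. registry:5000/app) is not mistaken for a tag.
    last_segment = image.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return "Always"  # no tag at all
    tag = last_segment.rsplit(":", 1)[1]
    return "Always" if tag == "latest" else "IfNotPresent"


def containers_always_pulling(pod_spec: dict) -> List[str]:
    """List the names of containers in a pod spec that pull on every start."""
    names = []
    for key in ("initContainers", "containers"):
        for container in pod_spec.get(key) or []:
            policy = effective_pull_policy(
                container["image"], container.get("imagePullPolicy")
            )
            if policy == "Always":
                names.append(container["name"])
    return names
```

Feeding it the `spec.template.spec` of each Deployment should be enough to see whether any workload is still on Always, explicitly or by default.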

I would however verify that the current setup always tries to keep communication within the VPC rather than going out to the public internet just to go back into the AWS network immediately after.
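As a rough way to do that check, the sketch below classifies the addresses our services talk to as inside or outside the VPC range. The CIDR `10.0.0.0/16` and the endpoint names are placeholders I made up, not our actual setup, and hostnames would have to be resolved to IPs first (from inside the cluster, so that private DNS answers are used).

```python
import ipaddress
from typing import Dict, List

# Hypothetical VPC range; replace with the real CIDR of the VPC.
VPC_CIDR = ipaddress.ip_network("10.0.0.0/16")


def stays_in_vpc(address: str, vpc=VPC_CIDR) -> bool:
    """True if the given IP address falls inside the VPC range."""
    return ipaddress.ip_address(address) in vpc


def leaving_vpc(endpoints: Dict[str, str]) -> List[str]:
    """Return the names of endpoints whose resolved IP is outside the VPC."""
    return sorted(
        name for name, addr in endpoints.items() if not stays_in_vpc(addr)
    )


if __name__ == "__main__":
    # Placeholder endpoints: name -> resolved IP address.
    resolved = {
        "elasticsearch": "10.0.12.34",
        "container-registry": "203.0.113.10",
    }
    print(leaving_vpc(resolved))
```

Anything reported here is a candidate for a VPC endpoint or an internal address, since that traffic is presumably what gets billed as leaving the network.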

While investigating please take into account the following:

https://aws.amazon.com/ec2/pricing/on-demand/

This incident tells me it's probably a good time to set up proper monitoring and alerts. Root cause analysis is significantly easier when you have proper observability of the system and metrics you can work with.
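To make the alerting idea concrete, here is a small sketch of a spike check over daily cost figures: it flags any day whose cost exceeds a multiple of the average of the preceding days. The window size, the factor, and the input numbers are assumptions for illustration only; the real figures would come from the billing export or whatever metrics backend we end up using.

```python
from statistics import mean
from typing import List, Sequence


def spike_days(
    daily_costs: Sequence[float], window: int = 7, factor: float = 2.0
) -> List[int]:
    """Return indices of days whose cost exceeds `factor` times the mean
    of the `window` days immediately before them."""
    flagged = []
    for i in range(window, len(daily_costs)):
        baseline = mean(daily_costs[i - window:i])
        if baseline > 0 and daily_costs[i] > factor * baseline:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    # Made-up daily networking costs, not taken from our bill.
    costs = [10, 10, 10, 10, 10, 10, 10, 11, 35, 12]
    print(spike_days(costs))
```

A check like this only says that something spiked, not why, so it complements rather than replaces per-service network metrics.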