ribbybibby / s3_exporter

Exports Prometheus metrics about S3 buckets and objects
Apache License 2.0

Add ability to evaluate versioned buckets #51

Open enticedwanderer opened 11 months ago

enticedwanderer commented 11 months ago

Summary:

This PR adds the ability to list and evaluate all object versions in a bucket. The existing behavior uses the ListObjectsV2 API, which lists only the current (latest) version of each object in a versioned bucket. As a result, none of the metrics account for older object versions that are still present in the bucket and taking up space. For my use case this was unacceptable, as I need to know when I'm approaching the storage limit of my bucket.

To address this, a few changes were introduced:

  1. Provide a new flag, --s3.with-versions, which changes the behavior of the API calls. It is purely opt-in (default is false), which maintains backwards compatibility.
  2. Abstract the counters into a new ItemAggregator struct which keeps track of statistics, and define a separate, parallel method that evaluates all objects in a bucket using the ListObjectVersions API.
  3. Select between the two implementations (CountViaListObjectsV2 and CountViaListObjectVersions) based on the flag.
  4. Extend test case semantics to support the new API usage and write additional unit tests to exercise them.
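
To illustrate the shape of the change, here is a minimal, self-contained sketch of an aggregator plus flag-based selection. It does not reproduce the PR's code: `Item`, `Lister`, the aggregator's fields, and the `countCurrent`/`countAllVersions` helpers are hypothetical stand-ins, and the listing is faked in memory rather than calling S3.

```go
package main

import "fmt"

// Item is a hypothetical, simplified view of one listed entry:
// either a current object or one stored version of an object.
type Item struct {
	Key      string
	Size     int64
	IsLatest bool
}

// ItemAggregator keeps running totals for the items seen so far.
// The field names are illustrative, not necessarily those used in the PR.
type ItemAggregator struct {
	Count       int
	TotalSize   int64
	BiggestSize int64
}

// Add folds a single item into the totals.
func (a *ItemAggregator) Add(it Item) {
	a.Count++
	a.TotalSize += it.Size
	if it.Size > a.BiggestSize {
		a.BiggestSize = it.Size
	}
}

// Lister stands in for a paginated S3 listing call (assumption: in-memory data).
type Lister func() []Item

// countCurrent mimics a ListObjectsV2-style walk: only the latest version of each key.
func countCurrent(list Lister) ItemAggregator {
	var agg ItemAggregator
	for _, it := range list() {
		if it.IsLatest {
			agg.Add(it)
		}
	}
	return agg
}

// countAllVersions mimics a ListObjectVersions-style walk: every stored version.
func countAllVersions(list Lister) ItemAggregator {
	var agg ItemAggregator
	for _, it := range list() {
		agg.Add(it)
	}
	return agg
}

// Count picks the strategy from the opt-in flag value.
func Count(withVersions bool, list Lister) ItemAggregator {
	if withVersions {
		return countAllVersions(list)
	}
	return countCurrent(list)
}

func main() {
	bucket := func() []Item {
		return []Item{
			{Key: "a.txt", Size: 100, IsLatest: false},
			{Key: "a.txt", Size: 120, IsLatest: true},
			{Key: "b.txt", Size: 50, IsLatest: true},
		}
	}
	fmt.Println(Count(false, bucket).TotalSize) // 170: latest versions only
	fmt.Println(Count(true, bucket).TotalSize)  // 270: all versions
}
```

With the flag off, the older 100-byte version of `a.txt` is invisible to the totals, which is the undercounting described in the summary.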

Further things of note:

  1. Right now, the flag is global and not per bucket. This is OK because ListObjectVersions also works on non-versioned buckets, so it functions just as well with any bucket. Otherwise, we would also have to query each bucket's versioning status.
  2. In integration testing on my S3 provider (IDrive e2), the ListObjectVersions API performs about 10-15% slower for the same non-versioned bucket. For a versioned bucket the difference can be much bigger, since every version is listed. This matters if the evaluation time is close to the scrape timeout of the job in Prometheus (i.e. you have a huge bucket with a lot of objects).

Tested on my account end to end across 3 different buckets (versioned, non-versioned, empty non-versioned).

Happy to consider any changes you want me to make or any other suggestions you might have.