ribbybibby / s3_exporter

Exports Prometheus metrics about S3 buckets and objects

Add support for multiple buckets and detecting buckets #15

Closed · maikelpoot closed this 3 years ago

maikelpoot commented 3 years ago

This PR adds support for searching multiple, or all, of your buckets in one go.

As I don't want to update my Prometheus config every time a bucket gets created, I want to be able to use the exporter without predefined buckets.

I also added the option to pass multiple buckets in one request as a comma-separated list, e.g. ?bucket=logs,playground. The prefix is then applied to each of these buckets.

On success it returns metrics for every bucket, but if a (detected) bucket can't be accessed it only returns s3_list_success for that bucket, with value 0:

s3_list_success{bucket="playground",prefix=""} 0

Expected output:

# HELP s3_biggest_object_size_bytes The size of the biggest object
# TYPE s3_biggest_object_size_bytes gauge
s3_biggest_object_size_bytes{bucket="logs",prefix=""} 0
s3_biggest_object_size_bytes{bucket="playground",prefix=""} 16
# HELP s3_last_modified_object_date The last modified date of the object that was modified most recently
# TYPE s3_last_modified_object_date gauge
s3_last_modified_object_date{bucket="logs",prefix=""} -6.795364578e+09
s3_last_modified_object_date{bucket="playground",prefix=""} 1.602685495e+09
# HELP s3_last_modified_object_size_bytes The size of the object that was modified most recently
# TYPE s3_last_modified_object_size_bytes gauge
s3_last_modified_object_size_bytes{bucket="logs",prefix=""} 0
s3_last_modified_object_size_bytes{bucket="playground",prefix=""} 16
# HELP s3_list_success If the ListObjects operation was a success
# TYPE s3_list_success gauge
s3_list_success{bucket="logs",prefix=""} 1
s3_list_success{bucket="playground",prefix=""} 1
# HELP s3_objects The total number of objects for the bucket/prefix combination
# TYPE s3_objects gauge
s3_objects{bucket="logs",prefix=""} 0
s3_objects{bucket="playground",prefix=""} 5
# HELP s3_objects_size_sum_bytes The total size of all objects summed
# TYPE s3_objects_size_sum_bytes gauge
s3_objects_size_sum_bytes{bucket="logs",prefix=""} 0
s3_objects_size_sum_bytes{bucket="playground",prefix=""} 80
maikelpoot commented 3 years ago

PS: I haven't updated the tests yet to include a multiple-bucket fetch; I first wanted to know whether this feature is wanted, or whether I should stick with my fork.

ribbybibby commented 3 years ago

Hi @maikelpoot,

Thanks for the PR!

I also added the option to pass multiple buckets in one request as a comma-separated list, e.g. ?bucket=logs,playground.

I don't think this is a particularly useful addition, as you can already define separate targets for multiple buckets:

    static_configs:
      - targets:
        - bucket=logs;prefix=some-prefix;
        - bucket=playground;prefix=some-prefix;

Adding an extra bucket name to a target vs adding another target doesn't seem any more convenient to me.

As I don't want to update my Prometheus config every time a bucket gets created, I want to be able to use the exporter without predefined buckets.

I can see why this is appealing but I have some reservations:

  1. With a lot of large buckets, the time taken for the probe to complete could become really long. It would be better for these calls to be split up into separate probes.
  2. I don't think listing all buckets should be triggered by omitting the bucket parameter; otherwise, calling /probe without any arguments would list all buckets, which could happen by accident and could be quite an expensive operation.

Overall, I'm not sure this belongs in the exporter. Rather than adding this functionality here, you could write a simple sidecar process which generates the targets and then load the output dynamically with file_sd_configs:

while true; do 
  echo -e "- targets:\n$(aws s3 ls | awk '{print "  - bucket="$3";"}')" > /s3-targets.yaml
  sleep 300
done

That would give you /s3-targets.yaml:

- targets:
  - bucket=logs;
  - bucket=playground;

Which you could scrape thusly:

scrape_configs:
  - job_name: 's3'
    metrics_path: /probe
    file_sd_configs:
      - files:
          - /s3-targets.yaml
        refresh_interval: 5m
    relabel_configs:
      - source_labels: [__address__]
        regex: '^bucket=(.*);$'
        replacement: '${1}'
        target_label: '__param_bucket'
      - target_label: __address__
        replacement: 127.0.0.1:9340
maikelpoot commented 3 years ago

Hi @ribbybibby,

Adding an extra bucket name to a target vs adding another target doesn't seem any more convenient to me.

Yeah, I know, but it was a bonus that came from the choice of how to go from a string to a list: buckets = []string{bucket} vs buckets = strings.Split(bucket, ",").

I don't think listing all buckets should be triggered by omitting the bucket parameter; otherwise, calling /probe without any arguments would list all buckets, which could happen by accident and could be quite an expensive operation.

Good point. I could update it to use a placeholder to trigger the listing, maybe ?bucket=[all]? I wouldn't use *, as that could give the impression that wildcards are supported.
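
For illustration, here's a rough sketch of what I mean, assuming the aws-sdk-go v1 client the exporter already uses; resolveBuckets and the [all] marker are hypothetical names, not this PR's actual code:

package main

import (
    "fmt"
    "strings"

    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/s3"
    "github.com/aws/aws-sdk-go/service/s3/s3iface"
)

// resolveBuckets turns the ?bucket= parameter into a list of bucket names:
// the explicit "[all]" placeholder triggers discovery via ListBuckets,
// anything else is treated as a comma-separated list.
func resolveBuckets(svc s3iface.S3API, bucket string) ([]string, error) {
    if bucket != "[all]" {
        return strings.Split(bucket, ","), nil
    }
    out, err := svc.ListBuckets(&s3.ListBucketsInput{})
    if err != nil {
        // the probe handler could surface this as s3_list_success 0
        return nil, err
    }
    var buckets []string
    for _, b := range out.Buckets {
        buckets = append(buckets, aws.StringValue(b.Name))
    }
    return buckets, nil
}

func main() {
    svc := s3.New(session.Must(session.NewSession()))
    buckets, err := resolveBuckets(svc, "[all]")
    if err != nil {
        fmt.Println("listing buckets failed:", err)
        return
    }
    fmt.Println(buckets)
}

The probe handler could then loop over the returned names and apply the same prefix to each, with a failure on any one bucket only affecting that bucket's s3_list_success.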

With a lot of large buckets, the time taken for the probe to complete could become really long. It would be better for these calls to be split up into separate probes.

Yes, I'm aware, but isn't that up to the user? When you run into that problem you can switch to separate probes.

Overall, I'm not sure this belongs in the exporter.

I believe it does. For example, the node-exporter doesn't ask which drives, CPUs, etc. to give metrics for; it just discovers them and gives metrics for everything available. An s3_exporter should have the same option and give metrics for all buckets available to the account used.

But in the end, it's your project :-)

ribbybibby commented 3 years ago

I believe it does. For example, the node-exporter doesn't ask which drives, CPUs, etc. to give metrics for; it just discovers them and gives metrics for everything available. An s3_exporter should have the same option and give metrics for all buckets available to the account used.

That's not a fair comparison. The node_exporter is distinctly different to this exporter in a number of ways.

This isn't an exporter for s3 as a service or s3 as the subset of buckets that a given account has access to. It's very specifically scoped and designed to provide metrics for individual bucket queries, in the same way the blackbox_exporter provides metrics for individual endpoints.

Your use case is valid, but seeing as there are ways of addressing it without fundamentally changing how the exporter works, I don't think the addition of this functionality justifies the added complexity.

What I might be interested in would be a subcommand to the exporter that could discover buckets and update the sd file. I may actually look into implementing that.

maikelpoot commented 3 years ago

Yeah, maybe I didn't think this through enough :) This project was the closest to what I needed, and the only thing missing was the auto-detection. But I can see how the feature could bite me in the future: what if one bucket grows too large? Then it would require an "all but ..." option. Can of worms :D

What I might be interested in would be a subcommand to the exporter that could discover buckets and update the sd file. I may actually look into implementing that.

Sounds interesting. The first thing that comes to mind is an endpoint: it would be easy to implement a /detect endpoint (or something like that).

Why an endpoint as well? It would use the authentication/config parameters the exporter is started with, and it could be accessed remotely.

I could give this a go if you like.

ribbybibby commented 3 years ago

Going to close this PR as I believe it'll be addressed with the new HTTP service discovery. See https://github.com/ribbybibby/s3_exporter/pull/27.