Open moserke opened 6 years ago
Yes. We use the resource IDs on the EC2 cost rows for correlation. The specific part where we use them is here: https://github.com/operator-framework/operator-metering/blob/master/charts/reporting-operator/templates/custom-resources/report-queries/aws-billing.yaml#L34-L40.
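To illustrate why the resource IDs matter, here is a minimal sketch of correlating cost rows with cluster nodes by instance ID. This is not the logic of the linked query; the row layout, the function names, and the sample values are all hypothetical. The only assumptions taken from outside this thread are that a Kubernetes node on AWS carries a provider ID ending in its EC2 instance ID, and that the cost report's resource ID for an EC2 row is that same instance ID.

```python
from collections import defaultdict


def instance_id_from_provider_id(provider_id):
    """Return the last path segment of a provider ID.

    Assumed format: "aws:///<zone>/<instance-id>".
    """
    return provider_id.rsplit("/", 1)[-1]


def cost_per_node(cost_rows, nodes):
    """Sum cost per resource ID, then look up each node's instance ID.

    cost_rows: iterable of dicts with hypothetical keys "resource_id" and "cost".
    nodes: dict mapping node name -> provider ID.
    Nodes with no matching cost rows get 0.0.
    """
    cost_by_resource = defaultdict(float)
    for row in cost_rows:
        cost_by_resource[row["resource_id"]] += row["cost"]
    return {
        name: cost_by_resource.get(instance_id_from_provider_id(pid), 0.0)
        for name, pid in nodes.items()
    }


# Hypothetical sample data, for illustration only.
rows = [
    {"resource_id": "i-aaa", "cost": 1.5},
    {"resource_id": "i-aaa", "cost": 0.5},
    {"resource_id": "i-bbb", "cost": 2.0},
]
nodes = {
    "node-a": "aws:///us-east-1a/i-aaa",
    "node-c": "aws:///us-east-1b/i-ccc",
}
print(cost_per_node(rows, nodes))
```

Without resource IDs in the report, the rows could not be keyed per instance, so there would be nothing to match against the nodes.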
Thanks! I thought that might be the case. This makes for really large cost reports. Is there anything to keep in mind in terms of metering performance with large reports?
We already partition the table containing the cost report by month, which helps. It's probably best to keep the reportingStart and reportingEnd limited to about a month. You may also want to increase the memory of Presto, and it may help to run some dedicated worker replicas (which we don't have documented currently).
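As a rough sketch of keeping each reporting window to about a month, the snippet below splits a longer period into windows that never cross a month boundary. The function name is hypothetical and this is not part of the project; each resulting pair would simply be used as the reportingStart and reportingEnd of one smaller report.

```python
from datetime import datetime, timezone


def monthly_windows(start, end):
    """Split [start, end) into consecutive windows that stay within one calendar month."""
    windows = []
    cursor = start
    while cursor < end:
        # Compute the first instant of the month after `cursor`.
        if cursor.month == 12:
            next_month = cursor.replace(
                year=cursor.year + 1, month=1, day=1,
                hour=0, minute=0, second=0, microsecond=0,
            )
        else:
            next_month = cursor.replace(
                month=cursor.month + 1, day=1,
                hour=0, minute=0, second=0, microsecond=0,
            )
        # Clamp the last window to the requested end.
        window_end = min(next_month, end)
        windows.append((cursor, window_end))
        cursor = window_end
    return windows


# Example period, chosen only for illustration.
period_start = datetime(2019, 1, 15, tzinfo=timezone.utc)
period_end = datetime(2019, 4, 1, tzinfo=timezone.utc)
for window_start, window_end in monthly_windows(period_start, period_end):
    print(window_start.isoformat(), window_end.isoformat())
```

Aligning the windows to calendar months matches the monthly partitioning of the cost report table mentioned above, so each report should only need to touch one partition.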
We're working on making it easier to aggregate across existing reports too, which will make doing roll-ups from many, smaller reports easier.
Lastly, we're still working on getting this deployed to one of our larger environments (thousands of namespaces, 100+ nodes), but that's one of our top priorities, and we'll be looking to document anything related to scaling that we learn from that process.
Are workers something that can be configured in the Metering object? Or is this something we will have to manage ourselves?
Is there any "tuning" documentation somewhere? I think that would be really helpful for this project. Great stuff though, really easy to get started!
It's something you can configure on the metering object (this is the part that's undocumented). And yes, the goal of deploying to a larger environment is precisely to write the tuning document you're describing. It's difficult, because we expose quite a few knobs, but we don't necessarily want users to be using all of them, since we would ideally automate away the need to tune these things. However, there is a gap right now, so it may be useful to document the knobs we expect someone is most likely to want.
Here's a snippet of a custom configuration that I use. I expect that not everything here is necessary for you, but it should give you some tunables that you can mess with. https://gist.github.com/chancez/db4e2e4e5f7bcb20e195b439e0f5acf1
The key parts: anything with the replicas field set is something you may wish to adjust. Setting resources is very common, and this is documented already. taskMaxWorkerThreads is also useful, and translates to task.max-worker-threads in https://prestodb.io/docs/current/admin/properties.html. We're actively working on this aspect of our documentation, so we'll keep this open and update you when we have more documentation related to this.
Awesome, thanks so much for this! I did find the worker info by poking through the helm charts.
On a side note, I was not able to use 0.8.0-latest because it seems to be hitting https://issues.apache.org/jira/browse/HADOOP-13811. We cannot authenticate to AWS because of a mismatch of library versions. I was able to go back to 0.7.0 and get things working. (I can open a new issue on this if that's better.)
Can you open another issue with the pod logs of the components you're seeing the errors in? Also, if you can provide your configuration with the credentials set but replaced with fake values, that's useful too, just so I can verify everything is set correctly in all the right spots.
https://github.com/operator-framework/operator-metering/pull/442 should fix the auth issue you're experiencing. It was related to a recent refactor that accidentally removed some environment variables from a few pods.
Confirmed, things are working on 0.8.0-latest now! Thanks!
For the AWS Cost & Usage Report, do the Resource IDs need to be included? I cannot find a definitive answer on this. My gut says yes, because this is how correlation will be done, but I would like to confirm.