kube-reporting / metering-operator

The Metering Operator is responsible for collecting metrics and other information about what's happening in a Kubernetes cluster, and providing a way to create reports on the collected data.
Apache License 2.0

presto other database support #831

Open kfox1111 opened 5 years ago

kfox1111 commented 5 years ago

Would it be possible to point Presto directly at MySQL/PostgreSQL and support disabling the Hive deployment? That could be much simpler to use for those who don't have large clusters.

chancez commented 5 years ago

Supporting other databases is in the backlog and is something I've wanted for a while, but it's lower priority right now because we're working on a GA release. Adding support for other databases is relatively simple, but removing Hive is trickier.

Removing Hive is somewhat difficult because we use the map datatype in Presto/Hive for Prometheus metric data, and right now Hive is basically the only Presto connector that supports it. We would need a way to configure prometheusMetricImporterDataSources with a flat column structure instead of maps, mapping labels from the results into columns via configuration in the datasource. Then we would need to update some of the ReportQueries to handle the new data model, but that would actually be fairly easy, since we already model the tables like this using views. After that, we could use MySQL/PostgreSQL for storing metric data.
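To illustrate the flat column idea, here is a minimal sketch, not the operator's actual code. It assumes an imported sample is a record with `amount`, `timestamp`, and a `labels` map, and that the datasource carries a hypothetical `label_columns` setting naming which labels become columns. The function name and field names are assumptions made for this example.

```python
from typing import Dict, Mapping, Optional


def flatten_sample(
    sample: Mapping[str, object],
    label_columns: Mapping[str, str],
) -> Dict[str, Optional[object]]:
    """Turn one metric sample with a labels map into a flat row.

    `label_columns` is a hypothetical datasource setting that maps a
    Prometheus label name to the column it should be stored in.
    """
    labels = sample.get("labels") or {}
    # Columns that exist regardless of which labels are configured.
    row: Dict[str, Optional[object]] = {
        "amount": sample["amount"],
        "timestamp": sample["timestamp"],
    }
    # Each configured label becomes its own column; a label missing from
    # the sample is stored as None (NULL) so every row has the same shape.
    for label_name, column_name in label_columns.items():
        row[column_name] = labels.get(label_name)
    return row


example_sample = {
    "amount": 0.25,
    "timestamp": "2019-06-01T00:00:00Z",
    "labels": {"pod": "web-1", "namespace": "prod", "node": "node-a"},
}
example_mapping = {"pod": "pod", "namespace": "namespace", "container": "container"}
example_row = flatten_sample(example_sample, example_mapping)
```

Labels that are not listed in the mapping (here `node`) are dropped, which is the trade-off of giving up the map column: the set of queryable labels has to be decided in the datasource configuration up front.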

We're also exploring options like a native Prometheus connector for Presto, which would potentially allow us to stop importing metrics altogether and could make this a lot easier. In that case, we would have dataSources which simply map directly to Prometheus time series tables in Presto, we would write our reportQueries against those tables, and reports could then store their data in MySQL/PostgreSQL.

mmariani commented 5 years ago

If you plan to fully support Postgres for storing metrics (jsonb could be an alternative to the map type), consider the Timescale extension, which does magic table partitioning under the hood. It has some limitations (no surrogate primary keys, etc.), but otherwise it's 98% compatible, with much better performance.

kfox1111 commented 5 years ago

Interesting. Thanks for the info.

chancez commented 5 years ago

@mmariani jsonb isn't supported by Presto, and leveraging TimescaleDB wouldn't help, since Presto would likely be unable to push down much of the filtering today.