trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Apache License 2.0

Build a Kubernetes Operator for Presto #396

Open · 11xor6 opened this issue 5 years ago

11xor6 commented 5 years ago

This issue lays the groundwork for a Presto operator on Kubernetes. Its goal is both to define the basic requirements for the operator and to provide a forum for collecting feature requests to be built into it.

At its core the operator should:

Additional features:

Given this basic outline, I'm interested in hearing requirements, feature requests, and comments from the community.

oneonestar commented 5 years ago

Just writing down some thoughts, ranked in order of interest:

11xor6 commented 5 years ago

@oneonestar Thanks for the feedback; these are good suggestions. Here are my comments on them:

GrigorievNick commented 5 years ago

> Just writing down some thoughts, ranked in order of interest:
>
>   • Allow users to provide a customized Docker image (e.g. one that includes Presto plugins, UDFs, or an in-house patched Presto)
>   • Allow users to add configuration files for their plugins (e.g. the config file for an event listener)
>   • Store confidential configuration and keys securely (e.g. Kerberos keytab files, database connection passwords)
>   • Zero-downtime rolling updates of Presto
>   • (Optional) Keep the pod resource configuration and the Presto configuration in sync (e.g. spec.containers[].resources.requests.memory and the JVM -Xmx)

I would add one more: allow adding JARs with custom plugins.
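The last item in the quoted list (keeping the pod memory request and the JVM -Xmx in sync) could be handled by deriving the heap size from the container spec whenever the operator renders jvm.config. A minimal sketch follows; the helper names, the 80% heap ratio, and the restriction to binary suffixes (Ki, Mi, Gi, Ti) are assumptions for illustration, not taken from any existing operator.

```python
import math

# Binary suffixes used in Kubernetes resource quantities (e.g. "16Gi").
# Assumption: only these suffixes, or a plain byte count, are supported here.
_BINARY_SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4}


def parse_memory_quantity(quantity: str) -> int:
    """Convert a quantity such as '16Gi' to a number of bytes."""
    for suffix, factor in _BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # plain byte count, e.g. "1073741824"


def xmx_from_memory_request(request: str, heap_ratio: float = 0.8) -> str:
    """Return a -Xmx flag that gives the heap `heap_ratio` of the pod's
    memory request (hypothetical policy) and leaves the rest off-heap."""
    if not 0 < heap_ratio < 1:
        raise ValueError("heap_ratio must be between 0 and 1")
    heap_bytes = parse_memory_quantity(request) * heap_ratio
    heap_mib = math.floor(heap_bytes / 1024**2)  # round down to whole MiB
    return f"-Xmx{heap_mib}m"
```

For example, a 16Gi memory request yields -Xmx13107m. Leaving part of the request outside the heap is deliberate, since the JVM also uses memory beyond the heap (thread stacks, metaspace, direct buffers).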

Jeffwan commented 4 years ago
> • Deploy a Coordinator and a configurable number of Workers
> • Configure the cluster and catalogs from the CRD

I am new to Presto and have recently been looking at solutions on K8s. Does Presto natively support K8s? Instead of statically creating worker containers and running SQL on them, is it possible to use Kubernetes resources in a serverless manner? That is, the user determines the resources needed to run a query, and some component creates workers dynamically at the query level.
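The two core requirements quoted above (a Coordinator plus a configurable number of Workers, configured from the CRD) mostly amount to rendering Presto's properties files from a custom resource. A minimal sketch, assuming a hypothetical `PrestoClusterSpec` and a coordinator Service named `<cluster>-coordinator`; the property keys themselves (`coordinator`, `node-scheduler.include-coordinator`, `http-server.http.port`, `discovery.uri`) are the standard ones from the Presto/Trino deployment docs.

```python
from dataclasses import dataclass, field


@dataclass
class PrestoClusterSpec:
    """Hypothetical spec of a PrestoCluster custom resource."""
    name: str
    workers: int = 2  # replica count for the worker Deployment (not used below)
    http_port: int = 8080
    # catalog name -> properties, e.g. {"tpch": {"connector.name": "tpch"}}
    catalogs: dict = field(default_factory=dict)


def _to_properties(props: dict) -> str:
    """Serialize a dict as the body of a key=value .properties file."""
    return "".join(f"{key}={value}\n" for key, value in props.items())


def render_config_properties(spec: PrestoClusterSpec, coordinator: bool) -> str:
    """Render etc/config.properties for either the coordinator or a worker."""
    # Workers reach the coordinator through a Service assumed to be
    # named "<cluster>-coordinator".
    discovery_uri = f"http://{spec.name}-coordinator:{spec.http_port}"
    props = {"coordinator": str(coordinator).lower()}
    if coordinator:
        # Keep query execution on the workers only.
        props["node-scheduler.include-coordinator"] = "false"
    props["http-server.http.port"] = spec.http_port
    props["discovery.uri"] = discovery_uri
    return _to_properties(props)


def render_catalogs(spec: PrestoClusterSpec) -> dict:
    """Return a mapping of file name -> contents for etc/catalog/."""
    return {f"{name}.properties": _to_properties(props)
            for name, props in spec.catalogs.items()}
```

An operator would then mount the coordinator and worker variants of config.properties into separate workloads and use `workers` as the worker replica count.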

GrigorievNick commented 4 years ago

@Jeffwan I use this https://docs.starburstdata.com/latest/kubernetes.html.

Jeffwan commented 4 years ago

> @Jeffwan I use this https://docs.starburstdata.com/latest/kubernetes.html.

@GrigorievNick

This one seems to use an HPA for the workers, and I think that should be fine. In a multi-tenant cluster, I assume everyone can create a PrestoCluster custom resource in their own namespace. The HPA dynamically adjusts the number of workers, and new workers join via the coordinator.

It seems Starburst Data provides an enterprise server, so I'm trying to understand the scope of this ticket. Does the community want an open-source version of the operator?
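For reference, the Kubernetes HPA documentation describes the core scaling rule as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), with no change when that ratio is within a tolerance of 1.0 (0.1 by default). The sketch below applies this rule to a worker Deployment and clamps the result to `min_replicas`/`max_replicas`; it is a simplification (it ignores pod readiness, missing metrics, and stabilization windows), and the choice of metric, such as average worker CPU utilization, is an assumption.

```python
import math


def hpa_desired_replicas(current_replicas: int, current_metric: float,
                         target_metric: float, min_replicas: int = 1,
                         max_replicas: int = 10, tolerance: float = 0.1) -> int:
    """Simplified HPA scaling rule applied to a Presto worker Deployment."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # within tolerance: leave the workers alone
    else:
        desired = math.ceil(current_replicas * ratio)
    # Never scale outside the configured bounds.
    return max(min_replicas, min(max_replicas, desired))
```

For example, 4 workers averaging 90% CPU against a 60% target would be scaled to 6; newly started workers then announce themselves to the coordinator's discovery service.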

electrum commented 4 years ago

Presto is designed as a long-running process that can share resources across many concurrent queries. It’s inherently multi-tenant and is used in environments with hundreds or thousands of concurrent users.

Autoscaling solutions for Presto work by scaling the workers up or down based on cluster load or other factors.

You might find the Presto paper helpful: https://prestosql.io/paper
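Scaling workers down (or rolling them during an update) is safest when each worker stops accepting new tasks and finishes its running ones first. Presto/Trino workers support a graceful shutdown triggered by a `PUT` to `/v1/info/state` with the JSON body `"SHUTTING_DOWN"`, which an operator could send (for example from a pod `preStop` hook) before the pod is deleted. The sketch below only builds that request with the standard library. Assumptions: the user header is `X-Trino-User` in Trino (PrestoSQL-era releases used the `X-Presto-User` prefix), and the worker URL and user name are placeholders; verify both against your version.

```python
import json
import urllib.request


def build_graceful_shutdown_request(worker_url: str, user: str,
                                    user_header: str = "X-Trino-User"
                                    ) -> urllib.request.Request:
    """Build (but do not send) the request that asks a worker to drain."""
    body = json.dumps("SHUTTING_DOWN").encode("utf-8")  # b'"SHUTTING_DOWN"'
    return urllib.request.Request(
        url=f"{worker_url.rstrip('/')}/v1/info/state",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json", user_header: user},
    )
```

Passing the result to `urllib.request.urlopen(req)` would send it; the pod should only be removed once the worker has finished draining.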

caneGuy commented 4 years ago

> Does the community want an open-source version of the operator?

I'd really support the community writing an open-source version of the operator.

ringtail commented 4 years ago

> Does the community want an open-source version of the operator?

@Jeffwan We are designing a Kubernetes-native Presto operator, and the key factor is autoscaling. We can bring the design as a proposal for review.

casperit commented 4 years ago

@ringtail Looking forward to the Presto operator for PrestoSQL.

ringtail commented 4 years ago

> @ringtail Looking forward to the Presto operator for PrestoSQL.

Sure. But at Alibaba Cloud, the code base is on PrestoDB for historical reasons. We would like to keep the design generic, so I suppose PrestoSQL would be OK as well.

XuQianJin-Stars commented 4 years ago

Hi @11xor6 @dain, what is the current progress?

hbhanawat commented 4 years ago

Falarica Analytics open-sourced its Kubernetes Operator for Presto yesterday. The operator is production ready, with features like autoscaling, HTTPS support, secrets for catalogs, self-healing, and support for property files (authentication, resource groups, etc.). The operator works with both PrestoDB and PrestoSQL.

https://github.com/falarica/steerd-presto-operator

We look forward to working with the community to make this more suitable for ad hoc analytics use cases.
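One way to implement a "secrets for catalogs" feature like the one listed above is to keep secret values out of the rendered catalog file and reference environment variables instead, injecting those variables from a Kubernetes Secret. Recent Trino releases substitute `${ENV:NAME}` references in properties files; whether a given PrestoDB or PrestoSQL version supports this is an assumption to check. This is not necessarily how the Falarica operator does it, and the helper name and variable-naming scheme below are hypothetical.

```python
import re


def externalize_secrets(catalog: str, props: dict, secret_keys: set) -> tuple:
    """Replace secret values with ${ENV:...} references.

    Returns (properties_file_body, env), where env maps each generated
    variable name to the original value. The operator would store env in
    a Kubernetes Secret and inject it into the pod (e.g. via envFrom).
    """
    lines, env = [], {}
    for key, value in props.items():
        if key in secret_keys:
            # e.g. catalog "mysql", key "connection-password"
            #   -> MYSQL_CONNECTION_PASSWORD
            var = re.sub(r"[^A-Za-z0-9]", "_", f"{catalog}_{key}").upper()
            env[var] = value
            value = "${ENV:" + var + "}"
        lines.append(f"{key}={value}\n")
    return "".join(lines), env
```

This way the catalog file can live in a ConfigMap, and only the generated variables need the stricter handling of a Secret.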

ringtail commented 4 years ago

> Falarica Analytics open-sourced its Kubernetes Operator for Presto yesterday.

Looks good! We also have an implementation on PrestoDB, but with different design ideas. Is there a SIG meeting online? We would like to share our architecture design and a demo.

hbhanawat commented 4 years ago

@ringtail Is your operator open source? If so, please share the GitHub URL.
I have no idea about a SIG meeting; I would like to know if there is one.

ringtail commented 4 years ago

> @ringtail Is your operator open source? If so, please share the GitHub URL. I have no idea about a SIG meeting; I would like to know if there is one.

Sure. We are going through the open-source process, and I will post the design doc to this issue later.

ringtail commented 4 years ago

Can I find you on Slack? We can discuss it there later.

ashishmgofficial commented 3 years ago

Any further developments on this? I am looking for a way to deploy Presto on Kubernetes.

findepi commented 3 years ago

@ashishmgofficial please take a look at https://github.com/trinodb/charts