Open yiyuanyu17 opened 2 years ago
Hello @yiyuanyu17, can you help us understand why you are not able to use MySQL? And do you want to use PostgreSQL within the cluster or outside it? Or are you looking for a way to configure PostgreSQL as an alternative to Cloud SQL?
Hello, we use Kubeflow Pipelines for AI model training on our platform. In private (on-premises) deliveries, some customers explicitly forbid self-hosted MySQL and require us to use the PostgreSQL instance they provide. We have therefore moved our applications onto an ORM framework so that they can work with different database types. However, the Kubeflow Pipelines server does not yet support PostgreSQL, so I am opening this issue in the hope of getting help from the community.
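The ORM approach works because the ORM hides SQL dialect differences from application code. As a hedged illustration (not KFP code), here is one such difference, upsert syntax, handled in a small Python helper. The function `upsert_sql` and the `experiments` table are hypothetical; placeholders use `%s`, the parameter style accepted by both PyMySQL and psycopg2.

```python
from typing import List


def upsert_sql(dialect: str, table: str, columns: List[str], key: str) -> str:
    """Build an INSERT-or-UPDATE statement for `table` in the given dialect.

    Hypothetical helper for illustration. Table and column names are
    interpolated verbatim, so they must come from trusted code, never from
    user input; only the values go through "%s" placeholders.
    """
    if key not in columns:
        raise ValueError(f"key {key!r} must be one of the inserted columns")
    updates = [c for c in columns if c != key]
    if not updates:
        raise ValueError("need at least one non-key column to update")
    cols = ", ".join(columns)
    params = ", ".join(["%s"] * len(columns))
    base = f"INSERT INTO {table} ({cols}) VALUES ({params})"
    if dialect == "mysql":
        # MySQL resolves conflicts on any PRIMARY KEY/UNIQUE index implicitly.
        # MySQL 8.0.20+ deprecates VALUES(col) here in favour of row aliases.
        sets = ", ".join(f"{c} = VALUES({c})" for c in updates)
        return f"{base} ON DUPLICATE KEY UPDATE {sets}"
    if dialect == "postgresql":
        # PostgreSQL names the conflict target (it needs a unique index on it);
        # EXCLUDED holds the row that was proposed for insertion.
        sets = ", ".join(f"{c} = EXCLUDED.{c}" for c in updates)
        return f"{base} ON CONFLICT ({key}) DO UPDATE SET {sets}"
    raise ValueError(f"unsupported dialect: {dialect!r}")


print(upsert_sql("postgresql", "experiments", ["uuid", "name"], "uuid"))
```

A real ORM generates such statements for you; the point is only that each dialect needs its own SQL, which is why a MySQL-only code path cannot simply be pointed at PostgreSQL.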
Thank you for the info, @yiyuanyu17. I will keep this issue open so people can upvote it if they are also interested in PostgreSQL support. People can create an overlay that connects to PostgreSQL, but such support is not available in this repo yet.
We would also be interested in this feature. We do a lot of on-prem and disconnected/air-gapped deployments, so cloud-vendor-hosted databases are not an option. In most scenarios it is easiest to run our own database clusters colocated on the same k8s environment where we run Kubeflow. Crunchy Postgres is the best experience we've found for operating RDBMS clusters on k8s, and we already leverage it for other tooling. It would be nice to leverage it from Kubeflow as well, as operating MySQL clusters on k8s is not as seamless.
In our case, Postgres is the only option we have for a managed on-prem DB, so we are looking for out-of-the-box Postgres support. @zijianjoy Can you please elaborate on what creating an overlay means? If that helps connect Kubeflow to Postgres, I am interested in giving it a try. Thanks!
Hi @zijianjoy,
We also strictly use PostgreSQL internally, since it's better suited for data warehousing purposes.
An overlay is a kustomize concept, as described in https://kubernetes.io/docs/tasks/manage-kubernetes-objects/kustomization/. It is a package of Kubernetes resources; think of it as a variant of the base KFP package.
Here is a list of KFP overlays: https://github.com/kubeflow/pipelines/tree/master/manifests/kustomize/env
If you look at the platform-agnostic folder, you will find that it depends on MySQL: https://github.com/kubeflow/pipelines/tree/master/manifests/kustomize/third-party/mysql
So, if you want to introduce PostgreSQL, here is what you need to do:
I would recommend testing this PostgreSQL integration in your own environment before committing it to the KFP repo, because there is no guarantee or test coverage verifying that KFP works with PostgreSQL.
It would be great if Kubeflow Pipelines supported Postgres! For various reasons, our company also cannot use MySQL. We strongly recommend that the community make the database pluggable. @zijianjoy
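Making the database pluggable largely comes down to letting the deployment choose the engine and its connection settings instead of hard-coding MySQL. A minimal sketch, assuming a hypothetical `DBConfig` structure (not an existing KFP setting): it falls back to each engine's standard default port (3306 for MySQL, 5432 for PostgreSQL) and percent-encodes credentials so characters such as `@` do not break the URL. The host and database names in the usage line are placeholders.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import quote

# Standard default ports for each supported engine.
DEFAULT_PORTS = {"mysql": 3306, "postgresql": 5432}


@dataclass
class DBConfig:
    """Hypothetical connection settings; not an existing KFP config object."""
    driver: str  # "mysql" or "postgresql"
    host: str
    user: str
    password: str
    database: str
    port: Optional[int] = None  # None means "use the engine's default port"


def connection_url(cfg: DBConfig) -> str:
    """Build a driver://user:password@host:port/database URL for cfg.

    IPv6 hosts would need [brackets]; this sketch does not handle them.
    """
    if cfg.driver not in DEFAULT_PORTS:
        raise ValueError(f"unsupported driver: {cfg.driver!r}")
    port = cfg.port if cfg.port is not None else DEFAULT_PORTS[cfg.driver]
    # Percent-encode credentials so characters like '@' or ':' stay unambiguous.
    user = quote(cfg.user, safe="")
    password = quote(cfg.password, safe="")
    return f"{cfg.driver}://{user}:{password}@{cfg.host}:{port}/{cfg.database}"


# Usage: the same code path serves either engine; only the config changes.
pg = DBConfig("postgresql", "pg.example.internal", "kfp", "p@ss", "mlpipeline")
print(connection_url(pg))
```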
Implementing PostgreSQL support for Pipelines is a time-consuming job for every user to do on their own, so we're eagerly waiting for someone to contribute it.
There are already pull requests implementing Postgres support for Kubeflow Katib (https://github.com/kubeflow/katib/pull/1921). I wonder: is there any plan for KFP to support PostgreSQL?
MySQL is no doubt an excellent database; however, Oracle's acquisition brought uncertainty to its future. Like the others above, I sincerely hope Kubeflow Pipelines will soon support PostgreSQL, which is license-friendly and offers many advanced features.
Also note that google/mlmd doesn't support Postgres yet: https://github.com/google/ml-metadata/issues/26
As others have hinted here, PostgreSQL, especially when managed through Operator Lifecycle Manager and, if desired, with a Red Hat-certified operator, is the way to go in a Kubernetes-based enterprise environment. I wholeheartedly agree with everyone who has posted here. The database should not come pre-packaged with Kubeflow, as it is not a core component. Let people who really know their stuff handle database operations and deployment, e.g. Crunchy for PostgreSQL, and then use Postgres as the database for Kubeflow. Seriously: a replication factor of 1, no PgBouncer proxy to improve load handling, no backup strategy... https://github.com/kubeflow/katib/blob/9fce9dd03bc476b4e1f3d385e9692ac5cef681f4/manifests/v1beta1/components/postgres/postgres.yaml That cannot seriously be the approach of a project that originated at one of the big tech firms. The same goes for air-gapped support with custom Docker registries, HTTP_PROXY support via environment variables, and a custom CA ConfigMap for PKI trust.
Currently, we would like help from the community to support PostgreSQL integration. For anyone who wants to contribute to making Kubeflow Pipelines runnable with PostgreSQL:
Since we have #9813 to track this work, I'll close this issue. Please follow updates in that tracker issue
/close
@rimolive: Closing this issue.
@rimolive Sorry, since this work is not finished yet, the feature request is still valid. (Note: we use the upvote count of the original issue to track the community's interest across the org, so I am reopening this issue.)
May I suggest also tracking this MLMD issue as part of the KFP-with-PostgreSQL scope: https://github.com/google/ml-metadata/issues/194#issuecomment-1975207465
The reason is that when MLMD is backed by PostgreSQL, there is allegedly a practical limit of only ~2K characters on MLMD string properties.
Potential solutions are discussed (and one is proposed) in: https://github.com/google/ml-metadata/pull/195
Hope this helps!
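If KFP metadata ends up in a PostgreSQL-backed MLMD, one defensive option is to check string property sizes before writing them. A hedged sketch: the 2048 limit below is an assumption standing in for the reported "~2K chars", and `oversized_properties` is a hypothetical helper, not part of the MLMD API. Because it is unclear whether the limit counts characters or bytes, the helper can measure either.

```python
from typing import Dict, List

# Assumption: the reported "~2K chars" limit, taken here as 2048.
ASSUMED_MAX_STRING_PROPERTY = 2048


def oversized_properties(
    props: Dict[str, object],
    limit: int = ASSUMED_MAX_STRING_PROPERTY,
    count_bytes: bool = False,
) -> List[str]:
    """Return the sorted names of string properties longer than `limit`.

    Non-string values are ignored. With count_bytes=True the UTF-8 encoded
    length is measured instead of the number of code points.
    """
    too_long = []
    for name, value in props.items():
        if not isinstance(value, str):
            continue
        size = len(value.encode("utf-8")) if count_bytes else len(value)
        if size > limit:
            too_long.append(name)
    return sorted(too_long)


# Usage with hypothetical property names: fail fast instead of hitting the DB limit.
props = {"pipeline_spec": "x" * 5000, "display_name": "train-model", "epochs": 10}
bad = oversized_properties(props)
if bad:
    print("refusing to write; oversized properties:", bad)
```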
Thanks @tarilabs for letting us know!
@zijianjoy Can you add this issue as a work item for MLMD integration in #9813? I think it's a good first issue and a good candidate for GSoC.
Added. However, please note that it is going to be an optional task for PostgreSQL integration with KFP, but a good item to contribute to.
Agreed, the idea to add this issue is for tracking purposes.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.
/reopen /lifecycle frozen
@rimolive: Reopened this issue.
Feature Area
What feature would you like to see?
Kubeflow Pipelines: add support for PostgreSQL
What is the use case or pain point?
In some cases we cannot use MySQL for Kubeflow Pipelines; we hope Kubeflow Pipelines can add support for PostgreSQL.
Love this idea? Give it a 👍. We prioritize fulfilling features with the most 👍.