[Feature] [API Server] [RFC] Add persistence for job history using a SQL database

Search before asking

[X] I searched the issues and found no similar feature request.

Description

Hello KubeRay community, thanks for developing the API Server component! I'm new here, and I'd like to collect some thoughts on implementing persistent storage for the API Server. According to the API Server design doc:

"we want to leave some flexibility to use database to store history data in the near future (for example, pagination, list options etc)"

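Right now, past Ray jobs are stored as custom resources (loosely called CRDs here) in the Kubernetes etcd database, and the API Server queries them directly. This seems unlikely to scale as well as a solution backed by a SQL database, since finished jobs keep accumulating in etcd and every list request has to go through the Kubernetes API. I can think of two ways to implement this: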

1. Use a "persistence agent" to watch for changes in the CRDs and sync them to a database. Delete the CRDs when they reach a terminal state. Clients can then list jobs directly from the database instead of querying the CRDs. This is what Kubeflow Pipelines does (architecture diagram, codebase).
2. Keep using the CRDs while the job is running, but as soon as the job finishes, snapshot it and store it in the database before cleaning up the CRD.
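
To make the two options more concrete, here is a rough Python sketch of the part they share: write the job to a SQL table, remove the in-cluster object once the job is terminal, and serve paginated listings from the database. Everything in it is an assumption for illustration, not existing KubeRay code: the `job_history` table, the function names, the use of SQLite, and a plain dict standing in for the Kubernetes API. The terminal states are assumed to be SUCCEEDED, FAILED, and STOPPED.

```python
import json
import sqlite3

# Assumption: job statuses treated as terminal in this sketch.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED"}


def open_store(path=":memory:"):
    """Open the database and create the hypothetical job_history table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS job_history (
               namespace TEXT NOT NULL,
               name      TEXT NOT NULL,
               status    TEXT NOT NULL,
               spec      TEXT NOT NULL,
               PRIMARY KEY (namespace, name)
           )"""
    )
    return conn


def handle_event(conn, cluster, job):
    """Record one observed job state; drop the in-cluster object if terminal.

    `cluster` is a plain dict standing in for the Kubernetes API.
    The row is committed before the object is removed, so a crash in
    between leaves a duplicate rather than a lost job.
    """
    key = (job["namespace"], job["name"])
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO job_history "
            "(namespace, name, status, spec) VALUES (?, ?, ?, ?)",
            (key[0], key[1], job["status"], json.dumps(job["spec"])),
        )
    if job["status"] in TERMINAL_STATES:
        cluster.pop(key, None)


def list_jobs(conn, namespace, page_size, after_name=""):
    """Return one page of (name, status) rows, ordered by name.

    Uses keyset pagination: pass the last name of the previous page
    as `after_name` to fetch the next page.
    """
    rows = conn.execute(
        "SELECT name, status FROM job_history "
        "WHERE namespace = ? AND name > ? ORDER BY name LIMIT ?",
        (namespace, after_name, page_size),
    )
    return rows.fetchall()
```

In option 1, something like `handle_event` would run on every watch event, so running jobs also show up in the database. In option 2, it would run only once, when the job reaches a terminal state, and listings would have to merge live CRDs with database rows.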

Do people think that having a database is important? If so, do we have a plan to implement this?

Use case

Support keeping track of job history without leaving a lot of CRDs in Kubernetes.

Are you willing to submit a PR?

[X] Yes I am willing to submit a PR!
