jlewi opened this issue 6 years ago
@jlewi We have documented the feature set in KVC here: https://github.com/kubeflow/experimental-kvc/blob/master/docs/arch.md#the-kvc-controller. Do you think that is sufficient?
We have docs from the point of view of a user, developer and operator in https://github.com/kubeflow/experimental-kvc/blob/master/docs/user.md and https://github.com/kubeflow/experimental-kvc/blob/master/docs/dev.md and https://github.com/kubeflow/experimental-kvc/blob/master/docs/operator.md, respectively.
Re: the question on on-prem deployments, KVC can be used in any deployment, not just on-prem.
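To make that concrete, here is a rough sketch of the kind of manifest a KVC user would create to cache a dataset on nodes. The field names below are illustrative assumptions and should be checked against docs/user.md and docs/arch.md linked above; they are not guaranteed to match the actual CRD schema.

```python
import json

def volume_manager_manifest(name: str, source_url: str, replicas: int) -> dict:
    """Build a VolumeManager-style custom resource describing a data source
    (e.g. an S3 bucket) to be cached on `replicas` nodes.

    NOTE: field names in `spec` are assumptions for illustration only.
    """
    return {
        "apiVersion": "kvc.kubeflow.org/v1",
        "kind": "VolumeManager",
        "metadata": {"name": name},
        "spec": {
            "volumeConfigs": [
                {
                    "id": "vol1",
                    "replicas": replicas,       # number of nodes to cache the data on
                    "sourceType": "S3",
                    "sourceURL": source_url,
                    "accessMode": "ReadWriteOnce",
                    "capacity": "5Gi",
                }
            ]
        },
    }

manifest = volume_manager_manifest("training-data", "s3://bucket/dataset/", 2)
print(json.dumps(manifest, indent=2))
```

This dict could be serialized to YAML and applied with kubectl, or submitted through the Kubernetes Python client's CustomObjectsApi; the controller would then pull the data onto the chosen nodes and expose it to pods scheduled there.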
It could also be interesting to understand how this will interact with the data versioning that could come with the https://github.com/kubeflow/kubeflow/issues/151 integration.
/cc @dwhitena
@bhack Yes, I would love to discuss that integration with you. Just to give you an idea, https://github.com/kubeflow/kubeflow/issues/151 generally provides the following:
- A data management layer based on any generic object store (S3, GCS, Azure Blob Storage, Minio, Rook, etc.) in which data sets processed via TFJobs (or any other processing stages) would be organized and versioned.
- A way to launch and interact with Kubeflow resources (e.g., TFJobs) from within a DAG pipeline.
- A way to track which data was processed by what to produce which results (i.e., "data provenance").
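A minimal sketch of the data-provenance idea in the last point: record a content hash for each input and output of a processing step, so results can be traced back to the exact data versions that produced them. All names here are illustrative, not the actual #151 API; in a real system the blobs would live behind object-store keys rather than in memory.

```python
import hashlib
import json

def content_hash(data: bytes) -> str:
    """Content-address a blob, as an object-store-backed layer might."""
    return hashlib.sha256(data).hexdigest()

def record_provenance(step: str, inputs: dict, outputs: dict) -> dict:
    """Build a provenance record linking input data versions to outputs.

    `inputs`/`outputs` map dataset names to raw bytes; in practice these
    would be references into the object store (S3, GCS, Minio, etc.).
    """
    return {
        "step": step,
        "inputs": {name: content_hash(blob) for name, blob in inputs.items()},
        "outputs": {name: content_hash(blob) for name, blob in outputs.items()},
    }

record = record_provenance(
    step="tfjob-train",
    inputs={"train-set": b"raw examples"},
    outputs={"model": b"trained weights"},
)
print(json.dumps(record, indent=2))
```

Because the hashes are content-addressed, re-running a step on identical data yields an identical record, which is what makes "which data produced which results" answerable after the fact.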
Pachyderm does utilize a PV that backs its metadata from etcd, so maybe that is a place where there is some interaction? Also, users deploying custom object stores (Minio, Rook, etc.) to back Pachyderm might have some needs around PV management.
I think the README should explain why and when a user would want to use the KVC; i.e. what problems it solves.
My understanding from past discussions is that the KVC is caching data on nodes. Is this intended primarily for on-prem deployments?