kubeflow / metadata

Repository for assets related to Metadata.
Apache License 2.0

On-prem data and models #77

Open ryandawsonuk opened 5 years ago

ryandawsonuk commented 5 years ago

Let’s say my model or data is in a volume. This might be the case if I’m running on-prem and my data isn’t allowed to leave my company network. If so, does ‘uri’ allow me to capture where it is? I would then want to record the name of the volume and a path on the volume. Perhaps those could be introduced as new fields, or the existing field called ‘location’ could be used?

@cvenets perhaps you have a view on this?

jinchihe commented 5 years ago

For the on-prem volume case, is it possible to set the uri to pvc://pvc_name/path/for/models? The first segment after pvc:// would be the PVC name, and the rest would be the model path.

WDYT @zhenghuiwang ?

zhenghuiwang commented 5 years ago

Yes, you can set the uri with the prefix pvc://.

There is no schema for the uri. It is up to the users and integrated systems to interpret it. Some examples: https://github.com/kubeflow/metadata/blob/master/schema/alpha/docs/artifacts/model.md#uri-examples
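Since the interpretation is left to the consuming system, here is a minimal sketch of how a consumer might split a pvc:// uri of the form suggested above into a claim name and a path. The names `PvcLocation`, `parse_pvc_uri`, `to_local_path`, and the mount table are hypothetical and not part of the metadata service or its SDK; only the Python standard library is used.

```python
import posixpath
from typing import Dict, NamedTuple
from urllib.parse import urlsplit


class PvcLocation(NamedTuple):
    """Hypothetical holder for the two pieces of a pvc:// uri."""
    claim_name: str  # first segment after pvc://
    path: str        # remaining path, relative to the volume root


def parse_pvc_uri(uri: str) -> PvcLocation:
    """Split pvc://<claim_name>/<path> into its parts.

    This layout is only the convention proposed in this thread;
    the metadata service stores the uri without interpreting it.
    """
    parts = urlsplit(uri)
    if parts.scheme != "pvc":
        raise ValueError(f"not a pvc:// uri: {uri!r}")
    if not parts.netloc:
        raise ValueError(f"missing PVC name in uri: {uri!r}")
    return PvcLocation(parts.netloc, parts.path.lstrip("/"))


def to_local_path(location: PvcLocation, mounts: Dict[str, str]) -> str:
    """Map a location to a path inside a pod.

    `mounts` is an assumed table from claim name to the directory
    where that claim is mounted in the consuming container.
    """
    return posixpath.join(mounts[location.claim_name], location.path)


if __name__ == "__main__":
    loc = parse_pvc_uri("pvc://pvc_name/path/for/models")
    print(loc.claim_name, loc.path)
    print(to_local_path(loc, {"pvc_name": "/mnt/models"}))
```

A consumer that mounts the claim somewhere else only needs to change the mount table; the uri stored in the metadata service stays the same.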

cvenets commented 5 years ago

As @zhenghuiwang mentioned, the metadata service doesn't interpret the URIs; it just stores them. The URI can be completely arbitrary, so yes, you can prefix it with pvc://. The other systems that interact with the metadata service need to know how to interpret and consume it.

We have also started a discussion with @zhenghuiwang in the design doc about extending the metadata service's context execution. This will allow us to track PVCs and even K8s objects in an immutable way, so if you are running an intelligent data management layer underneath (like Arrikto's Rok), you will be able to reproduce the whole state from a URI in the metadata service. I will come back to this for v0.7, once v0.6 is released and we can test the Metadata functionality.

jtfogarty commented 4 years ago

/kind question
/area engprod
/priority p2