google / ml-metadata

For recording and retrieving metadata associated with ML developer and data scientist workflows.
https://www.tensorflow.org/tfx/guide/mlmd
Apache License 2.0
616 stars 145 forks source link

fix postgresql string property index #195

Open tarilabs opened 7 months ago

tarilabs commented 7 months ago

See #194

Proposed solution(s)

Taking MLMD's Context string property as example,

S1. modify PostgreSQL index to be ~like:

CREATE INDEX idx_context_property_string_gist ON public.contextproperty USING gist (name gist_trgm_ops, string_value gist_trgm_ops);

or derivatives. (requires first CREATE EXTENSION pg_trgm;)

S2. modify PostgreSQL index to be ~like:

CREATE INDEX idx_context_property_string ON public.contextproperty USING btree (name, is_custom_property, "substring"(string_value, 1, 255));

making it similar to MySQL behaviour, less ideal because would only be partially helpful on queries longer term.

(S3. a third solution would be to drop the idx_..._property_string indexes altogether which seems to be causing the issue, but I don't think is worth considering).

This PR

The scope of this PR is to demonstrate S1 in practice, or discuss alternative approaches.

/cc @XinranTang could you kindly consult with the team and feedback on this, please?

tarilabs commented 3 days ago

@XinranTang could you kindly take a look, please?

tarilabs commented 3 days ago

@chensun was this possibly been looked into during last summer of code, also ref https://github.com/kubeflow/pipelines/issues/7512 ; thank you in advance for your feedback