trustification / trustify

Apache License 2.0
8 stars 15 forks source link

docs: proposal for tags #437

Closed ctron closed 6 days ago

JimFuller-RedHat commented 2 weeks ago

Quick set of observation: tagging was implemented in both osidb and corgi ... in the end we learned a few things:

personally I like tags as a hedge against the unknown but just wait until someone asks 'I want to add my own personal tags' to items ... (ex. if uri then http://prodsec.redhat.com/users/jfuller#mycooltag)

ctron commented 2 weeks ago

So I just tried out using jsonb instead of string[] for the "label" alternative.

I converted the tags into labels, assigning a random value. Searching for all entries having a set of keys/tags changes from ~95ms to ~140ms on my machine. Not too bad, bit still worse than tags.

The index size also grew from ~240MB to ~280MB. Again, not terrible.

The query actually doesn't look too bad either:

SELECT Count(*)
FROM cats_map_text
WHERE cat_labels ?& ARRAY ['red', 'brown', 'aloof'];

Searching by key + value looks quite good:

SELECT Count(*)
FROM cats_map_text
WHERE cat_labels @> '{"red": 100, "brown": 100}'::jsonb;

And performs quite well: ~40ms

So I guess both (tags and labels) can be a reasonable approach.

JimFuller-RedHat commented 2 weeks ago

FWIW hstore (instead of json column) and gin indexes is probably faster but not sure its so compelling to consider for trustify

ctron commented 2 weeks ago

Replacing a tag seems like an interesting operation. I am not sure we should (today) spend too many cycles on ACL in the context of tags/labels. I wouldn't use labels and tags for ACL (today). And so everyone can upload files and tag/label them. I am 100% sure this model is way too simple, but I also think we need a more global approach to ACL anyway.

I am not sure I understand the URI idea though. Maybe you have a pointer to read up on this? Or an example.

ctron commented 2 weeks ago

Indeed, I read about hstore too. But the summary was: jsonb might be a bit overkill, but is easier to work with. I'll go ahead and try out hstore too in my scratchpad. Just to see if there are silver bullets and magic fairy dust we might overlook.

ctron commented 2 weeks ago

Ok, so hstore uses a little less storage: ~250MB compared to ~280MB for jsonb.

The query performance for "has key" is faster: ~90ms vs 140ms (jsonb). The query performance for "has key with value" is the same.

One downside of hstore is (aside from it not being json) that it comes in via a (trusted) extension.

JimFuller-RedHat commented 2 weeks ago

The problem with json is 'database in a database' idiom eg. it becomes easy to stuff in a rich data structure with no schema ... given 'some period of time' things can get unruly ... hstore constrains the value ... though pg array is probably just as fast .. though both those approaches, given any kind of volume, need gin indexes (which are 'special' in their own way) - json is sure very useful but if not needed then I tend to opt for simplest abstraction. Dont let my cryptic answers put you off using em ... though at scale these differences can make themselves 'felt'.

ctron commented 2 weeks ago

Good points. And I guess (if needed) migrating from hstore to jsonb is trivial. While the other way is not.

ctron commented 2 weeks ago

So I guess the idea now is to go for hstore?! I guess we need to figure out psql extension at some point anyway.

jcrossley3 commented 2 weeks ago

I like this. I prefer tags over labels from a UI perspective, but could get behind whichever is easier to implement.

ctron commented 2 weeks ago

I updated the PR in favor of labels using hstore. While writing this up, I found that SeaORM and SQLX seem to lack support for hstore though.

I would ask to take another look at the proposal. It should basically swap the proposal from tags to labels. And adds a bit on hstore and jsonb.

I will take a look at what it means to add hstore support for both sqlx and seaorm. And if that's a quick change, then we could consider this as an option.

ctron commented 1 week ago

I did an update on hstore vs jsonb. TL;DR: hstore is some bigger effort, jsonb is good enough but ready to use.

So I'll let the PR open for a bit, but start working towards jsonb.

ctron commented 6 days ago

This went into one of the PR implementing labels.