Closed ctron closed 6 days ago
So I just tried out using jsonb
instead of string[]
for the "label" alternative.
I converted the tags
into labels, assigning a random value. Searching for all entries having a set of keys/tags changes from ~95ms to ~140ms on my machine. Not too bad, bit still worse than tags.
The index size also grew from ~240MB to ~280MB. Again, not terrible.
The query actually doesn't look too bad either:
SELECT Count(*)
FROM cats_map_text
WHERE cat_labels ?& ARRAY ['red', 'brown', 'aloof'];
Searching by key + value looks quite good:
SELECT Count(*)
FROM cats_map_text
WHERE cat_labels @> '{"red": 100, "brown": 100}'::jsonb;
And performs quite well: ~40ms
So I guess both (tags and labels) can be a reasonable approach.
FWIW hstore (instead of json column) and gin indexes is probably faster but not sure its so compelling to consider for trustify
Replacing a tag seems like an interesting operation. I am not sure we should (today) spend too many cycles on ACL in the context of tags/labels. I wouldn't use labels and tags for ACL (today). And so everyone can upload files and tag/label them. I am 100% sure this model is way too simple, but I also think we need a more global approach to ACL anyway.
I am not sure I understand the URI idea though. Maybe you have a pointer to read up on this? Or an example.
Indeed, I read about hstore too. But the summary was: jsonb might be a bit overkill, but is easier to work with. I'll go ahead and try out hstore too in my scratchpad. Just to see if there are silver bullets and magic fairy dust we might overlook.
Ok, so hstore uses a little less storage: ~250MB compared to ~280MB for jsonb.
The query performance for "has key" is faster: ~90ms vs 140ms (jsonb). The query performance for "has key with value" is the same.
One downside of hstore is (aside from it not being json) that it comes in via a (trusted) extension.
The problem with json is 'database in a database' idiom eg. it becomes easy to stuff in a rich data structure with no schema ... given 'some period of time' things can get unruly ... hstore constrains the value ... though pg array is probably just as fast .. though both those approaches, given any kind of volume, need gin indexes (which are 'special' in their own way) - json is sure very useful but if not needed then I tend to opt for simplest abstraction. Dont let my cryptic answers put you off using em ... though at scale these differences can make themselves 'felt'.
Good points. And I guess (if needed) migrating from hstore to jsonb is trivial. While the other way is not.
So I guess the idea now is to go for hstore?! I guess we need to figure out psql extension at some point anyway.
I like this. I prefer tags over labels from a UI perspective, but could get behind whichever is easier to implement.
I updated the PR in favor of labels using hstore. While writing this up, I found that SeaORM and SQLX seem to lack support for hstore
though.
I would ask to take another look at the proposal. It should basically swap the proposal from tags to labels. And adds a bit on hstore and jsonb.
I will take a look at what it means to add hstore
support for both sqlx and seaorm. And if that's a quick change, then we could consider this as an option.
I did an update on hstore vs jsonb. TL;DR: hstore is some bigger effort, jsonb is good enough but ready to use.
So I'll let the PR open for a bit, but start working towards jsonb.
This went into one of the PR implementing labels.
Quick set of observation: tagging was implemented in both osidb and corgi ... in the end we learned a few things:
personally I like tags as a hedge against the unknown but just wait until someone asks 'I want to add my own personal tags' to items ... (ex. if uri then http://prodsec.redhat.com/users/jfuller#mycooltag)