Open anjackson opened 3 years ago
Updating is not trivial, as one needs to extract the collections for a document first so that the next free collection field can be determined. But I have no better idea than yours. By limiting the number of collections to 64, they could be stored in a single `long`, but that would require more front-end code to unpack, and the number of unique values when faceting is potentially enormous.
We can store them in a `long`, but I couldn't see a way to facet on bits. Maybe I missed something?
You can't facet on bits in longs (well, one could build a special processor for it, but that would be tedious to maintain). But you could post-process the facet result and tally the individual collections there. But again: I prefer your solution. I'm just thinking out loud here.
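For illustration, the post-processing idea could look roughly like the sketch below. It assumes the client has already reduced the facet result to a mapping from bitmask value to document count; the function name and the sample numbers are hypothetical, and bit N is assumed to stand for collection N.

```python
from collections import Counter

def tally_collections(mask_counts):
    """Convert facet counts keyed by bitmask into counts per collection bit."""
    per_collection = Counter()
    for mask, count in mask_counts.items():
        # Treat the value as an unsigned 64-bit number, since a signed long
        # with bit 63 set would arrive as a negative integer.
        mask &= (1 << 64) - 1
        bit = 0
        while mask:
            if mask & 1:
                per_collection[bit] += count
            mask >>= 1
            bit += 1
    return dict(per_collection)

# Hypothetical facet result: bitmask value -> number of documents
facet = {0b001: 10, 0b011: 4, 0b110: 3}
print(tally_collections(facet))  # {0: 14, 1: 7, 2: 3}
```

This only addresses the unpacking side; the number of distinct bitmask values the facet has to return can still be very large.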
Ah gotcha. And you're right, the updating will be tricky.
Currently, collections are stored as strings in multivalued fields. This has a couple of problems. First, the string version should really be resolved in the UI, so we only need to store integer IDs for collections.
More importantly, the current model requires full document re-indexes if the Collections are updated. It would be better to store the collection in fields that meet the criteria for atomic, in-place updates (see In-Place Updates). This would allow collection membership to be updated without costly full re-indexing.
The main limitation is that these fields have to be single-valued. If URLs can only belong to one collection, or have a 'primary collection', then this works fine. But in general we want multiple collections, so as a workaround we can use several single-valued dynamic fields per document, one for each collection the item belongs to.
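A minimal sketch of how a client might pick the next free field and build the update document, assuming numbered field names such as `collection_1_id_i` and a cap of six slots (both are illustrative, not settled). The `{"set": value}` form is Solr's atomic update syntax; whether Solr applies it in place depends on the schema, since in-place updates need single-valued, non-indexed, non-stored numeric docValues fields.

```python
import re

MAX_SLOTS = 6  # assumed cap on collections per item
SLOT_PATTERN = re.compile(r"^collection_(\d+)_id_i$")  # hypothetical naming

def next_free_slot(doc, max_slots=MAX_SLOTS):
    """Return the lowest unused slot number, or None if all are taken."""
    used = set()
    for field in doc:
        match = SLOT_PATTERN.match(field)
        if match:
            used.add(int(match.group(1)))
    for slot in range(1, max_slots + 1):
        if slot not in used:
            return slot
    return None

def add_collection_update(doc, collection_id):
    """Build an atomic-update document adding collection_id to doc.

    `doc` is the current document as fetched from the index. Returns None
    if the document already belongs to the collection.
    """
    for field, value in doc.items():
        if SLOT_PATTERN.match(field) and value == collection_id:
            return None
    slot = next_free_slot(doc)
    if slot is None:
        raise ValueError("no free collection slot left")
    return {"id": doc["id"], f"collection_{slot}_id_i": {"set": collection_id}}

current = {"id": "u1", "collection_1_id_i": 12, "collection_3_id_i": 7}
print(add_collection_update(current, 99))
# {'id': 'u1', 'collection_2_id_i': {'set': 99}}
```

The read-before-write step is what makes updating non-trivial: the client has to fetch the document's existing collection fields before it can choose a slot.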
Then, at query time, we facet on all `collection_*_id_i` values (and likely have to enumerate and merge these facets client-side?). This needs to be tested from the client end to check it's workable. I think we may have to enumerate all the facets separately, so in practice we'll have a limit of e.g. 6 collections an item can belong to.
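The client-side merge could be sketched as below. This assumes six numbered fields named `collection_1_id_i` to `collection_6_id_i` (hypothetical) and Solr's default flat facet layout, where each field's counts come back as an alternating `[value, count, value, count, ...]` list.

```python
from collections import Counter

def facet_params(max_slots=6):
    """Build request parameters that facet on every collection slot field."""
    params = [("facet", "true")]
    params += [("facet.field", f"collection_{n}_id_i")
               for n in range(1, max_slots + 1)]
    return params

def merge_collection_facets(facet_fields):
    """Sum per-field facet counts into one count per collection ID.

    `facet_fields` maps field name -> flat [value, count, ...] list.
    """
    totals = Counter()
    for flat in facet_fields.values():
        for value, count in zip(flat[0::2], flat[1::2]):
            totals[int(value)] += count
    return dict(totals)

# Hypothetical facet_fields section of a response
response = {
    "collection_1_id_i": ["12", 5, "7", 2],
    "collection_2_id_i": ["7", 3],
}
print(merge_collection_facets(response))  # {12: 5, 7: 5}
```

Summing is only correct if a document never holds the same collection ID in two slots, which the update side would need to guarantee.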
EDIT: The rights field `access_terms` should also be an integer rather than a string, so this can be changed too. The same goes for any subject fields.