Open andreasbaumann opened 7 years ago
Possible application: sets of strings. We don't want to work around the problem by inserting a blob like 'A B|C E' into the attribute, because then we cannot use features such as referencing the matching attributes in a summarizer.
I will think about it.
Possible alternative:
example:
{"categories":["Hardware","NAS","Linux"]}
Create a concat('|') function and generate an attribute like:
"Hardware|NAS|Linux"
This is worth considering, but difficult. The segmenter treats each element of ["Hardware","NAS","Linux"] as its own segment. Currently, the only mechanisms for joining elements across segment borders are concatenation before tokenization, which is intended for things like language detection and NLP, and pattern matching. Only the latter would be sane for this purpose, but pattern matching would also be a hack. We should consider more possibilities for assembling new elements in the analyzer.
At this point, I do not see a proper solution to the problem.
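Until the analyzer supports this, the concat('|') idea from the proposal above could be approximated in a preprocessing step, before the document reaches the segmenter. The following is only a minimal sketch under that assumption: `concat_values` is a hypothetical helper, not an existing analyzer function, and the result is still a single blob, with the limitations already mentioned for summarizers.

```python
import json


def concat_values(values, sep="|"):
    """Join the elements of a multi-valued field into one attribute string.

    Hypothetical helper for illustration; not part of any analyzer API.
    """
    for value in values:
        # Refuse values that contain the separator, otherwise the joined
        # string could not be split back into the original elements.
        if sep in value:
            raise ValueError(f"value {value!r} contains separator {sep!r}")
    return sep.join(values)


# The example document from the proposal above.
doc = json.loads('{"categories":["Hardware","NAS","Linux"]}')

# Build the single-valued attribute before handing the document on.
attribute = concat_values(doc["categories"])
print(attribute)  # Hardware|NAS|Linux
```

Rejecting values that already contain the separator keeps the attribute reversible with a plain `attribute.split("|")`; whether that check is wanted depends on the data.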
I have a field 'subject' with the value:
When splitting it with:
only the last subject is inserted into the index.
What is the strategy to have multi-valued fields?