Open lstrojny opened 3 years ago
I don't think the problem is the multi-valued field, but the int
field. Here are some sample documents:
{
"id":"1",
"field_i":2},
{
"id":"2",
"field_s":"2"},
{
"id":"3",
"field_ss":["2"]},
{
"id":"4",
"field_ss":["2",
"3"]}]
This is the default schema, so:
*_i
are single-valued integers*_s
are single-valued strings*_ss
are multi-valued stringsWith the set that you mentioned above, I don't get anything with fq={!redis command=SMEMBERS key=myset}field_i
, even if it's a single_valued field. I do get document 2
if I query field_s
like fq={!redis command=SMEMBERS key=myset}field_s
and I get docs 3
and 4
if I query field_ss
in the same way.
So multi-valued seems to work, but not integers (btw, my "integer" is <fieldType name="pint" class="solr.IntPointField" docValues="true"/>
). Do you need support for integer values? Because I'm not sure how we deal with data types (and potential number formatting exceptions) yet, but I can check.
Hah, interesting. This exact query previously worked, any idea what changed wrt integer handling? Could that be a difference e.g. in the Jedis library?
I don't know, but my intuition leans more towards the way we're building the query for Solr. Because the set isn't defined as integer, right?
Let me know if you'd like me to look deeper into it, if you need the support for integers.
Just to be clear: the issue popped up in our test suite when running it against Solr 8 with the new version of the extension, so the behavior definitely changed as to the previous version. Please have a deeper look into it, we restored integer support.
OK, I'll do that (most likely next week, sorry for the delay) and come back to you.
I just added a new release which should fix this issue. All numeric fields should work now (potentially also other point fields, such as geo).
Some more details:
solr.IntPointField
). This is the default for a while now. It may work with trie fields, too (I didn't test yet - you may have them in old schemas), but these are deprecated and slowerBooleanQuery
BooleanQuery
) implies that searching on large sets of numeric fields can be slower than doing the same on string fields - if you don't need scoring like here: fq={!redis command=SMEMBERS key=myset ignoreScore=true}field_s
. That's because in the latter case we use TermInSetQuery
(a.k.a "terms" query) which should work better on lots of termsHere's an example where the set contains an a
and field_is
is a multiValued integer field:
"error-class","org.apache.solr.common.SolrException",
"root-error-class","org.apache.solr.common.SolrException"],
"msg":"Invalid Number: a for field field_is",
"code":400
Last but not least I've added a unit test for integer support, which required upgrading Mockito (or at least that was the cleanest way I saw, since we're mocking SchemaField
, which is a final class and that didn't work before).
I'll leave it up to you to close this issue, if you feel like we're done here. Otherwise, please let me know if you'd like to see other improvements.
Tested this again, here is the setup:
schema.xml
<?xml version="1.0" encoding="UTF-8" ?>
<schema name="test" version="1.4">
<types>
<fieldType name="int" class="solr.TrieIntField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
</types>
<fields>
<field name="documentId" type="int" indexed="true" stored="true" required="true"/>
</fields>
<uniqueKey>documentId</uniqueKey>
</schema>
Added these two documents:
{"documentId": 1}
{"documentId": 2}
Set up redis this way:
redis-cli
127.0.0.1:6379> sadd field 1
Used this query:
{!redis command=smembers key=field}documentId
Expectation: document with ID 1 is returned Actual: empty search result
Confirmed that:
*:*
query returns the two solr documentsredis-cli
+ monitor
to confirm that the expected redis command is executed (SMEMBERS field
)SMEMBERS field
returns a set with a single member of value 1
Now if I pass useAnalyzer=true
, meaning {!redis command=smembers key=field useAnalyzer=true}documentId
, document of ID 1 is returned.
Reading comprehension skills provide a clear advantage to those who have them, in this case myself 😄
After reading your response again, I noticed that indeed we are using trie fields all over the place. With point fields it indeed works. So I'll try addressing this issue in our schema for now.
Thanks for testing and re-reading :) I didn't know that it works with useAnalyzer=true
with trie fields. But yes, if you can address it in the schema it would be great, as trie fields are deprecated and there's no advantage to using them instead of point fields (at least that I know of). Something like this should work:
<fieldType name="pint" class="solr.IntPointField"/>
You may or may not want to add docValues="true"
to the definition. docValues are needed for sorting and facets, but may also be used to speed up some queries (the getFieldQuery()
I mentioned would generate an IndexOrDocValuesQuery if the field is both indexed and has docValues).
Given this field in a schema:
And some documents in the index with
2
in the multi-valued fieldAnd this set in redis:
This query does not return any documents but should: