sematext / solr-redis

Solr Redis Extensions
http://sematext.com/engineering/index.html#open-source
Apache License 2.0
52 stars 22 forks source link

No results for integer fields #48

Open lstrojny opened 3 years ago

lstrojny commented 3 years ago

Given this field in a schema:

<field name="field" type="int" indexed="true" stored="true" required="false" multiValued="true"/>

And some documents in the index with 2 in the multi-valued field

And this set in redis:

SADD myset 2

This query does not return any documents but should:

{!redis command="smembers" key="myset"}field
radu-gheorghe commented 3 years ago

I don't think the problem is the multi-valued field, but the int field. Here are some sample documents:

      {
        "id":"1",
        "field_i":2},
      {
        "id":"2",
        "field_s":"2"},
      {
        "id":"3",
        "field_ss":["2"]},
      {
        "id":"4",
        "field_ss":["2",
          "3"]}]

This is the default schema, so:

With the set that you mentioned above, I don't get anything with fq={!redis command=SMEMBERS key=myset}field_i, even if it's a single_valued field. I do get document 2 if I query field_s like fq={!redis command=SMEMBERS key=myset}field_s and I get docs 3 and 4 if I query field_ss in the same way.

So multi-valued seems to work, but not integers (btw, my "integer" is <fieldType name="pint" class="solr.IntPointField" docValues="true"/>). Do you need support for integer values? Because I'm not sure how we deal with data types (and potential number formatting exceptions) yet, but I can check.

lstrojny commented 3 years ago

Hah, interesting. This exact query previously worked, any idea what changed wrt integer handling? Could that be a difference e.g. in the Jedis library?

radu-gheorghe commented 3 years ago

I don't know, but my intuition leans more towards the way we're building the query for Solr. Because the set isn't defined as integer, right?

Let me know if you'd like me to look deeper into it, if you need the support for integers.

lstrojny commented 3 years ago

Just to be clear: the issue popped up in our test suite when running it against Solr 8 with the new version of the extension, so the behavior definitely changed as to the previous version. Please have a deeper look into it, we restored integer support.

radu-gheorghe commented 3 years ago

OK, I'll do that (most likely next week, sorry for the delay) and come back to you.

radu-gheorghe commented 3 years ago

I just added a new release which should fix this issue. All numeric fields should work now (potentially also other point fields, such as geo).

Some more details:

Here's an example where the set contains an a and field_is is a multiValued integer field:

      "error-class","org.apache.solr.common.SolrException",
      "root-error-class","org.apache.solr.common.SolrException"],
    "msg":"Invalid Number: a for field field_is",
    "code":400

Last but not least I've added a unit test for integer support, which required upgrading Mockito (or at least that was the cleanest way I saw, since we're mocking SchemaField, which is a final class and that didn't work before).

I'll leave it up to you to close this issue, if you feel like we're done here. Otherwise, please let me know if you'd like to see other improvements.

lstrojny commented 3 years ago

Tested this again, here is the setup:

schema.xml

<?xml version="1.0" encoding="UTF-8" ?>
<schema name="test" version="1.4">
    <types>
        <fieldType name="int" class="solr.TrieIntField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
    </types>
    <fields>
        <field name="documentId" type="int" indexed="true" stored="true" required="true"/>
    </fields>
    <uniqueKey>documentId</uniqueKey>
</schema>

Added these two documents:

{"documentId": 1}
{"documentId": 2}

Set up redis this way:

redis-cli
127.0.0.1:6379> sadd field 1

Used this query:

{!redis command=smembers key=field}documentId

Expectation: document with ID 1 is returned Actual: empty search result

Confirmed that:

Now if I pass useAnalyzer=true, meaning {!redis command=smembers key=field useAnalyzer=true}documentId, document of ID 1 is returned.

Update

Reading comprehension skills provide a clear advantage to those who have them, in this case myself 😄

After reading your response again, I noticed that indeed we are using trie fields all over the place. With point fields it indeed works. So I'll try addressing this issue in our schema for now.

radu-gheorghe commented 3 years ago

Thanks for testing and re-reading :) I didn't know that it works with useAnalyzer=true with trie fields. But yes, if you can address it in the schema it would be great, as trie fields are deprecated and there's no advantage to using them instead of point fields (at least that I know of). Something like this should work:

<fieldType name="pint" class="solr.IntPointField"/>

You may or may not want to add docValues="true" to the definition. docValues are needed for sorting and facets, but may also be used to speed up some queries (the getFieldQuery() I mentioned would generate an IndexOrDocValuesQuery if the field is both indexed and has docValues).