tjake / Solandra

Solandra = Solr + Cassandra
Apache License 2.0

Solandra missing results with date range query #180

Open jayquincey opened 12 years ago


I am using Solandra to search for events after a certain date. To do this I index each event's start time as milliseconds since the epoch (using the slong data type) and use a range search like so: start:[1348992000000 TO *]
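For reference, here is a minimal sketch of how a lower bound like the one above can be derived from a date. It assumes Java 8+ (`java.time`); the class and method names are hypothetical and are not part of Solandra or SolrJ.

```java
import java.time.Instant;

// Hypothetical helper for illustration only; not part of Solandra or SolrJ.
final class StartRangeQuery {

    // Builds an open-ended range query on the "start" field.
    // Square brackets make the lower bound inclusive in Solr range syntax.
    static String onOrAfter(Instant earliest) {
        return "start:[" + earliest.toEpochMilli() + " TO *]";
    }

    public static void main(String[] args) {
        // 2012-09-30T08:00:00Z corresponds to 1348992000000 ms since the epoch.
        System.out.println(onOrAfter(Instant.parse("2012-09-30T08:00:00Z")));
        // prints: start:[1348992000000 TO *]
    }
}
```

Because the bound is inclusive, a document whose start value equals the bound exactly should also match.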

Most of the time this works fine, but sometimes there is buggy behavior: a search for events after date X returns nothing, while a search for events after a later date Y (where X < Y < the event's start date) does return the event.

After a LOT of playing around I have managed to come up with something I can consistently reproduce (on my side at least). Here are the steps to recreate:

1) Create the following schema (let me know if you want the whole xml file and I will post):

<fields>
    <field name="id" type="string" indexed="true" stored="true" required="true" />
    <field name="title" type="text" indexed="true" stored="false"/>
    <field name="start" type="slong" indexed="true" stored="false"/>
</fields>

2) Run the following Java program using SolrJ:

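import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

// Repeatedly re-indexes two documents (one fixed id each) with different start times.
public class IndexTest {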

    private static final SolrServer eventsServer = new HttpSolrServer("http://localhost:8983/solandra/events");

    public static void main(String... args)
    throws Exception {
        save2(1350028800000L);
        save2(1349424000000L);
        save2(1348992000000L);
        save2(1350028800000L);
        save2(1350115200000L);
        save2(1350374400000L);
        save2(1348992000000L);
        save2(1349424960000L);
        save2(1349424000000L);
        save1(1348992000000L);
        save1(1348999200000L);
        save1(1349431200000L);
        save2(1349164800000L);
        save2(1348992000000L);
        save2(1349424000000L);
        save1(1349444640000L);
        save2(1350633600000L);
    }

    // Indexes a document under the first fixed id, then commits. Every call reuses the same id.
    private static void save1(long time)
    throws Exception {
        SolrInputDocument doc = new SolrInputDocument();

        doc.addField("id", "2ce011f0-0a80-11e2-bf94-b8f6b111caaf");
        doc.addField("title", "Test");
        doc.addField("start", time);

        eventsServer.add(doc);
        eventsServer.commit();
    }

    // Same as save1 but under the second fixed id.
    private static void save2(long time)
    throws Exception {
        SolrInputDocument doc = new SolrInputDocument();

        doc.addField("id", "5d9e18f0-0a80-11e2-bf94-b8f6b111caaf");
        doc.addField("title", "Test");
        doc.addField("start", time);

        eventsServer.add(doc);
        eventsServer.commit();
    }

}
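To make the expected outcome explicit, here is a small in-memory sketch of what the index should contain after the program above has run. It assumes that `id` is the schema's uniqueKey, so that a later add with the same id replaces the earlier document; the class is hypothetical and does not talk to Solandra at all.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model for illustration: a map keyed by id stands in for the index.
// Assumption: "id" is the uniqueKey, so re-adding an id overwrites the old document.
final class ExpectedIndexState {

    static final String ID_1 = "2ce011f0-0a80-11e2-bf94-b8f6b111caaf"; // id used by save1
    static final String ID_2 = "5d9e18f0-0a80-11e2-bf94-b8f6b111caaf"; // id used by save2

    private final Map<String, Long> index = new LinkedHashMap<>();

    void save1(long time) { index.put(ID_1, time); }
    void save2(long time) { index.put(ID_2, time); }

    // Replays the save calls in the same order as IndexTest.main.
    static Map<String, Long> replay() {
        ExpectedIndexState s = new ExpectedIndexState();
        s.save2(1350028800000L);
        s.save2(1349424000000L);
        s.save2(1348992000000L);
        s.save2(1350028800000L);
        s.save2(1350115200000L);
        s.save2(1350374400000L);
        s.save2(1348992000000L);
        s.save2(1349424960000L);
        s.save2(1349424000000L);
        s.save1(1348992000000L);
        s.save1(1348999200000L);
        s.save1(1349431200000L);
        s.save2(1349164800000L);
        s.save2(1348992000000L);
        s.save2(1349424000000L);
        s.save1(1349444640000L);
        s.save2(1350633600000L);
        return s.index;
    }

    // Counts documents whose start value is on or after the (inclusive) lower bound.
    static long countOnOrAfter(Map<String, Long> index, long lowerBound) {
        long matches = 0;
        for (long start : index.values()) {
            if (start >= lowerBound) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        Map<String, Long> index = replay();
        System.out.println(index);
        System.out.println(countOnOrAfter(index, 1348992000000L));
        System.out.println(countOnOrAfter(index, 1349049600177L));
    }
}
```

Under that assumption only the last value written for each id survives (1349444640000 and 1350633600000), and both are above either lower bound used in the queries that follow, so both queries would be expected to match both documents.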

3) Run the following query to see no results returned:

q=start:[1348992000000 TO *]

http://localhost:8983/solandra/events/select/?q=start:%5B1348992000000+TO+*%5D

4) Run the following query to see that a result is returned:

q=start:[1349049600177 TO *]

http://localhost:8983/solandra/events/select/?q=start:%5B1349049600177+TO+*%5D
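In case it helps anyone reproducing this, the select URLs above can be generated from the raw query strings with a sketch like the following. It uses only the JDK's `URLEncoder`; the class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper for illustration only; not part of Solandra or SolrJ.
final class SelectUrl {

    // Appends the form-encoded query to the core's select handler URL.
    static String forQuery(String coreUrl, String query) {
        try {
            return coreUrl + "/select/?q=" + URLEncoder.encode(query, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be available on every JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(forQuery("http://localhost:8983/solandra/events",
                "start:[1349049600177 TO *]"));
        // prints: http://localhost:8983/solandra/events/select/?q=start%3A%5B1349049600177+TO+*%5D
    }
}
```

`URLEncoder` also escapes the colon as `%3A`, whereas the URLs above leave it literal; both forms decode to the same query string.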

Things to note:

Removing the commit() calls seems to fix this particular example; however, I have also seen the problem at other times when commits were omitted. As far as I understand from comments Jake Luciani has written, commit() should have no effect, so I am puzzled as to why this change consistently affects the results.

It only seems to happen when I have more than one event indexed, but I cannot be sure because the problem crops up at seemingly random times.

Why am I not using the date data type? I did originally, but I thought that data type was causing the issue, so I switched. I have tried the date, long, slong, string and text data types, and all exhibit the same sporadic missing-results behavior. Please also note that switching to a different data type may fix a particular example, but the problem will still show up in others.

I have tried both the ready-to-go Solandra built directly from the GitHub source and Solandra embedded in the latest Cassandra distribution.

This is driving me crazy so ANY help or suggestions would be greatly appreciated!

Thanks,

Jay