nitrite / nitrite-java

NoSQL embedded document store for Java
https://bit.ly/no2db
Apache License 2.0
835 stars 95 forks source link

Slow performance #109

Closed dustin-auby closed 6 years ago

dustin-auby commented 6 years ago

I have found most of the database very fast but have now had to do a query with multiple AND criteria and this seems to take a big toll on performance is there anything i can do to speed it up ? the fields i am querying are mostly String.

anidotnet commented 6 years ago

Can you please post your query expressions? Please also mention what type of data each field contains and what index have you used on those fields.

dustin-auby commented 6 years ago

here is an example statement but is seems to happen on any query with more than 2 or 3 criteria. I have used a Unique index on the id and a Fulltext index on the string fields.

String search = "^(search).*";
ObjectFilters.or(ObjectFilters.or(ObjectFilters.regex("name, search), ObjectFilters.regex("surname", search)),ObjectFilters.or(ObjectFilters.regex("name, search), ObjectFilters.regex("surname", search)));
anidotnet commented 6 years ago

I have created a test case based on your description.

@Test
public void testIssue109() {
    ObjectRepository<Test109> repo = db.getRepository(Test109.class);
    Test109 data = new Test109();
    data.name = "Anindya";
    data.surname = "Chatterjee";
    repo.insert(data);

    for (int i = 0; i < 100; i++) {
        data = new Test109();
        data.name = UUID.randomUUID().toString().replace("-", " ");
        data.surname = UUID.randomUUID().toString().replace("-", " ");
        repo.insert(data);
    }

    String name = "^Anind.*";
    String surname = "^Chatt.*";
    ObjectFilter filter =
            or(
                    or(
                            regex("name", name),
                            regex("surname", surname)
                    ),
                    or(
                            regex("name", name),
                            regex("surname", surname)
                    )
            );

    long start = System.currentTimeMillis();
    Cursor<Test109> cursor = repo.find(filter);
    long diff = System.currentTimeMillis() - start;

    assertEquals(cursor.size(), 1);
    System.out.println(diff);
}

@Data
@Indices({
        @Index(value = "name", type = IndexType.Fulltext),
        @Index(value = "surname", type = IndexType.Fulltext)
})
private static class Test109 {
    private String name;
    private String surname;
}

For me it takes 1ms to query the data, which is reasonably good to me.

Normally all or() and and() clause runs in parallel and then aggregate the results, so it should not slow you down unless the field is not indexed.

But if I have missed something in the test case, feel free to modify it to reproduce your issue.

dustin-auby commented 6 years ago

Hi, here is the example i am using that is having the biggest problem. the object is quite big. the execution time i am getting to list 20 records is 802ms.


int offsett = 0;
int limit= 0;
        search = "^(" + search + ").*";
        FindOptions options = null;
        ObjectRepository repository = ((Nitrite) Database.instance.getDatabase()).getRepository(Attendant.class);
        ObjectFilter filter = ObjectFilters.and(
                ObjectFilters.and(ObjectFilters.eq("event_uid", evt.getUid()),
                        ObjectFilters.and(
                                ObjectFilters.eq("deleted", false),
                                ObjectFilters.or(
                                        ObjectFilters.or(
                                                ObjectFilters.regex("name", search),
                                                ObjectFilters.regex("company", search)
                                        ),
                                        ObjectFilters.or(
                                                ObjectFilters.regex("name", search),
                                                ObjectFilters.regex("company", search)
                                        )
                                )
                        )
                )
        );

        if (limit > 0) {
            options = FindOptions.limit(offsett, limit);
        }
        List objects;
        long start = System.currentTimeMillis();
        if(options == null){
            objects = repository.find(filter).toList();
        }else{
            objects = repository.find(filter, options).toList();
        }
        long diff = System.currentTimeMillis() - start;
        System.out.println("Search Time Diff:" + diff);

Here is the Object i am using:

@Indices({
        @Index(value = "uid", type = IndexType.Unique),
        @Index(value = "name", type = IndexType.Fulltext),
        @Index(value = "email", type = IndexType.Fulltext),
        @Index(value = "barcode", type = IndexType.Fulltext),
        @Index(value = "number", type = IndexType.Fulltext),
        @Index(value = "surname", type = IndexType.Fulltext),
        @Index(value = "company", type = IndexType.Fulltext),
})
public class Attendant implements Serializable {

    @Id
    protected String uid;
    private String event_uid;
    private String name;
    private String email;
    private String number;
    private Integer status_id;
    private String surname;
    private String company;
    private String barcode;
    private Date last_updated;
    private int banquet;
    private int concert;
    private boolean concert_draw;
    private int vip_cocktail;
    private int payment;
    private String reference;
    private boolean partner_to_banquet;
    private int additional_pax;
    private String title;
    private int registered;
    private int invited;
    private String table_number;
    private String hole;
    private Date tee_time;
    private String gender;
    private String marital_status;
    private Date dob;
    private Date admission_date;
    private String job_title;
    private String category;
    private String attendantClass;
    private String publication;
    private String source;
    private int rsvp;
    private String special_needs;
    private String dietry_req;
    private int main_event;
    private int after_party;
    private String tickets;
    private String position;
    private String row;
    private int age;
    private String extra_1;
    private String extra_2;
    private String extra_3;
    private String extra_4;
    private String extra_5;
    private String extra_6;
    private String extra_7;
    private String extra_8;
    private String extra_9;
    private String extra_10;
    private String custom_1;
    private String custom_2;
    private String custom_3;
    private String custom_4;
    private String custom_5;
    private String custom_6;
    private String custom_7;
    private String custom_8;
    private String custom_9;
    private String custom_10;
    private String custom_11;
    private String custom_12;
    private String custom_13;
    private String custom_14;
    private String custom_15;
    private String custom_16;
    private String custom_17;
    private String custom_18;
    private String custom_19;
    private String custom_20;
    private String custom_21;
    private String custom_22;
    private String custom_23;
    private String custom_24;
    private String custom_25;
    private String custom_26;
    private String custom_27;
    private String custom_28;
    private String custom_29;
    private String custom_30;
    private String note;
    private Date arival_time;
    private boolean attended;
    private boolean attended_local;
anidotnet commented 6 years ago

Please don't use

@Index(value = "uid", type = IndexType.Unique),

as you have marked uid field with @Id. Next please add indexes on event_uid and deleted fields and compare the result.

NOTE: Fulltext index is only for text with multiple words, not for all strings. If your field holds a single word only, you will be better off with NonUnique index

dustin-auby commented 6 years ago

Ok will try, which index type should i be using for those two fields. the deleted field isn't something you would typically index would it be NonUniuque or Fulltext.

anidotnet commented 6 years ago

NonUnique

dustin-auby commented 6 years ago

Ok i have made the changes and it seems to have improved the speed. the first call is still slow but any calls after that seem allot quicker

anidotnet commented 6 years ago

First call is slow because it had to spin up the threads. Any subsequent query will be faster. Closing this issue as solved.