A value with a leading space breaks my query.

spring-projects / spring-data-elasticsearch

Provide support to increase developer productivity in Java when using Elasticsearch. Uses familiar Spring concepts such as template classes for core API usage and lightweight repository style data access.

https://spring.io/projects/spring-data-elasticsearch/

Apache License 2.0

2.92k stars, 1.33k forks

A value with a leading space breaks my query. #2863

Closed: daeho-ro closed this issue 8 months ago

daeho-ro commented 8 months ago

I have data like the following:

        id, key,        value
        1,  " service", temp1
        2,  "service",  temp2

where the key for id = 1 is " service" (with a leading space) and the key for id = 2 is "service" (without one).

When I query the data with the code below,

        CriteriaQuery searchQuery = new CriteriaQuery(new Criteria());
        searchQuery.addCriteria(Criteria.and().and("key").is(" service"));

        NativeQuery aggregationQuery = NativeQuery.builder()
                .withQuery(searchQuery)
                .withMaxResults(10000)
                .build();

        SearchHits<MyClass> hits = operations.search(searchQuery, MyClass.class, IndexCoordinates.of(indexName));

the result contains both id = 1 and id = 2.

Now, if I change the criteria field to `key.keyword`,

        searchQuery.addCriteria(Criteria.and().and("key.keyword").is(" service"));

then the result contains only id = 2.

How can I get only id = 1? Or what am I missing here? The leading space breaks my understanding and is very confusing.

BTW, I am using spring-data-elasticsearch 5.1.5. Thanks,

sothawo commented 8 months ago

This is not something Spring Data Elasticsearch is responsible for. When you have a text field in Elasticsearch, its value is analysed when the entry is stored, and when searching, the search value is analysed as well. With the default analyser settings, the text is split into tokens and the space is a separator, so during analysis " service" becomes the token "service", which matches both entries. The keyword subfield is not analysed, as it is a keyword field and not a text field, and therefore it works there. That is standard Elasticsearch behaviour.
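To make the analysis step concrete, here is a rough Java sketch. It approximates the `standard` analyzer for plain ASCII input by splitting on anything that is not a letter or digit and lowercasing; the real analyzer follows Unicode word-break rules, so `standardAnalyze`, `keywordAnalyze` and `textMatches` are hypothetical helpers for illustration only, not Elasticsearch or Spring Data APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class AnalysisSketch {

    // Rough approximation of the "standard" analyzer for simple ASCII text:
    // split on runs of non-letter/non-digit characters, drop empty pieces, lowercase.
    static List<String> standardAnalyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // A keyword field without a normalizer keeps the whole value as one exact term.
    static List<String> keywordAnalyze(String text) {
        return List.of(text);
    }

    // Approximates a match on a text field: true if the query shares any token with the stored value.
    static boolean textMatches(String stored, String query) {
        List<String> storedTokens = standardAnalyze(stored);
        for (String token : standardAnalyze(query)) {
            if (storedTokens.contains(token)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(standardAnalyze(" service"));            // [service]
        System.out.println(keywordAnalyze(" service"));             // [ service]
        System.out.println(textMatches(" service", " service"));    // true  (id = 1)
        System.out.println(textMatches("service", " service"));     // true  (id = 2)
    }
}
```

Both stored values reduce to the same token, which is why the query on the text field `key` returns both documents.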

daeho-ro commented 8 months ago

I expected the second query to return id = 1, which has the key " service", but I got id = 2 ("service"). So is it standard Elasticsearch behaviour that a keyword search for " service" returns only the "service" key?

edit:

As you can see, I had used .is(); after changing it to .matches(), the query works as expected. I think I misunderstood what each method does and should dig deeper into the documentation.

Thank you for the comment! :)
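A possible explanation for why `key.keyword` returned id = 2, sketched in Java under an assumption that should be checked against the Spring Data Elasticsearch 5.1.x sources: `Criteria.is()` is sent as a `query_string` query, whose parser treats whitespace as a separator between terms so the leading space never reaches the field, while `Criteria.matches()` is sent as a `match` query, which on a `keyword` field without a normalizer uses the whole input as a single term. The in-memory "index" and every helper name below are hypothetical and only model that assumed behaviour.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class KeywordQuerySketch {

    // The two documents from the issue: id -> value stored in "key.keyword" (verbatim).
    static final Map<Integer, String> KEYWORD_INDEX = new LinkedHashMap<>();
    static {
        KEYWORD_INDEX.put(1, " service");
        KEYWORD_INDEX.put(2, "service");
    }

    // Assumed query_string-like parsing: whitespace separates terms, so it is discarded.
    static List<String> queryStringTerms(String query) {
        List<String> terms = new ArrayList<>();
        for (String term : query.trim().split("\\s+")) {
            if (!term.isEmpty()) {
                terms.add(term);
            }
        }
        return terms;
    }

    // Assumed match-on-keyword behaviour: the whole input is one exact term.
    static List<String> matchOnKeywordTerms(String query) {
        return List.of(query);
    }

    // Returns ids whose keyword value equals one of the terms
    // (a simplification that is fine for the single-term queries used here).
    static List<Integer> search(List<String> terms) {
        List<Integer> hits = new ArrayList<>();
        for (Map.Entry<Integer, String> doc : KEYWORD_INDEX.entrySet()) {
            if (terms.contains(doc.getValue())) {
                hits.add(doc.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(search(queryStringTerms(" service")));    // [2]
        System.out.println(search(matchOnKeywordTerms(" service"))); // [1]
    }
}
```

Under this assumption, the leading space is lost before the exact keyword comparison in the `.is()` case, which would explain why only "service" (id = 2) was found.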

sothawo commented 8 months ago

Two questions:

Why do you add the CriteriaQuery to a NativeQuery instead of passing it directly to the `search()` call?
Why this strange construct with the and() calls instead of Criteria.where("key").is(" service")?

Please provide a minimal, runnable, reproducible example (a complete project; I won't build one myself) and I can have a look at this.

daeho-ro commented 8 months ago

In my real use case there are many fields to filter on, so I wanted to put each condition on its own line.
This is a good time to learn and change my code. 👍