marklogic / java-client-api

Java client for the MarkLogic enterprise NoSQL database
https://docs.marklogic.com/guide/java
Apache License 2.0

search:and-not-query is ignored in QueryBatcher #1640

Closed marcopacurariu3 closed 4 months ago

marcopacurariu3 commented 4 months ago

Version of MarkLogic Java Client API

6.5.0

Version of MarkLogic Server

11.0.2

Java version

JDK 17

OS and version

ProductName: macOS ProductVersion: 13.5 BuildVersion: 22G74

Input: Some code to illustrate the problem

final RawCombinedQueryDefinition structQueryDef = queryManager
            .newRawCombinedQueryDefinition(new StringHandle(
                """
                    <search:search xmlns:search="http://marklogic.com/appservices/search">
                       <search:query>
                          <query xmlns="http://marklogic.com/appservices/search" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
                             <search:and-query>
                                   <search:range-constraint-query>
                                      <search:constraint-name>application</search:constraint-name>
                                      <search:value>testapp</search:value>
                                      <search:range-operator>EQ</search:range-operator>
                                   </search:range-constraint-query>
                                   <search:and-not-query>
                                      <search:annotation type="searchable-collections" />
                                      <search:positive>
                                         <search:collection-query>
                                            <search:uri>workspace/testapp/someCollection</search:uri>
                                         </search:collection-query>
                                      </search:positive>
                                      <search:negative>
                                            <search:collection-query>
                                               <search:uri>workspace/testapp/ignoredCollection</search:uri>
                                            </search:collection-query>
                                      </search:negative>
                                   </search:and-not-query>
                             </search:and-query>
                          </query>
                       </search:query>
                    </search:search>
                    """), "all");

When executing a query that contains a search:and-not-query, I noticed that QueryBatcher simply ignores that part. However, the same query works when executed as a simple search.

final SearchHandle result = queryManager.search(structQueryDef, new SearchHandle());
result.getMatchResults();

The above search works. When the same query is passed to a QueryBatcher and batcher.getItems() is called, the and-not-query part is not taken into account.

Actual output: What did you observe? What errors did you see? Can you attach the logs? (Java logs, MarkLogic logs)

With QueryBatcher, executing the above query does not take the and-not-query into account.

Expected output: What specifically did you expect to happen?

The and-not-query is taken into account for QueryBatcher.

Alternatives: What else have you tried, actual/expected?

I have found a possible workaround: instead of using the and-not-query directly, I manually combined an and with a not, and it works that way.
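The exact workaround query was not posted. As a rough sketch, assuming it nests a not-query inside an and-query, it might look like the following. The class name, method name, and choice of collection-query children are illustrative assumptions, not the reporter's actual code.

```java
// Hypothetical helper: builds a combined query that expresses
// "in one collection AND NOT in another" without using and-not-query.
final class AndNotWorkaround {

    // Note: the URIs are inserted as-is; real code should XML-escape them.
    static String buildQuery(String includedCollection, String excludedCollection) {
        return String.join("\n",
            "<search xmlns=\"http://marklogic.com/appservices/search\">",
            "  <query>",
            "    <and-query>",
            "      <collection-query>",
            "        <uri>" + includedCollection + "</uri>",
            "      </collection-query>",
            "      <not-query>",
            "        <collection-query>",
            "          <uri>" + excludedCollection + "</uri>",
            "        </collection-query>",
            "      </not-query>",
            "    </and-query>",
            "  </query>",
            "</search>");
    }

    public static void main(String[] args) {
        System.out.println(buildQuery(
            "workspace/testapp/someCollection",
            "workspace/testapp/ignoredCollection"));
    }
}
```

The resulting string could be wrapped in a StringHandle and passed to newRawCombinedQueryDefinition in the same way as the query above.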

rjrudin commented 4 months ago

Thanks @marcopacurariu3, we'll work on reproducing this today and will report back to you.

marcopacurariu3 commented 4 months ago

Great @rjrudin, thank you!

rjrudin commented 4 months ago

@marcopacurariu3 I haven't had luck reproducing this yet. I added a new JUnit test - https://github.com/marklogic/java-client-api/commit/acb50855c17852ccf8afe8c34759f9fe642e9b0c#diff-52e216474c59fcf9a974f2706e674b08ef0c484eee84fb74a7839b0439c99bd5R186 - that does the following:

  1. Adds the same XML doc with different URIs to collections test1 and test2.
  2. Verifies that a regular structured and-not query (with a term-query on "world", which is in both documents) works.
  3. Verifies that a combined and-not-query works.
  4. Verifies that the same combined and-not-query works with DMSDK.

I also verified that if I comment out the and-not query, then the test fails as it returns 2 documents instead of only 1.

So the and-not-query appears to be getting picked up in all cases.

Could you try modifying your query to do a term-query instead of a range-constraint-query? Looks like you could just do a term-query on "testapp". Perhaps the issue requires the use of both a range query and a set of search options.

marcopacurariu3 commented 4 months ago

Sure, I will play around and come back to you tomorrow.

Thanks a lot for digging into it.

marcopacurariu3 commented 4 months ago

@rjrudin, thanks once again for the implemented test. This helped me find the real issue. It works with range-constraint-query in your example, too, so it's not that.

The issue is that I am using "positive/negative" instead of "positive-query/negative-query".

So, "positive" works in a simple search, but it does not work in QueryBatcher. The QueryBatcher supports only "positive-query" or "negative-query".

Here is the query that works for me in search, but not in the batcher:

"<search xmlns=\"http://marklogic.com/appservices/search\">\n" +
                    "  <query>\n" +
                    "    <and-query>\n" +
                    "      <range-constraint-query>\n" +
                    "        <constraint-name>application</constraint-name>\n" +
                    "        <value>testapp</value>\n" +
                    "        <range-operator>EQ</range-operator>\n" +
                    "      </range-constraint-query>\n" +
                    "      <and-not-query>\n" +
                    "        <positive>\n" +
                    "          <collection-query>\n" +
                    "            <uri>test1</uri>\n" +
                    "          </collection-query>\n" +
                    "        </positive>\n" +
                    "        <negative>\n" +
                    "          <collection-query>\n" +
                    "            <uri>test2</uri>\n" +
                    "          </collection-query>\n" +
                    "        </negative>\n" +
                    "      </and-not-query>\n" +
                    "    </and-query>\n" +
                    "  </query>\n" +
                    "</search>"

I assume that if you do the same in your test, then you will notice the issue.

rjrudin commented 4 months ago

Hi @marcopacurariu3 - I checked the docs for a structured and-not-query - https://docs.marklogic.com/guide/search-dev/structured-query#id_65108 - and it does require positive-query and negative-query. I'm a little surprised the server doesn't reject positive and negative but rather ignores them.

I'm not sure why your query would work with positive and negative as I'm fairly certain those should always be ignored. I verified in my test that if I use those element names, the test fails.
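Since the misnamed elements are not rejected, a client-side check before submitting the query can catch them early. The following is only a sketch using the JDK's built-in DOM parser. The class and method names are hypothetical, and it flags only the two element names discussed in this thread.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of the MarkLogic Java Client API.
final class AndNotQueryCheck {

    static final String SEARCH_NS = "http://marklogic.com/appservices/search";

    // Returns the local names of direct and-not-query children that are
    // named "positive" or "negative" instead of "positive-query" or
    // "negative-query".
    static List<String> misnamedChildren(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

            List<String> found = new ArrayList<>();
            NodeList andNots = doc.getElementsByTagNameNS(SEARCH_NS, "and-not-query");
            for (int i = 0; i < andNots.getLength(); i++) {
                NodeList children = andNots.item(i).getChildNodes();
                for (int j = 0; j < children.getLength(); j++) {
                    Node child = children.item(j);
                    if (child.getNodeType() != Node.ELEMENT_NODE) {
                        continue; // skip whitespace text nodes and comments
                    }
                    String name = child.getLocalName();
                    if (SEARCH_NS.equals(child.getNamespaceURI())
                            && (name.equals("positive") || name.equals("negative"))) {
                        found.add(name);
                    }
                }
            }
            return found;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse query XML", e);
        }
    }
}
```

An empty result means neither misnamed element was found. It does not prove the query is otherwise valid.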

I think the right approach here is to use positive-query and negative-query per the server docs link above. I'm going to close this, but please let me know if you run into issues with those elements working.
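For reference, here is a sketch of the query from this thread rewritten with the documented element names. The class and method names are made up for illustration. It also assumes, as in the original report, that the search options define a constraint named application.

```java
// Hypothetical helper: the thread's query with positive-query and
// negative-query in place of positive and negative.
final class CorrectedAndNotQuery {

    // Note: the URIs are inserted as-is; real code should XML-escape them.
    static String build(String positiveCollection, String negativeCollection) {
        return String.join("\n",
            "<search xmlns=\"http://marklogic.com/appservices/search\">",
            "  <query>",
            "    <and-query>",
            "      <range-constraint-query>",
            "        <constraint-name>application</constraint-name>",
            "        <value>testapp</value>",
            "        <range-operator>EQ</range-operator>",
            "      </range-constraint-query>",
            "      <and-not-query>",
            "        <positive-query>",
            "          <collection-query>",
            "            <uri>" + positiveCollection + "</uri>",
            "          </collection-query>",
            "        </positive-query>",
            "        <negative-query>",
            "          <collection-query>",
            "            <uri>" + negativeCollection + "</uri>",
            "          </collection-query>",
            "        </negative-query>",
            "      </and-not-query>",
            "    </and-query>",
            "  </query>",
            "</search>");
    }

    public static void main(String[] args) {
        System.out.println(build("test1", "test2"));
    }
}
```

The same string should be usable for both queryManager.search and a QueryBatcher, since both accept a RawCombinedQueryDefinition.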