koopjs / koop-provider-marklogic

A Koop Provider that can be used to exposed data in MarkLogic via Esri feature services
https://koopjs.github.io/koop-provider-marklogic/
Other
6 stars 11 forks source link

Feature layer from koop-provider-marklogic – Sorting of features is not working for fields with null values when limiting the number of features in query #180

Closed Lakshana01 closed 1 year ago

Lakshana01 commented 2 years ago

We are using a Feature layer from koop-provider-marklogic for our application. Features in the feature layer are being queried in lazy mode and shown in the list. So, at a time, only 15 features are obtained. On reaching the end of the list, the next 15 features are fetched. Similar way, queries are fired multiple times based on the count of features in the feature layer. The features that are fetched must be shown in either ascending or descending order of a particular field. So, on querying the feature layer itself, we have set the query parameter orderByFields value as [field_name asc]. From the query results, we noticed that when limiting the number of features to 15, the features having that specified field value as null are not sorted properly i.e., they are neither coming at the top of first query result nor at the bottom of last query result. Rather they are coming at the first of intermediate query results. This issue was not occurring when we fetch all features at one single query.

awanczowski commented 2 years ago

HI @Lakshana01,

Could you please supply some additional info?

I tried to replicate the issue using the example project with inserting null values into some records. Here is the example post with the ordering. The items with null present in the MYFIELD value will be at the end of the result set.

URL: /v1/resources/geoQueryService BODY:

{
  "params": {
    "id": "GDeltExample",
    "layer": "0",
    "method": "query"
  },
  "query": {
    "resultRecordCount": 100,
     "resultOffset" : 2000,
    "orderByFields": "MYFIELD asc",
    "nbspASC": 0,
    "returnGeometry": true
  }
}
Lakshana01 commented 2 years ago

Hi @awanczowski,

In our case, the datatype of field that is used for sorting is double. We noticed that this kind of issue doesn't occur with string type fields.

Here is the information that you have asked for,

MarkLogic Server Version: 10.0-8.1 Koop Provider Version: 1.2.0 Geo Data Service Version: 1.3.4

HTTP METHOD: POST BODY: { "params": { "id": "MyLayer", "layer": "0", "method": "query" }, "query": { "resultRecordCount": 15, "resultOffset": 0, "where": "IdField>100" "orderByFields": "AreaField ASC", "returnGeometry": 0 } }

awanczowski commented 2 years ago

I modified my documents to use double values and updated the TDE to reflect the data type. In terms of the search, the nulls are still at the end of the result set. I do see a missing comma between the where clause and orderByFields. Please confirm that this is not an issue in the executing query.

My query and TDE config are below. Does your configuration look similar?

{
    "params": {
        "id": "GDeltExample",
        "layer": "0",
        "method": "query"
    },
    "query": {
        "resultRecordCount": 15,
        "resultOffset": 150,
        "where": "OBJECTID>100",
        "orderByFields": "MYFIELD ASC",
        "returnGeometry": 0
    }
}
<column>
     <name>MYFIELD</name>
     <scalar-type>double</scalar-type>
     <val>MYFIELD</val>
     <nullable>true</nullable>
</column>
Lakshana01 commented 2 years ago

Sorry, I missed a comma in the above statement, but it was present in the query. Yes. Our query and TDE config look similar to the one you have provided. An example of document in the marklogic, query and TDE configuration are specified below. Also, the result obtained from the query executed for the third batch (start value as 30) is attached. You can notice the position of null values in the result. For your kind information, the result retrieved from second batch (start value as 15) has features with some values in the AreaField sorted in ascending order.

Snippet of document:

{
  "content": {
    "type": "Feature",
    "geometry": { "type": "Point", "coordinates": [-84.378827, 33.810205] },
    "keys": {
      "LatitudeField": "33.810205",
      "LongitudeField": "-84.378827",
      "AreaField": 4644,
      "IdField": 1
    }
  }
}

Query:

{
  "params": { "id": "MyLayer", "layer": "0", "method": "query" },
  "query": {
    "resultRecordCount": 15,
    "resultOffset": 30,
    "where": "IdField>100",
    "outFields": "AreaField",
    "orderByFields": "AreaField ASC",
    "returnGeometry": 0
  }
}

TDE:

<column>
  <name>AreaField</name>
  <scalar-type>double</scalar-type>
  <val>AreaField</val>
  <nullable>true</nullable>
</column>

Query Response file: query_response.zip

awanczowski commented 2 years ago

Hi @Lakshana01 ,

Can you please export the plan. This can be done by setting the method in the request to exportPlan

{
    "params": {
        "id": "GDeltExample",
        "layer": "0",
        "method": "exportPlan"
    },
    "query": {
        "exportPlan": true,
        "resultRecordCount": 15,
        "resultOffset": 200,
        "where": "OBJECTID>1",
        "outFields": "MYFIELD",
        "orderByFields": "MYFIELD ASC",
        "returnGeometry": 0
    }
}

Additionally, you can enable a Trace event GDS-DEBUG on the MarkLogic server. This will output additional information in the logs.

https://help.marklogic.com/Knowledgebase/Article/View/how-to-use-diagnostic-trace-events

Please attach the output and logs for both a exportPlan and query method

Additionally, on the Koop Provider side please attach the debug output. I would like to confirm the ordering request is being sent over appropriately. Here is the example of the HTTP GET call through the Koop Provider

http://localhost/marklogic/GDeltExample/FeatureServer/0/query?where=1=1&resultRecordCount=15&resultOffset=30&outFields=MYFIELD&returnGeometry=0&orderByFields=MYFIELD%20ASC

Lakshana01 commented 2 years ago

Hi @awanczowski,

I have attached the logs and output files for both exportPlan and query method. Along with it, the debug output from the koop provider is also attached. The information that you have asked for, is provided below.

HTTP call through the Koop Provider: https://mykoopserver.com/marklogic/MyLayer/FeatureServer/0/query?f=json&resultOffset=30&resultRecordCount=15&where=IdField%3e100&orderByFields=AreaField%20ASC&outFields=AreaField&returnGeometry=false&spatialRel=esriSpatialRelIntersects

Geo Query Service URL: /v1/resources/geoQueryService

It is noticed that the response from Geo Query Service is sorted properly whereas the response from the koop provider is having this issue. The result from the koop provider obtained from the query with resultRecordCount and resultOffset are set as 15 and 30 respectively, is attached for your reference.

Output & log files of exportPlan: exportplan&query.zip

Debug output from koop provider: koop_debug-output.zip

Response from koop provider: koop_response.zip

BillFarber commented 1 year ago

Hello Lakshana01,

I have been doing to some research into the issue you described and I have been able to replicate it. However, what I found is that the problem is occurring in the Koopjs code which is using a SQL statement to process the data before returning the data to the caller. I have opened an issue with Koopjs (https://github.com/koopjs/koop/issues/468#issuecomment-1443809458) to explore possible methods for resolving this issue.

I am curious: 1) Are you still using the this tool (koop-provider-marklogic)? 2) Is this still an issue for you? 3) If it is still an issue, I'm wondering about the use-case. Are you using this with ArcGIS or something like QGIS? What are you using pagination for?

Thanks

Lakshana01 commented 1 year ago

Hi @BillFarber,

  1. Yes we are still using it.
  2. Yes it is.
  3. It is used in a custom web application integrated with ArcGIS API for JavaScript.
rjrudin commented 1 year ago

@Lakshana01 Do you have a workaround in place for this in your custom web app - i.e. are you verifying that each batch of features is sorted correctly based on the column of type double before showing them to the user?

Lakshana01 commented 1 year ago

Hi @rjrudin, No. We haven't implemented such workaround in our app.

BillFarber commented 1 year ago

@Lakshana01 Since this issue is caused by the current koopjs implementation, we will continue to work with koopjs to find a solution for the issue we raised with them (https://github.com/koopjs/koop/issues/468#issuecomment-1443809458). However, we will not be able to make any changes for this issue unless/until that issue is resolved.

In doing more research, it appears that Esri services should return nulls first in sorted lists so a fix for this issue may result in all of the nulls coming first. Would that make your problem better or worse?

BillFarber commented 1 year ago

I am going to close this issue at least until either koopjs creates a way for us to solve the problem, or until we find a different work-around.