hapifhir / hapi-fhir-jpaserver-starter

Apache License 2.0

Bundle.entry missing in search results with Lucene index enabled #383

Closed sidharthramesh closed 3 months ago

sidharthramesh commented 2 years ago

I want to enable text search on certain token fields (identifier / phone) using the HAPI JPA FHIR server. As per the documentation here: https://hapifhir.io/hapi-fhir/docs/server_jpa/elastic.html, I've enabled certain environment variables as below.

Note: I am not using Elasticsearch.

Steps to reproduce:

Use docker image: hapiproject/hapi:v6.0.1 with database (db) postgres:14. I'm using this docker-compose.yaml file with these environment variables:

version: "3"
services:
  fhir:
    image: hapiproject/hapi:v6.0.1
    environment:
      - spring.datasource.url=jdbc:postgresql://db:5432/fhir
      - spring.datasource.username=fhir
      - spring.datasource.password=fhir
      - spring.datasource.driverClassName=org.postgresql.Driver
      - spring.jpa.properties.hibernate.dialect=ca.uhn.fhir.jpa.model.dialect.HapiFhirPostgres94Dialect
      - hapi.advanced_lucene_indexing=true
      - hapi.fhir.store_resource_in_lucene_index_enabled=true
      - mdm_enabled=true
    ports:
      - 8080:8080
  db:
    image: postgres:14
    environment:
      - POSTGRES_USER=fhir
      - POSTGRES_PASSWORD=fhir
      - POSTGRES_DB=fhir

POST a Patient resource:

{
    "resourceType": "Patient",
    "identifier": [
        {
            "system": "hospital_id",
            "value": "1256744"
        }
    ],
    "name": [
        {
            "text": "Sidharth Ramesh"
        }
    ],
    "gender": "male"
}

Requesting all patients in the server using GET /Patient returns:

{
    "resourceType": "Bundle",
    "id": "63ec851c-e0f8-4e30-be58-382baf3e0f2a",
    "meta": {
        "lastUpdated": "2022-06-03T10:58:34.260+00:00"
    },
    "type": "searchset",
    "total": 1,
    "link": [
        {
            "relation": "self",
            "url": "https://example.com/fhir/Patient/"
        }
    ]
}

Note that the total number of resources is returned correctly, but the entry field is missing. This only happens when hapi.fhir.store_resource_in_lucene_index_enabled is set to true.
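For anyone trying to confirm the same symptom, here is a minimal Python sketch of the check. The helper name entries_missing is made up for illustration, and the Bundle is a trimmed copy of the response above. It only flags a searchset whose total is greater than zero while entry is absent or empty.

```python
def entries_missing(bundle: dict) -> bool:
    """Return True when a searchset reports matches but carries no entries."""
    total = bundle.get("total", 0)
    entries = bundle.get("entry", [])
    return bundle.get("type") == "searchset" and total > 0 and not entries


# Trimmed copy of the GET /Patient response shown above.
response = {
    "resourceType": "Bundle",
    "type": "searchset",
    "total": 1,
    "link": [{"relation": "self", "url": "https://example.com/fhir/Patient/"}],
}

print(entries_missing(response))
```

A count-only search (for example with _summary=count) legitimately returns a total without entries, so this check is only meaningful for ordinary searches like the one above.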

On further investigation, searching by identifier with the query Patient?identifier=1256744 also returns the correct total, still without the entry field:

{
    "resourceType": "Bundle",
    "id": "b256cb88-29e5-4fbf-bd99-d4bc4c074e30",
    "meta": {
        "lastUpdated": "2022-06-03T11:06:38.562+00:00"
    },
    "type": "searchset",
    "total": 1,
    "link": [
        {
            "relation": "self",
            "url": "https://example.com/fhir/Patient?identifier=1256744"
        }
    ]
}

However, searching with the :text modifier on identifier, using either a partial value (Patient?identifier:text=1256) or the complete identifier (Patient?identifier:text=1256744), incorrectly returns a total of zero:

{
    "resourceType": "Bundle",
    "id": "29adf7a9-52de-4bbe-8531-daa58c4029a1",
    "meta": {
        "lastUpdated": "2022-06-03T11:07:38.415+00:00"
    },
    "type": "searchset",
    "total": 0,
    "link": [
        {
            "relation": "self",
            "url": "https://example.com/fhir/Patient?identifier%3Atext=1256744"
        }
    ]
}
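As a side note on the self link above: the %3A is just the percent-encoded colon of the :text modifier, so the zero total does not look like an encoding problem on the client side. A small sketch with Python's standard library shows the same encoding; nothing here is specific to HAPI.

```python
from urllib.parse import urlencode, unquote

# Encode the search parameter exactly as an HTTP client would.
query = urlencode({"identifier:text": "1256744"})

print(query)           # identifier%3Atext=1256744
print(unquote(query))  # identifier:text=1256744
```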

There are no errors on the server logs.

Expected results

  1. All the entries are returned in the search response Bundle
  2. Partial and complete string matches are returned as results for searches on tokens using the :text modifier

michaelabuckley commented 2 years ago

Hello Sidharth. Thank you for the bug report.

First problem - missing entries. That is a bug, and I'll look into reproducing it. For now, I recommend disabling hapi.fhir.store_resource_in_lucene_index_enabled. It isn't much faster and adds no functionality.

Second problem - Patient?identifier:text=1256 doesn't match a patient with identifier 1256744. This is not a bug. The :text modifier does not match the token value. Instead, it matches the display text of a Coding or CodeableConcept, or the type description of an Identifier. See the documentation here: https://www.hl7.org/fhir/search.html#token

For example, consider this resource:

{
  "resourceType": "Observation",
  "code": {
    "coding": [
      {
        "system": "http://loinc.org",
        "code": "8480-6",
        "display": "Systolic blood pressure"
      }
    ]
  }
}

Then /Observation?code=8480-6 and /Observation?code:text=pressure will match, but /Observation?code:text=8480 will NOT match.
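To make the rule concrete, here is a rough Python model of which fields :text looks at. This is a simplified sketch, not how HAPI or Lucene implement it: the function names are made up, and the case-insensitive substring comparison is an assumption (a real server may tokenize or match differently). The point is only that the code and value fields are never consulted.

```python
def text_candidates(element: dict) -> list:
    """Collect the human-readable strings that the :text modifier searches."""
    found = []
    if "text" in element:                        # CodeableConcept.text
        found.append(element["text"])
    for coding in element.get("coding", []):     # Coding.display inside a CodeableConcept
        if "display" in coding:
            found.append(coding["display"])
    if "display" in element:                     # a bare Coding
        found.append(element["display"])
    type_text = element.get("type", {}).get("text")  # Identifier.type.text
    if type_text:
        found.append(type_text)
    return found


def matches_text(element: dict, query: str) -> bool:
    # Assumption: case-insensitive substring match, for illustration only.
    q = query.lower()
    return any(q in s.lower() for s in text_candidates(element))


code = {"coding": [{"system": "http://loinc.org", "code": "8480-6",
                    "display": "Systolic blood pressure"}]}
identifier = {"system": "hospital_id", "value": "1256744"}

print(matches_text(code, "pressure"))    # True: found in Coding.display
print(matches_text(code, "8480"))        # False: the code itself is not searched
print(matches_text(identifier, "1256"))  # False: Identifier.value is not searched
```

The identifier from the original report has no type text at all, which is why both the partial and the complete value return nothing.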

This is different for string type SearchParameters. For those, the match works as you expect: by default, a string parameter matches values that start with the search text. If you would like to search for partial strings, you will need to define a different SearchParameter of type string indexing that value.
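A custom parameter along those lines might look like the sketch below, written as a Python dict so it can be serialized and POSTed to /SearchParameter. The url, name, and the code identifier-string are hypothetical values chosen for illustration; the expression assumes you want to index Patient.identifier.value.

```python
import json
from urllib.parse import urlencode

# Hypothetical SearchParameter indexing the identifier value as a string.
search_parameter = {
    "resourceType": "SearchParameter",
    "url": "https://example.com/fhir/SearchParameter/patient-identifier-string",  # hypothetical
    "name": "PatientIdentifierString",                                            # hypothetical
    "status": "active",
    "description": "Patient identifier value, indexed as a string",
    "code": "identifier-string",                                                  # hypothetical
    "base": ["Patient"],
    "type": "string",
    "expression": "Patient.identifier.value",
}

# Body to POST to the server's /SearchParameter endpoint.
print(json.dumps(search_parameter, indent=2))

# Example query against the new parameter.
print("Patient?" + urlencode({search_parameter["code"]: "1256"}))
```

Resources stored before the parameter was created may need to be reindexed before they show up in results for it.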

sidharthramesh commented 2 years ago

Hey @michaelabuckley, thank you for your reply. I created a SearchParameter with type string and it is working as expected with both phone numbers and identifiers. I'm still surprised that there's no default FHIR-native way to search this since it's a common use case.

I've now completely turned off advanced_lucene_indexing and store_resource_in_lucene_index_enabled and the search on the custom SearchParameters seems to work fine. Are these meant only for people doing full _content search and querying the display of a CodeSystem using the :text modifier?

I did refer to the documentation page again, and I think we should probably update the content under "Token search" to explicitly mention that the :text modifier works only on the display attribute and not on the value and system.

michaelabuckley commented 2 years ago

It is an awkward gap in the spec. You are not the first person to be surprised by this. Please submit a PR to improve the documentation.

As for disabling Lucene - you are correct. The advanced Lucene indexing is only required for _text, _content, :text, and :contains searching. But we have implemented support for all standard search parameter types, both to provide a faster query engine for users with large data sets and to allow users to combine full-text operations with normal search parameter queries, e.g. Patient?name:text=*ramesh*&birthdate=2020. These queries are difficult to process efficiently if we can't process all search parameters in the Lucene index.

Our biggest limitation is chains/reverse-chains. These are difficult to do efficiently in Lucene.

The store_resource_in_lucene_index_enabled option is another experimental feature aimed at faster response times.

github-actions[bot] commented 3 months ago

This issue is stale because it has been open 730 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] commented 3 months ago

This issue was closed because it has been stalled for 5 days with no activity.