hapifhir / hapi-fhir

🔥 HAPI FHIR - Java API for HL7 FHIR Clients and Servers
http://hapifhir.io
Apache License 2.0

Strings in HFJ_SPIDX_STRING have a maximum length of 200 #5912

Open patrick-werner opened 6 months ago

patrick-werner commented 6 months ago

Describe the bug When searching for a string with the :contains modifier, or using _filter with the co operator, results are not found if the target string contains brackets.

To Reproduce Steps to reproduce the behavior:

  1. http://hapi.fhir.org/baseR4/ResearchStudy?title:contains=SORATR returns a single result (expected 2)
  2. https://hapi.fhir.org/baseR4/ResearchStudy?title:contains=SORA returns 2 results (including the one with brackets surrounding the target string)

Expected behavior Both searches should return 2 results, brackets shouldn't impact the contains search.


Additional context Backup of target resources: https://gist.github.com/patrick-werner/f341a699e5df56e0365f4716f897a368

jamesagnew commented 6 months ago

That's a very long string in the source resource. My suspicion (untested) is that the brackets are a red herring, and this string is just hitting the current 200 character limit for indexing strings.

This limit is arbitrary and could be made configurable, although different RDBMSs will have different behaviours when you try to index a really long string column.
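The effect described above can be illustrated with a minimal sketch. It assumes the indexed value is simply the first 200 characters of the original string and that the contains match runs against that indexed value. The class name, helper names, and the sample title are all hypothetical and are not taken from HAPI FHIR or from the affected resources.

```java
import java.util.Locale;

public class TruncatedIndexDemo {
    // Assumption: the 200-character limit mentioned in this thread.
    static final int MAX_LENGTH = 200;

    // Hypothetical helper: keep only what fits in a fixed-width index column.
    static String toIndexedValue(String value) {
        return value.length() <= MAX_LENGTH ? value : value.substring(0, MAX_LENGTH);
    }

    // Hypothetical stand-in for a case-insensitive :contains match
    // evaluated against the indexed value rather than the full string.
    static boolean containsMatch(String indexedValue, String term) {
        return indexedValue.toLowerCase(Locale.ROOT).contains(term.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        // Made-up title: "SORA" appears early, "SORATR" only after character 200.
        String title = "SORA study: " + "a".repeat(200) + " (SORATRIAL)";
        String indexed = toIndexedValue(title);

        System.out.println(indexed.length());                  // 200
        System.out.println(containsMatch(indexed, "SORA"));    // true
        System.out.println(containsMatch(indexed, "SORATR"));  // false: cut off by truncation
        System.out.println(containsMatch(title, "SORATR"));    // true on the full title
    }
}
```

Under these assumptions the brackets play no role: the second search term simply falls outside the part of the string that was indexed.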

patrick-werner commented 6 months ago

Thanks for your reply, @jamesagnew. After some testing I can confirm the issue is caused by the 200-character index limit. Will close this issue.

patrick-werner commented 6 months ago

I modified the issue title. I have a use case where we are storing and searching ResearchStudy titles, and these are often longer than 200 chars.

I'd like to increase the max size of the column(s), but this has to be static at compile time. What do you think of removing the length attribute on the columns and moving Max_Length to the storageSettings with a default of 200? This would enforce a maximum index string length of 200 chars by default, but allow use cases with longer strings. What would be your preferred way of allowing longer strings here, @jamesagnew?

jamesagnew commented 6 months ago

I don't know that I'd want to drop the length attribute - That means we'd just use the default, which I believe is 255 chars, but with unexpected changes possible if that default ever changed. And making it longer by default could cause issues in some places, since not all databases like indexing long strings.

Adding a configurable max length would be fine though - You'd have to manually change the maximum length of the column in the DB, but we could add a setting to tell HAPI that this had been done.
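One possible shape for such a setting is sketched below. This is only an illustration of the idea, not HAPI FHIR's actual API: the class `StringIndexSettings`, its accessor names, and the `truncateForIndex` helper are hypothetical. The default of 200 comes from the limit discussed in this thread.

```java
public class StringIndexSettings {
    // Default taken from the 200-character limit discussed in this thread.
    public static final int DEFAULT_MAX_LENGTH = 200;

    // Hypothetical setting: tells the application how wide the DB column is.
    private int maxIndexedStringLength = DEFAULT_MAX_LENGTH;

    public int getMaxIndexedStringLength() {
        return maxIndexedStringLength;
    }

    public void setMaxIndexedStringLength(int maxIndexedStringLength) {
        if (maxIndexedStringLength < 1) {
            throw new IllegalArgumentException("Max length must be at least 1");
        }
        this.maxIndexedStringLength = maxIndexedStringLength;
    }

    // Truncate a value so that it fits the configured column width.
    public String truncateForIndex(String value) {
        if (value == null || value.length() <= maxIndexedStringLength) {
            return value;
        }
        return value.substring(0, maxIndexedStringLength);
    }

    public static void main(String[] args) {
        StringIndexSettings settings = new StringIndexSettings();
        String longTitle = "t".repeat(300);

        System.out.println(settings.truncateForIndex(longTitle).length()); // 200

        // Only safe after the column has been widened manually in the database.
        settings.setMaxIndexedStringLength(500);
        System.out.println(settings.truncateForIndex(longTitle).length()); // 300
    }
}
```

In this sketch the setting does not alter the schema. It only records that the column has already been widened, which matches the manual step described above.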