google / android-fhir

The Android FHIR SDK is a set of Kotlin libraries for building offline-capable, mobile-first healthcare applications using the HL7® FHIR® standard on Android.
https://google.github.io/android-fhir/
Apache License 2.0
465 stars 245 forks source link

Fuzzy search on Patient name #608

Open fredhersch opened 2 years ago

fredhersch commented 2 years ago

Is your feature request related to a problem? Please describe. Patient names may be misspelled and when searching for an existing patient this may be missed leading to duplicate records fro an existing patient

Describe the solution you'd like

Describe alternatives you've considered A clear and concise description of any alternative solutions or features you've considered.

Additional context FTS comes at a storage cost. This solution will allow implementers to determine whether fuzzy search is required for their use case.

EDIT: removed references to FTS - Jing

epicadk commented 2 years ago

we could use https://www.sqlite.org/spellfix1.html Although I'm not sure if android allows creation of virtual tables.

jingtang10 commented 2 years ago

cc: @davegwatson

@vvikramraj might have some bandwidth to take this on.

f-odhiambo commented 1 year ago

@shoaibmushtaq25 will be picking this up let me know if that ok @Tarun-Bhardwaj CC @jingtang10

maimoonak commented 1 year ago

There is another ticket which aims working on token text search.

The work can either use FTS (android FTS-3/4). The FTS is powerful but specific to android with a specific set of features. The language/syntax and virtual table structure is also its own. OR As mentioned in comment in given ticket, FHIR search is very powerful and has specific rules for each param type. Moreover hapi itself has implemented the search using Lucene the use of Lucene on android SDK can help with query syntax conformance with HAPI query syntax with all features available without any customization/deviation/complexity easily.

Some aspects to evaluate are performance (Lucene claims to be faster), complexity (FTS has shorter learning curve), scalability (HAPI uses Lucence hence accommodating syntax and features is easier), size (Lucene claims to be compact but uses Files, FTS uses sqlite) etc.

I am doing a small POC on Lucene for that ticket and then we can benchmark and discuss the workable approach.

@jingtang10 @f-odhiambo @shoaibmushtaq25

shoaibmushtaq25 commented 1 year ago

I have gone through the research issue around FTS(Full-Text Search) and was first thinking about that We have a challenge that right now ResourceEntity table of the database is showing resources in JSON form in serializedResource column. So we don't have a way to index Patient.name because FTS virtual tables apply indexing on columns.

I also see a prospective solution here by @epicadk, so I am going to try this solution. I will try to change the StringIndexEntity.kt file and make this an FTS virtual table and change the respective SQL queries for string search in the MoreSearch.kt file and test it to check how it goes.

SQL cannot use an index if the argument to like starts with a wildcard. The full-text search would help here. However, the obvious drawback of FTS is the increased size of the index.

  • The use case is the string contains modifier used in string searches (don't really see any other use case), However the default tokenizer will not work for this and for some reason room says icu is not a valid tokenizer type : / .
  • FTS can be enabled like this.
  • We could just change the existing string index table to an FTS table. It wouldn't break anything it would only be an internal change in the StringIndexEntity.kt file and changing the SQL queries for string search in the MoreSearch.kt file.
  • The Performance for the contains query would be enhanced (it would be similar to using indexes for queries that start with a wildcard)

Let me know if this approach sounds okay to you or leave any feedback on it. @jingtang10 cc: @maimoonak @f-odhiambo

shoaibmushtaq25 commented 1 year ago

Moreover, FTS-5 is also available in Android to implement Full-text search with more built-in functions and enhanced features but FTS-5 has a limitation for Android API ≥ 24 so It is better to avoid FTS-5 and use FTS-3 or FTS-4(old strategies) because we might miss a lot of devices to support FTS if we go with FTS-5. Also, Room supports only FTS3 and FTS4, and it is recommended to use FTS4 in most cases. cc: @jingtang10 @maimoonak @f-odhiambo

shoaibmushtaq25 commented 1 year ago

@maimoonak, cc: @f-odhiambo @jingtang10 I explored the documentation around FTS/Virtual tables and found out that There are two possibilities for making a virtual table

  1. You can make the main table a virtual table and add content to it. This is a waste of space and resources if some columns don't need to be indexed.
  2. You can make a separate virtual table having those columns only that we need to be indexed.

For both cases 1 and 2, "All fields in an FTS entity are of TEXT affinity, except the for the 'rowid' and 'languageid' fields" as mentioned in the official doc of FTS4. Considering the above in mind, If we analyze the StringIndexEntity table, it has fields of types other than TEXT( i.e, UUID, ResourceType) as well. Also, It doesn't allow me to change the StringIndexEntity table to a FTS/virtual table and giving the following error shown in the image. Screenshot from 2022-08-23 11-00-16

shoaibmushtaq25 commented 1 year ago

we could use https://www.sqlite.org/spellfix1.html Although I'm not sure if android allows creation of virtual tables.

Yes, Android allows creation of virtual tables. But looks like, with Room DB, only FTS3 and FTS4 virtual tables are allowed. This spellfix1 virtual table can be used with simple SQLite (without Room) but we have Room Implemented for android FHIR SDK. I need to check if we can mix both of them (Simple SQLite + Room). cc: @maimoonak @jingtang10

shoaibmushtaq25 commented 1 year ago

we could use https://www.sqlite.org/spellfix1.html Although I'm not sure if android allows creation of virtual tables.

Also as per official documentation about spellfix here, it is not included in Standard SQLite Amalgamation but a loadable extension.

The spellfix1 virtual table is not included in the SQLite amalgamation and is not a part of any standard SQLite build. It is a loadable extension.

Also as per this sample project, this loadable extension needs to install NDK as well.

Other than this, I found two libraries that can be helpful to use for fuzzy search.

  1. https://github.com/tdebatty/java-string-similarity
  2. https://github.com/xdrop/fuzzywuzzy

I am not sure about fuzzy search if we go with spellfix extension loading( which also needs to install NDK) or go with the available libraries mentioned above. Needs your thoughts on that @jingtang10 @maimoonak cc: @f-odhiambo

shoaibmushtaq25 commented 1 year ago

As discussed over the call with @jingtang10 today, cc: @f-odhiambo For the following steps,

shoaibmushtaq25 commented 1 year ago

I was doing a POC on https://www.sqlite.org/spellfix1.html to load it into the Android app and use it for spell correction to facilitate fuzzy search on SDK. So far, I am able to load this https://www.sqlite.org/spellfix1.html c library into an android project after resolving the ndk-related errors (linking and loading errors) but still have to play with the https://www.sqlite.org/spellfix1.html to use its available methods/APIs for spell correction.

cc: @jingtang10 @maimoonak @f-odhiambo

shoaibmushtaq25 commented 1 year ago

As discussed in the Android FHIR SDK developers call, @jingtang10 will also be looking for other alternative options available for fuzzy search + FTS. I will have to complete the POC of spellfix extension and document the findings over here and discuss the next steps forward with @maimoonak and @jingtang10. Also, @maimoonak will be doing some more research on Lucene to check if it supports out-of-box fuzzy search in parallel to FTS. cc: @f-odhiambo

shoaibmushtaq25 commented 1 year ago

I did POC on using the https://www.sqlite.org/spellfix1.html extension into an android app and I am able to run it on the android app and use the spellfix1 table as a virtual table for spell correction search with SQLite For this POC, I had to do the following steps.

For the following steps, I will focus on working Full-text search and leave the fuzzy search to work later on after completing the full-text search. cc: @jingtang10 @f-odhiambo @maimoonak

shoaibmushtaq25 commented 1 year ago

This issue has been divided into two parts to work on.

  1. Optional support for Full-text search for Patient.name, PR for this is Ready for review #1569
  2. Optional support for Fuzzy search for Patient.name, For this I will share the Implementation plan and related LOEs

cc: @f-odhiambo @jingtang10 @maimoonak

shoaibmushtaq25 commented 1 year ago

For the implementation of part 2 of the comment above, here are the steps as mentioned above here as I did for POC of spellfix

shoaibmushtaq25 commented 1 year ago

Had a call yesterday with @jingtang10 to review PR #1569 and after having some initial review, Jing felt like we need to have more discussion with @fredhersch to understand what we want to achieve with this issue. It looks like, we are not achieving much from this issue, so we may not even need to implement this in FHIR SDK. Jing said we will also be discussing the upcoming Android FHIR SDK Developers' call around this to consolidate around what we want to achieve with this issue. @jingtang10 / @maimoonak Can you please add If I have missed something here? cc: @f-odhiambo

shoaibmushtaq25 commented 1 year ago
  1. Do more research about space taken for indexing of full-text search?

    The FTS3 table consumes around 2006 MB on disk compared to just 1453 MB for the ordinary table

  2. Can we refactor the String search API and use only the full-text search if it is feasible?

    Currently, doing some testing around the full-text search to verify if we can support all the use cases of String search API available in the Android FHIR SDK

  3. Room indexing vs Full-text search indexing.

    Indexing is almost the same for Room and FTS but the difference is that FTS supports the usage of the MATCH operator in FTS queries over virtual/fts tables which is faster than usual LIKE/pattern searches over ordinary tables

  4. How best we can support Boolean operators in Full-Text Search and check other available operators

    Boolean operators are supported in FTS5 as per the documentation, we are using FTS4 which is supported with Room and FTS4 only supports OR operator and seems like AND /NOT operators in standard query syntax and Enhanced query syntax is not supported on Android

shoaibmushtaq25 commented 1 year ago

For FTS MATCH OR/AND set operations, there are two approaches to implement this

  1. Using UNION and INTERSECT between subqueries of MATCH
    
    SELECT resourceUuid FROM StringIndexEntity c JOIN FullTextStringIndexEntity d ON c.id = d.docid
    WHERE resourceType = "Patient" AND d.index_name = "name" AND d.index_value MATCH '*' || "sharon" || '*'

{UNION/INTERSECT}

SELECT resourceUuid FROM StringIndexEntity c JOIN FullTextStringIndexEntity d ON c.id = d.docid WHERE resourceType = ? AND d.index_name = ? AND d.index_value MATCH '' || "Zoe" || ''

2. Using MATCH OR/AND in the same FTS MATCH query

SELECT resourceUuid FROM StringIndexEntity c JOIN FullTextStringIndexEntity d ON c.id = d.docid WHERE resourceType = "Patient" AND d.index_name = "name" AND d.index_value MATCH '' || "sharon OR/AND Zoe" || ''



Right now, the first approach(UNION/INTERSECT) has been implemented in [this commit](https://github.com/google/android-fhir/pull/1569/commits/077f7409f49e13b6047a253dd8894e28babc070a) but as per discussion with @maimoonak , the second approach is more performant particularly when we have a large dataset.
qaziabubakar-vd commented 1 year ago

I have currently observed that we have created a PoC for spellfix library using this sample. I need to ask if we have any guidance on whether we will be going with spellfix for the implementation of fizzy search or there have been some discussions around any other possible solutions or libraries. cc: @jingtang10 @maimoonak

qaziabubakar-vd commented 1 year ago

I have tried to integrate the spellfix extension along with room, keeping this PR as the base, and I have tried different solutions including:

Used the com.couchbase.lite:sqlcipher-android library which is the recommended way to use SQLCipher in an Android app. This library includes the necessary classes and resources for SQLCipher, but I found out that it does not include the spellfix1 extension by default.

Tried the SQLCipher library jar found online and used the libsqlcipher_android.so files for different architectures, With this solution, I tried to initialize SQLite with the spellfix1 extension using the SQLCipherSupportHelper class, but faced an error "no such module: spellfix1" which indicated that the libsqlcipher_android.so found in the SQLCipher project didn't intialize the spellfix extension properly.

So far, unfortunately, I didn't find any way to use the spellfix1 extension with the newer version of the Room Persistence Library 2.5.0 as it is not supported by default and requires significant customizations and workarounds to make it work. The Room library is designed to work with standard SQLite database, but spellfix1 is an extension that is not included with standard SQLite and it must be built and enabled separately. I followed this sample project It allows the building of a simple vocabulary table using the SQLite database but this doesn't work with a room database when I tried to re-query to check for spellings.

Currently, this seems like a blocker and given the lack of documentation and the complexity of the issue, additional research and development or looking into alternative solutions may be necessary for this. cc: @f-odhiambo @jingtang10 @maimoonak

jingtang10 commented 1 year ago

@stevenckngaa for sqlcipher

jingtang10 commented 1 year ago

@qaziabubakar do you have a branch / draft PR to share?

Briefly discussed with @stevenckngaa - from what we could see it's not clear that the spellfix extension is loaded correctly. hence "no such module: spellfix1".

jingtang10 commented 1 year ago

@aditya-07

stevenckngaa commented 1 year ago

One other possibility is the dynamic library is not included in your sample APK. Could you please use Android Studio APK analyzer (Build > Analyze...) to check if the APK contains the spellfix native library? They should be located under the lib/[CPU arch] directory.

This is the APK analysis of FHIR demo app after patching this PR. I can't locate the spellfix native library.

Screenshot 2023-02-03 at 11 56 55
qaziabubakar-vd commented 1 year ago

This is the PR for basic integration of spellfix using the native module. I updated some .so files as well and I also added thelibsqlcipher_android.so with different CPU arch under the libs folder, but got the same error, those changes are not pushed yet to this PR.

I think the issue is that spellfix isn't included in the Cipher library by default and we need to build the SQL Cipher Library and integrate the spellfix there. I needed some support on building it along with spellfix, as the documentation wasn't very clear on that.

stevenckngaa commented 1 year ago

My understanding is, when the spellfix library is correctly included, there should be a new dynamic library, called spellfix.so (or sth similar) living under the same directory hierarchy with libsqlcipher.so in the demo app apk.

stevenckngaa commented 1 year ago

I compiled this sample project and analyzed the apk. From the screenshot, you can see that there is a spellfix dynamic library called, libspellfix3.so. Could you confirm the same dynamic library is found in the relevant CPU arch folders?

Screenshot 2023-02-06 at 12 05 42
qaziabubakar-vd commented 1 year ago

I followed the integration for the sample project as well and I included the spellfix as a module in the project. Now when I compile it, I am able to view the libspellfix3.so in the lib folder but getting the same exception SQLiteException: no such module: spellfix1 when I try to create a virtual table at the time of database creation

image
stevenckngaa commented 1 year ago

Could you please link a PR with the change you have done so far so that we can look into the problem?

stevenckngaa commented 1 year ago

Thank you for coming back to me @qaziabubakar-vd.

The problem in your draft lays in the load_extension code path. In order to load a sqlite extension, you will need to make a JNI call. This issue isn't related to whether we use SqlCipher vs Android Sqlite because Android open source project doesn't expose SQLite load_extension call in SQLiteConnection.java.

I sugggest we can learn from this open source project even though it is not using SqlCipher. This library is used in the sample project you looked into for including the libspellfix3.so in the demo app binary.

There are two things we need to do there:

  1. Create a native function (JNI) for loading the extension. An example from requery project.
  2. Call the load extension JNI function right after opening the database. You will also likely need to wrap the SQLiteConnection from SqliteChiper so that the extension is loaded as part of the database open operation. An example from requery project.
jingtang10 commented 1 year ago

renamed this issue to exclude FTS which is a whole different problem.