Open fredhersch opened 3 years ago
we could use https://www.sqlite.org/spellfix1.html Although I'm not sure if android allows creation of virtual tables.
cc: @davegwatson
@vvikramraj might have some bandwidth to take this on.
@shoaibmushtaq25 will be picking this up let me know if that ok @Tarun-Bhardwaj CC @jingtang10
There is another ticket which aims working on token text search.
The work can either use FTS (android FTS-3/4). The FTS is powerful but specific to android with a specific set of features. The language/syntax and virtual table structure is also its own. OR As mentioned in comment in given ticket, FHIR search is very powerful and has specific rules for each param type. Moreover hapi itself has implemented the search using Lucene the use of Lucene on android SDK can help with query syntax conformance with HAPI query syntax with all features available without any customization/deviation/complexity easily.
Some aspects to evaluate are performance (Lucene claims to be faster), complexity (FTS has shorter learning curve), scalability (HAPI uses Lucence hence accommodating syntax and features is easier), size (Lucene claims to be compact but uses Files, FTS uses sqlite) etc.
I am doing a small POC on Lucene for that ticket and then we can benchmark and discuss the workable approach.
@jingtang10 @f-odhiambo @shoaibmushtaq25
I have gone through the research issue around FTS(Full-Text Search) and was first thinking about that We have a challenge that right now ResourceEntity
table of the database is showing resources in JSON form in serializedResource
column. So we don't have a way to index Patient.name
because FTS virtual tables apply indexing on columns.
I also see a prospective solution here by @epicadk, so I am going to try this solution. I will try to change the StringIndexEntity.kt
file and make this an FTS virtual table and change the respective SQL queries for string search in the MoreSearch.kt
file and test it to check how it goes.
SQL cannot use an index if the argument to like starts with a wildcard. The full-text search would help here. However, the obvious drawback of FTS is the increased size of the index.
- The use case is the string contains modifier used in string searches (don't really see any other use case), However the default tokenizer will not work for this and for some reason room says icu is not a valid tokenizer type : / .
- FTS can be enabled like this.
- We could just change the existing string index table to an FTS table. It wouldn't break anything it would only be an internal change in the StringIndexEntity.kt file and changing the SQL queries for string search in the MoreSearch.kt file.
- The Performance for the contains query would be enhanced (it would be similar to using indexes for queries that start with a wildcard)
Let me know if this approach sounds okay to you or leave any feedback on it. @jingtang10 cc: @maimoonak @f-odhiambo
Moreover, FTS-5 is also available in Android to implement Full-text search with more built-in functions and enhanced features but FTS-5 has a limitation for Android API ≥ 24 so It is better to avoid FTS-5 and use FTS-3 or FTS-4(old strategies) because we might miss a lot of devices to support FTS if we go with FTS-5. Also, Room supports only FTS3 and FTS4, and it is recommended to use FTS4 in most cases. cc: @jingtang10 @maimoonak @f-odhiambo
@maimoonak, cc: @f-odhiambo @jingtang10 I explored the documentation around FTS/Virtual tables and found out that There are two possibilities for making a virtual table
For both cases 1 and 2, "All fields in an FTS entity are of TEXT affinity, except the for the 'rowid' and 'languageid' fields" as mentioned in the official doc of FTS4. Considering the above in mind, If we analyze the StringIndexEntity table, it has fields of types other than TEXT( i.e, UUID, ResourceType) as well. Also, It doesn't allow me to change the StringIndexEntity table to a FTS/virtual table and giving the following error shown in the image.
we could use https://www.sqlite.org/spellfix1.html Although I'm not sure if android allows creation of virtual tables.
Yes, Android allows creation of virtual tables. But looks like, with Room DB, only FTS3 and FTS4 virtual tables are allowed. This spellfix1 virtual table can be used with simple SQLite (without Room) but we have Room Implemented for android FHIR SDK. I need to check if we can mix both of them (Simple SQLite + Room). cc: @maimoonak @jingtang10
we could use https://www.sqlite.org/spellfix1.html Although I'm not sure if android allows creation of virtual tables.
Also as per official documentation about spellfix here, it is not included in Standard SQLite Amalgamation but a loadable extension.
The spellfix1 virtual table is not included in the SQLite amalgamation and is not a part of any standard SQLite build. It is a loadable extension.
Also as per this sample project, this loadable extension needs to install NDK as well.
Other than this, I found two libraries that can be helpful to use for fuzzy search.
I am not sure about fuzzy search if we go with spellfix extension loading( which also needs to install NDK) or go with the available libraries mentioned above. Needs your thoughts on that @jingtang10 @maimoonak cc: @f-odhiambo
As discussed over the call with @jingtang10 today, cc: @f-odhiambo For the following steps,
I was doing a POC on https://www.sqlite.org/spellfix1.html to load it into the Android app and use it for spell correction to facilitate fuzzy search on SDK. So far, I am able to load this https://www.sqlite.org/spellfix1.html c library into an android project after resolving the ndk-related errors (linking and loading errors) but still have to play with the https://www.sqlite.org/spellfix1.html to use its available methods/APIs for spell correction.
cc: @jingtang10 @maimoonak @f-odhiambo
As discussed in the Android FHIR SDK developers call, @jingtang10 will also be looking for other alternative options available for fuzzy search + FTS. I will have to complete the POC of spellfix extension and document the findings over here and discuss the next steps forward with @maimoonak and @jingtang10. Also, @maimoonak will be doing some more research on Lucene
to check if it supports out-of-box fuzzy search in parallel to FTS. cc: @f-odhiambo
I did POC on using the https://www.sqlite.org/spellfix1.html extension into an android app and I am able to run it on the android app and use the spellfix1 table as a virtual table for spell correction search with SQLite For this POC, I had to do the following steps.
For the following steps, I will focus on working Full-text search and leave the fuzzy search to work later on after completing the full-text search. cc: @jingtang10 @f-odhiambo @maimoonak
This issue has been divided into two parts to work on.
cc: @f-odhiambo @jingtang10 @maimoonak
For the implementation of part 2 of the comment above, here are the steps as mentioned above here as I did for POC of spellfix
Had a call yesterday with @jingtang10 to review PR #1569 and after having some initial review, Jing felt like we need to have more discussion with @fredhersch to understand what we want to achieve with this issue. It looks like, we are not achieving much from this issue, so we may not even need to implement this in FHIR SDK. Jing said we will also be discussing the upcoming Android FHIR SDK Developers' call around this to consolidate around what we want to achieve with this issue. @jingtang10 / @maimoonak Can you please add If I have missed something here? cc: @f-odhiambo
Do more research about space taken for indexing of full-text search?
The FTS3 table consumes around 2006 MB on disk compared to just 1453 MB for the ordinary table
Can we refactor the String search API and use only the full-text search if it is feasible?
Currently, doing some testing around the full-text search to verify if we can support all the use cases of String search API available in the Android FHIR SDK
Room indexing vs Full-text search indexing.
Indexing is almost the same for Room and FTS but the difference is that FTS supports the usage of the MATCH operator in FTS queries over virtual/fts tables which is faster than usual LIKE/pattern searches over ordinary tables
How best we can support Boolean operators in Full-Text Search and check other available operators
Boolean operators are supported in FTS5 as per the documentation, we are using FTS4 which is supported with Room and FTS4 only supports OR operator and seems like AND /NOT operators in standard query syntax and Enhanced query syntax is not supported on Android
For FTS MATCH OR/AND set operations, there are two approaches to implement this
SELECT resourceUuid FROM StringIndexEntity c JOIN FullTextStringIndexEntity d ON c.id = d.docid
WHERE resourceType = "Patient" AND d.index_name = "name" AND d.index_value MATCH '*' || "sharon" || '*'
{UNION/INTERSECT}
SELECT resourceUuid FROM StringIndexEntity c JOIN FullTextStringIndexEntity d ON c.id = d.docid WHERE resourceType = ? AND d.index_name = ? AND d.index_value MATCH '' || "Zoe" || ''
2. Using MATCH OR/AND in the same FTS MATCH query
SELECT resourceUuid FROM StringIndexEntity c JOIN FullTextStringIndexEntity d ON c.id = d.docid WHERE resourceType = "Patient" AND d.index_name = "name" AND d.index_value MATCH '' || "sharon OR/AND Zoe" || ''
Right now, the first approach(UNION/INTERSECT) has been implemented in [this commit](https://github.com/google/android-fhir/pull/1569/commits/077f7409f49e13b6047a253dd8894e28babc070a) but as per discussion with @maimoonak , the second approach is more performant particularly when we have a large dataset.
I have currently observed that we have created a PoC for spellfix library using this sample. I need to ask if we have any guidance on whether we will be going with spellfix for the implementation of fizzy search or there have been some discussions around any other possible solutions or libraries. cc: @jingtang10 @maimoonak
I have tried to integrate the spellfix extension along with room, keeping this PR as the base, and I have tried different solutions including:
Used the com.couchbase.lite:sqlcipher-android
library which is the recommended way to use SQLCipher in an Android app. This library includes the necessary classes and resources for SQLCipher, but I found out that it does not include the spellfix1 extension by default.
Tried the SQLCipher library jar found online and used the libsqlcipher_android.so
files for different architectures, With this solution, I tried to initialize SQLite with the spellfix1 extension using the SQLCipherSupportHelper
class, but faced an error "no such module: spellfix1" which indicated that the libsqlcipher_android.so
found in the SQLCipher project didn't intialize the spellfix extension properly.
So far, unfortunately, I didn't find any way to use the spellfix1 extension with the newer version of the Room Persistence Library 2.5.0
as it is not supported by default and requires significant customizations and workarounds to make it work.
The Room library is designed to work with standard SQLite database, but spellfix1 is an extension that is not included with standard SQLite and it must be built and enabled separately. I followed this sample project It allows the building of a simple vocabulary table using the SQLite database but this doesn't work with a room database when I tried to re-query to check for spellings.
Currently, this seems like a blocker and given the lack of documentation and the complexity of the issue, additional research and development or looking into alternative solutions may be necessary for this. cc: @f-odhiambo @jingtang10 @maimoonak
@stevenckngaa for sqlcipher
@qaziabubakar do you have a branch / draft PR to share?
Briefly discussed with @stevenckngaa - from what we could see it's not clear that the spellfix extension is loaded correctly. hence "no such module: spellfix1".
@aditya-07
One other possibility is the dynamic library is not included in your sample APK. Could you please use Android Studio APK analyzer (Build > Analyze...) to check if the APK contains the spellfix native library? They should be located under the lib/[CPU arch]
directory.
This is the APK analysis of FHIR demo app after patching this PR. I can't locate the spellfix native library.
This is the PR for basic integration of spellfix using the native module. I updated some .so files as well and I also added thelibsqlcipher_android.so
with different CPU arch under the libs folder, but got the same error, those changes are not pushed yet to this PR.
I think the issue is that spellfix isn't included in the Cipher library by default and we need to build the SQL Cipher Library and integrate the spellfix there. I needed some support on building it along with spellfix, as the documentation wasn't very clear on that.
My understanding is, when the spellfix library is correctly included, there should be a new dynamic library, called spellfix.so
(or sth similar) living under the same directory hierarchy with libsqlcipher.so
in the demo app apk.
I compiled this sample project and analyzed the apk. From the screenshot, you can see that there is a spellfix dynamic library called, libspellfix3.so
. Could you confirm the same dynamic library is found in the relevant CPU arch folders?
I followed the integration for the sample project as well and I included the spellfix as a module in the project. Now when I compile it, I am able to view the libspellfix3.so
in the lib folder but getting the same exception SQLiteException: no such module: spellfix1
when I try to create a virtual table at the time of database creation
Could you please link a PR with the change you have done so far so that we can look into the problem?
Thank you for coming back to me @qaziabubakar-vd.
The problem in your draft lays in the load_extension
code path. In order to load a sqlite extension, you will need to make a JNI call. This issue isn't related to whether we use SqlCipher vs Android Sqlite because Android open source project doesn't expose SQLite load_extension
call in SQLiteConnection.java.
I sugggest we can learn from this open source project even though it is not using SqlCipher. This library is used in the sample project you looked into for including the libspellfix3.so
in the demo app binary.
There are two things we need to do there:
renamed this issue to exclude FTS which is a whole different problem.
Is your feature request related to a problem? Please describe. Patient names may be misspelled and when searching for an existing patient this may be missed leading to duplicate records fro an existing patient
Describe the solution you'd like
Enabling it will create a new FTS index for the Patient.name[] field (TBD) and access to a search APIDescribe alternatives you've considered A clear and concise description of any alternative solutions or features you've considered.
Additional context
FTS comes at a storage cost.This solution will allow implementers to determine whether fuzzy search is required for their use case.EDIT: removed references to FTS - Jing