realm / realm-core

Core database component for the Realm Mobile Database SDKs
https://realm.io
Apache License 2.0

RealmDB Full Text Search not finding records in big DB #7368

Closed: tn7111 closed this issue 6 months ago

tn7111 commented 8 months ago

Expected results

There's a full-text index on the content field of my DB, and it contains this record: "18x48 18x48x80 white 18 x 48 80 prm 1848 15". It should be found by the query content TEXT "prm 18".

Actual Results

The query returns no results, even though more than one record satisfies it.

Steps & Code to Reproduce

I tried both the .NET SDK (through Unity) and Realm Studio, with the same results. Even if I set a record's content field to exactly 'prm 18', the search does not return it. Searching for content TEXT "prm" works as expected.

Core version

Core version: Unity 11.5.0 (core 13.20.1, I think; the CHANGELOG says x.y.z here, sorry). I've also tried the latest Realm Studio, which presumably uses a later core version.

sync-by-unito[bot] commented 8 months ago

➤ PM Bot commented:

Jira ticket: RCORE-1989

nirinchev commented 8 months ago

The FTS implementation matches on full words, so I guess it treats 1848 as a word and doesn't match 18 there. I haven't tried it, but you could try adding * to make it a prefix search.

tn7111 commented 8 months ago

Take a look: there's also a standalone 18 earlier in the string.

tn7111 commented 8 months ago

Another thing worth noting: I tried recreating the DB with a limited number of records (around 300), and search worked correctly. With my original DB file, which has 60k entries, search fails as described.

nirinchev commented 8 months ago

Hm, good point. @jedelbo, I guess that's in your area of expertise. @tn7111, not sure if that's possible, but if you could give us access to the database where search fails, that would make it a lot easier to find the root cause.

tn7111 commented 8 months ago

Hmm, maybe I'll think of a way to obfuscate the data... not sure though. @nirinchev @jedelbo, I wonder: is there a way to retrieve the actual index somehow?

jedelbo commented 8 months ago

@tn7111 There must be some weird combination of words that somehow tricks the index. If there is any chance that you can produce a .realm file that exhibits the problem with data you can share, you can send it privately to me at jorgen.edelbo@mongodb.com.

jedelbo commented 8 months ago

If you can build realm core locally, this C++ program (modified appropriately) can dump the index.

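```cpp
#include "realm.hpp"
#include <iostream>

using namespace realm;

int main()
{
    // Adjust the file path, table name and column name to match your data.
    // Tables created by the SDKs are usually named "class_<ObjectType>".
    DBRef db = DB::create(make_in_realm_history(), "test.realm");
    auto rt = db->start_read(); // a read transaction is enough to inspect the index
    auto table = rt->get_table("table");
    auto col = table->get_column_key("text");
    // Print the index node structure to stdout (requires a DEBUG build).
    table->get_search_index(col)->do_dump_node_structure(std::cout, 0);
}
```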
jedelbo commented 8 months ago

You need to build in DEBUG mode.

tn7111 commented 8 months ago

Wow, that's a lot of help! I'm now working on recreating the .realm file. It seems the size of the DB doesn't matter after all; I guess it's something about the index itself. I now have two similar .realm files (which I can't share for now, since this is production data) with about the same number of entries. The new one works correctly; the old one fails as described. The entries, and the fields I query, are the same in both.

github-actions[bot] commented 6 months ago

This issue has been automatically closed because there has been no response to our request for more information from the original author. With only the information that is currently in the issue, we don't have enough information to take action. Please reach out if you have or find the answers we need so that we can investigate further.