oramasearch / orama

🌌 A complete search engine and RAG pipeline in your browser, server or edge network with support for full-text, vector, and hybrid search in less than 2kb.
https://docs.orama.com

Search Filter not working as expected #612

Open Leo310 opened 9 months ago

Leo310 commented 9 months ago

### Describe the bug

When I execute this code:

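```ts
import { search } from '@orama/orama';

// `db` is the instance created in "To Reproduce"; `sources` holds the filter value ('To Add.md')
console.log('Filtering by sources', sources);
const results = await search(db, {
  where: {
    source: sources,
  },
  limit: 10000, // the bare `10000` in the original is not valid inside an object literal
});
console.log('Got results', results);
```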

I get the following console output:

(screenshot of console output)

As you can see, the first hit already has a different `source` value ("README.md") than the filter ("To Add.md"). The filter was completely ignored.

### To Reproduce

1. Initialize Orama as follows:

   ```ts
   import { create } from '@orama/orama';

   const recordManagerSchema = {
     id: 'string',
     source: 'string',
     updated_at: 'number',
   } as const;

   const db = await create({
     schema: recordManagerSchema,
     id: 'testdb',
     components: { tokenizer: { stemming: true, stemmerSkipProperties: ['id', 'source'] } },
   });
   ```

2. Insert some data into this db with different `source` values.
3. Execute the code from the bug description above.

### Expected behavior

As you stated in your docs:
![image](https://github.com/oramasearch/orama/assets/48623649/a2f13848-38a8-43f0-aec2-dea428f58a4a)
I expect to only get hits that include 'To Add.md' in their source attribute.

### Environment Info

```bash
OS: Mac 14.2.1
Orama: 2.0.1
Running inside Obsidian (Electron app)
```

### Affected areas

Search

### Additional context

No response

micheleriva commented 9 months ago

Hi, would you be able to provide a repro via Replit or a similar tool?

Leo310 commented 9 months ago

@micheleriva Here is the repro: https://replit.com/join/xnweszxeca-leodev310

I found that the filter doesn't match the exact entries of `sources`; instead, it matches substrings of an entry.

I think "README.md" was a hit in my previous example because it contains the ".md" substring, which is why every record matched. I'm not sure whether this is the intended behavior. If it is, is there another way to filter on exact matches?
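The behavior described above is consistent with string filters being matched token by token rather than as whole values. The sketch below models that idea with a deliberately simplified tokenizer (lowercase, then split on non-alphanumeric characters). `tokenize` and `tokenFilterMatches` are hypothetical helpers, not Orama's implementation; Orama's real tokenizer may differ, for example in stemming or stop-word handling.

```typescript
// Simplified model (NOT Orama's actual tokenizer): lowercase the value and
// split on any run of non-alphanumeric characters, dropping empty tokens.
function tokenize(value: string): string[] {
  return value
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 0);
}

// Token-based matching: a document matches if it shares ANY token with the filter value.
function tokenFilterMatches(docValue: string, filterValue: string): boolean {
  const docTokens = new Set(tokenize(docValue));
  return tokenize(filterValue).some((token) => docTokens.has(token));
}

const sources = ['To Add.md', 'README.md', 'notes.txt'];
const matched = sources.filter((source) => tokenFilterMatches(source, 'To Add.md'));
console.log(matched); // [ 'To Add.md', 'README.md' ]
```

Under this model, "README.md" matches the "To Add.md" filter only because both produce the token `md`, and a multi-word value such as "History and geography" would match any entry that shares the token `and`.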

Leo310 commented 9 months ago

The numeric filter doesn't seem to work as expected either. For this code:

```ts
const before = 1705758397008;
console.log('Searching for entries indexed before', before, 'in Oramadb', db.data.docs);
// `limit: 100` replaces the bare `100`, which is not valid inside an object literal
const results = await search(await this.db, { where: { indexed_at: { lt: before } }, limit: 100 });
console.log('Results', results);
```

I get this output: (screenshot of console output). The `indexed_at` values of entries 21 and 22 are smaller than `before` (1705757860535 < 1705758397008), yet I get no hits even though I am using the `lt` filter on `indexed_at`.
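One way to double-check the expectation independently of Orama is to compute, with a plain `Array.prototype.filter`, the hits that a strict `lt` filter should return for the same documents. This is a minimal sketch: `IndexedDoc` and `expectedLtHits` are hypothetical names, and only the first timestamp comes from the thread. The other sample documents are made up for illustration.

```typescript
// Hypothetical document shape: only the field the `lt` filter looks at.
interface IndexedDoc {
  id: string;
  indexed_at: number;
}

// Reference strict "less than" filter, independent of Orama: computes the hits that
// `where: { indexed_at: { lt: before } }` would be expected to return.
function expectedLtHits(docs: IndexedDoc[], before: number): IndexedDoc[] {
  return docs.filter((doc) => doc.indexed_at < before);
}

const before = 1705758397008;
const docs: IndexedDoc[] = [
  { id: 'a', indexed_at: 1705757860535 }, // timestamp quoted in the thread
  { id: 'b', indexed_at: 1705758397008 }, // hypothetical: equal to `before`, excluded by strict `lt`
  { id: 'c', indexed_at: 1705759000000 }, // hypothetical: later timestamp
];

const hits = expectedLtHits(docs, before);
console.log(hits.map((doc) => doc.id)); // [ 'a' ]
```

Comparing this list with the Orama result helps separate a problem in the data from a problem in how the query is evaluated.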

AND-TomHarris commented 3 months ago

@Leo310 did you find a solution to this? I am facing the same problem.

Leo310 commented 3 months ago

@AND-TomHarris Unfortunately not, I ended up using Dexie.js instead of Orama

fenicento commented 3 months ago

+1. Filtering on strings with spaces (e.g. "History and geography") returns hits where only a substring matches the filter value (e.g. every entry containing "and" in the filtered field).

Is there any workaround?

ColeDCrawford commented 3 weeks ago

Uf. This is pretty bad.

allevo commented 2 weeks ago

Hi all! Have you tried using `enum` instead of `string`? Internally, Orama tokenizes `string` properties and indexes the resulting tokens, whereas `enum` values are indexed as-is, without any transformation.

Let me know.
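The `enum` suggestion amounts to comparing whole values instead of tokens. Below is a minimal sketch of that behavior in plain TypeScript; it is a simplified model, not Orama's code, and `EnumFilter` and `enumFilterMatches` are hypothetical names. In Orama itself the equivalent would presumably be declaring `source: 'enum'` in the schema and filtering with `where: { source: { eq: 'To Add.md' } }`, or `in: [...]` for several values. Those operator names are assumed from Orama's filter documentation and should be checked against the Orama version in use.

```typescript
// Simplified model of exact-value filtering, as described for `enum` properties:
// the stored value is compared as-is, with no tokenization, lowercasing, or stemming.
type EnumFilter = { eq: string } | { in: string[] };

function enumFilterMatches(docValue: string, filter: EnumFilter): boolean {
  if ('eq' in filter) {
    return docValue === filter.eq; // exact equality on the whole value
  }
  return filter.in.includes(docValue); // exact membership in a list of values
}

const sources = ['To Add.md', 'README.md', 'History and geography', 'Art and music'];

const exact = sources.filter((s) => enumFilterMatches(s, { eq: 'To Add.md' }));
const anyOf = sources.filter((s) => enumFilterMatches(s, { in: ['README.md', 'History and geography'] }));
console.log(exact); // [ 'To Add.md' ]
console.log(anyOf); // [ 'README.md', 'History and geography' ]
```

Because nothing is normalized, the comparison is case- and whitespace-sensitive: `'to add.md'` would not match `'To Add.md'`.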