scambier / obsidian-omnisearch

A search engine that "just works" for Obsidian. Supports OCR and PDF indexing.
GNU General Public License v3.0
1.16k stars, 57 forks

[BUG/Feature Request] Files with exact match aren't on top of list #289

Open amantinband opened 1 year ago

amantinband commented 1 year ago

Problem description:

Not sure if this is a bug or by design, but I would expect files whose names start with the search string to be at the top of the list: [image]

I tried playing with the various weights but couldn't get both of the "3.1."s on top.

This of course happens in various searches, not just this.

scambier commented 1 year ago

That's more of a consequence of the design: when you search for 3.1., Omnisearch splits this string into 2 tokens, 3 and 1, and looks for notes that contain them. It then sorts the results with its scoring algorithm.
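To make the splitting concrete, here is a minimal sketch of a tokenizer that behaves the way the comment above describes. It is not Omnisearch's actual tokenizer: the function name `tokenize` and the ASCII-only splitting rule are assumptions made for illustration.

```typescript
// Hypothetical tokenizer, for illustration only.
// Assumption: tokens are runs of ASCII letters and digits; everything else
// (dots, spaces, punctuation) acts as a separator and is discarded.
function tokenize(query: string): string[] {
  return query
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 0);
}

const dotted = tokenize("3.1.");         // ["3", "1"]
const theorem = tokenize("Theorem 2.8"); // ["theorem", "2", "8"]
console.log(dotted, theorem);
```

Once the dots are gone, a note called "3.1. Something" and a note that merely mentions a 3 and a 1 somewhere both match the same two tokens, which is why the ranking depends entirely on the scoring step.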

Though it should be easy to implement a score boost when the filename contains the exact query (maybe with a few constraints, like >= 2 tokens or >= 5 chars).
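A rough sketch of what such a boost could look like is below. Everything in it is an assumption rather than Omnisearch code: the `ScoredResult` shape, the `boostExactFilenameMatches` name, and the boost factor are invented for the example, and the thresholds simply reuse the ">= 2 tokens or >= 5 chars" idea from the comment.

```typescript
// Hypothetical result shape; the real plugin's result objects differ.
interface ScoredResult {
  basename: string; // file name without extension
  score: number;    // score produced by the regular token-based search
}

const EXACT_MATCH_BOOST = 10; // arbitrary multiplier, would need tuning
const MIN_QUERY_TOKENS = 2;
const MIN_QUERY_CHARS = 5;

// Counts alphanumeric tokens, using a simplified ASCII-only split.
function countTokens(query: string): number {
  return query.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0).length;
}

// Multiplies the score of results whose file name contains the raw query,
// then returns a new array sorted by descending score.
function boostExactFilenameMatches(results: ScoredResult[], rawQuery: string): ScoredResult[] {
  const needle = rawQuery.trim().toLowerCase();
  // Skip very short queries, which would match almost every file name.
  const eligible =
    countTokens(needle) >= MIN_QUERY_TOKENS || needle.length >= MIN_QUERY_CHARS;

  const boosted = results.map((result) =>
    eligible && result.basename.toLowerCase().includes(needle)
      ? { ...result, score: result.score * EXACT_MATCH_BOOST }
      : result
  );
  return boosted.sort((a, b) => b.score - a.score);
}

const ranked = boostExactFilenameMatches(
  [
    { basename: "Overview", score: 5 },
    { basename: "3.1. Intro", score: 1 },
    { basename: "3.1. Setup", score: 0.8 },
  ],
  "3.1."
);
console.log(ranked.map((r) => r.basename)); // both "3.1." notes now come first
```

With these made-up numbers, the query 3.1. qualifies through the token rule (two tokens, four characters), so both notes whose names contain it overtake the higher-scored "Overview".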

Merlin04 commented 5 months ago

I'm also having this issue: searching for things like Theorem 2.8 gives useless results like this: [image]

And so I have to switch to Obsidian's search to find what I'm actually looking for: [image]

A boost to results that contain the exact string, or even just a mode to search for exact matches only while still letting you do fuzzy search most of the time, would be incredibly useful.

pricebaldwin commented 5 months ago

I'm interested in seeing this as well. It seems like if the first three tokens are a direct match that would be sufficient? @scambier if you want I'll look into it.

scambier commented 5 months ago

When you search for "Theorem 2.8", the tokens will be ['theorem', '2', '8'] (+ removed diacritics, + even more tokens in specific cases), so you won't be able to do an exact string match against a document.

Kinda like we're filtering documents for parts in quotes here, we could check whether some documents contain the raw query, and boost their score.

You're welcome to take a look at it 👐
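As a hedged illustration of that idea, the check can run on the raw text instead of the tokens. The `IndexedDoc` shape, the helper names, and the default boost of 2 are assumptions for this sketch, not the plugin's API; the diacritics stripping is included only so the comparison stays roughly in line with how the tokens are described above.

```typescript
// Hypothetical document shape, for illustration only.
interface IndexedDoc {
  path: string;
  content: string;
  score: number; // score from the regular token-based search
}

// Lowercases and strips combining diacritical marks so that
// "Théorème" and "theoreme" compare as equal.
function normalizeText(text: string): string {
  return text.normalize("NFD").replace(/[\u0300-\u036f]/g, "").toLowerCase();
}

// Returns a copy of the documents in which every document whose content
// contains the raw query, after normalization, has its score multiplied.
function boostRawQueryMatches(docs: IndexedDoc[], rawQuery: string, boost = 2): IndexedDoc[] {
  const needle = normalizeText(rawQuery.trim());
  if (needle.length === 0) return docs;
  return docs.map((doc) =>
    normalizeText(doc.content).includes(needle)
      ? { ...doc, score: doc.score * boost }
      : doc
  );
}

const docs: IndexedDoc[] = [
  { path: "analysis.md", content: "Theorem 2.8 states that...", score: 1 },
  { path: "notes.md", content: "Theorem 3, page 2, line 8", score: 1.5 },
];
console.log(boostRawQueryMatches(docs, "Theorem 2.8"));
```

Both documents contain the tokens theorem, 2 and 8, but only the first contains the raw string, so only its score is raised (from 1 to 2, which puts it above 1.5 in this example).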

antoniak-piotr commented 1 week ago

@scambier, have you considered using a string metric algorithm, like Damerau-Levenshtein distance or something similar? Here you have a few more suggestions. What do you think about it?

BTW, this may be useful as well.
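For reference, a minimal sketch of such a metric follows. It implements the optimal string alignment distance, a common restricted variant of Damerau-Levenshtein that counts insertions, deletions, substitutions, and swaps of adjacent characters. The function name `osaDistance` is an assumption for this example, and nothing in the thread says Omnisearch uses this metric.

```typescript
// Optimal string alignment (restricted Damerau-Levenshtein) distance.
// Unlike the unrestricted variant, no substring is edited more than once.
function osaDistance(a: string, b: string): number {
  const rows = a.length + 1;
  const cols = b.length + 1;

  // d[i][j] = distance between the first i chars of a and the first j chars of b.
  const d: number[][] = [];
  for (let i = 0; i < rows; i++) {
    const row: number[] = [];
    for (let j = 0; j < cols; j++) {
      row.push(i === 0 ? j : j === 0 ? i : 0);
    }
    d.push(row);
  }

  for (let i = 1; i < rows; i++) {
    for (let j = 1; j < cols; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      d[i][j] = Math.min(
        d[i - 1][j] + 1,       // deletion
        d[i][j - 1] + 1,       // insertion
        d[i - 1][j - 1] + cost // substitution (or match)
      );
      // Transposition of two adjacent characters.
      if (i > 1 && j > 1 && a[i - 1] === b[j - 2] && a[i - 2] === b[j - 1]) {
        d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
      }
    }
  }
  return d[a.length][b.length];
}

console.log(osaDistance("theorme", "theorem")); // 1 (one adjacent swap)
console.log(osaDistance("kitten", "sitting"));  // 3
```

A metric like this helps with typos in the query. For the ranking problem in this issue it would presumably have to be applied to the raw query and the file name rather than to individual tokens.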