[BUG/Feature Request] Files with exact match aren't on top of list

scambier / obsidian-omnisearch

A search engine that "just works" for Obsidian. Supports OCR and PDF indexing.

GNU General Public License v3.0

1.16k stars, 57 forks

[BUG/Feature Request] Files with exact match aren't on top of list #289

Open amantinband opened 1 year ago

amantinband commented 1 year ago

Problem description:

Not sure if this is a bug or by design, but I would think the files starting with the substring would be on top:

I tried playing with the various weights but couldn't get both of the "3.1."s on top.

This of course happens in various searches, not just this.

Your environment:

Omnisearch version: 1.17.1
Obsidian version: 1.4.11
Operating system: MacOS
Number of indexed documents in your vault (approx.): 1000

Things to try:

Does the problem occur when Omnisearch is the only active community plugin: yep
Does the problem occur when you don't index PDFs, images, or other non-notes files: yep
Does the problem occur after a cache reset: yep

scambier commented 1 year ago

That's more of a consequence of the design: when you search for 3.1., Omnisearch splits this string into 2 tokens, 3 and 1, and looks for notes that contain them. It then sorts the results with its scoring algorithm.

Though it should be easy to implement a score boost when the filename contains the exact query (maybe with a few constraints like >= 2 tokens or >= 5 chars).
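A minimal sketch of what such a boost could look like, assuming a simplified result shape. `SearchResult`, `boostExactFilename`, the thresholds, and the multiplier are all hypothetical names and values chosen for illustration; none of them come from the Omnisearch codebase.

```typescript
// Hypothetical result shape, not Omnisearch's actual type.
interface SearchResult {
  basename: string;
  score: number;
}

// Assumed thresholds, mirroring the ">= 2 tokens or >= 5 chars" idea.
const MIN_TOKENS = 2;
const MIN_CHARS = 5;
// Arbitrary multiplier; a real value would need tuning.
const EXACT_FILENAME_BOOST = 2;

// Decide whether the query is specific enough to deserve a boost.
function queryQualifies(query: string): boolean {
  const trimmed = query.trim();
  // Simplified ASCII tokenizer: split on anything that is not a letter or digit.
  const tokens = trimmed.split(/[^a-z0-9]+/i).filter((t) => t.length > 0);
  return tokens.length >= MIN_TOKENS || trimmed.length >= MIN_CHARS;
}

// Multiply the score of results whose filename contains the raw query,
// then re-sort by descending score. Returns a new array.
function boostExactFilename(
  results: SearchResult[],
  query: string
): SearchResult[] {
  if (!queryQualifies(query)) return results.slice();
  const needle = query.trim().toLowerCase();
  return results
    .map((r) =>
      r.basename.toLowerCase().includes(needle)
        ? { ...r, score: r.score * EXACT_FILENAME_BOOST }
        : r
    )
    .sort((a, b) => b.score - a.score);
}
```

With a query of `3.1.`, the two tokens satisfy the first constraint, so any file whose name contains the literal `3.1.` would have its score doubled before the final sort.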

Merlin04 commented 5 months ago

Also having this issue - trying to search for things like Theorem 2.8 gives useless results like this:

And so I have to switch to Obsidian's search to find what I'm actually looking for:

A boost to results that contain the exact string, or even just a mode to search for exact matches only while still letting you do fuzzy search most of the time, would be incredibly useful.

pricebaldwin commented 5 months ago

I'm interested in seeing this as well. It seems like if the first three tokens are a direct match that would be sufficient? @scambier if you want I'll look into it.

scambier commented 5 months ago

When you search for "Theorem 2.8", the tokens will be ['theorem', '2', '8'] (+ removed diacritics, + even more tokens in specific cases), so you won't be able to do an exact string match against a document.
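To make the tokenization concrete, here is a simplified sketch. `stripDiacritics` and `tokenize` are hypothetical helpers written for this illustration, not Omnisearch's actual tokenizer, and they skip the "even more tokens in specific cases" part entirely. The split only handles Latin letters and digits.

```typescript
// Remove combining accent marks after Unicode decomposition (é -> e).
function stripDiacritics(text: string): string {
  return text.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

// Lowercase, strip diacritics, then split on every run of characters
// that is not an ASCII letter or digit. Punctuation such as "." is lost.
function tokenize(query: string): string[] {
  return stripDiacritics(query)
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 0);
}

console.log(tokenize("Theorem 2.8")); // [ 'theorem', '2', '8' ]
console.log(tokenize("3.1.")); // [ '3', '1' ]
```

Once the dot is gone, a note containing "Theorem 2.8" and a note containing "theorem", "2", and "8" in unrelated places both match the same token set, which is why the raw query has to be checked separately.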

Kinda like we're filtering documents for parts in quotes here, we can check if some documents contain the raw query, and boost their score.
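A rough sketch of that idea, as a post-processing pass over already-scored documents. The `ScoredDoc` shape, the `boostRawQueryMatches` name, and the default multiplier are assumptions made for this example and do not reflect the plugin's internals.

```typescript
// Hypothetical shape of a scored document.
interface ScoredDoc {
  path: string;
  content: string;
  score: number;
}

// Case- and accent-insensitive form used for the substring comparison.
function normalizeForMatch(text: string): string {
  return text
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "")
    .toLowerCase();
}

// Boost every document whose content contains the raw, untokenized query,
// then re-sort by descending score. The multiplier is an arbitrary default.
function boostRawQueryMatches(
  docs: ScoredDoc[],
  rawQuery: string,
  multiplier = 1.5
): ScoredDoc[] {
  const needle = normalizeForMatch(rawQuery.trim());
  if (needle.length === 0) return docs.slice();
  return docs
    .map((doc) =>
      normalizeForMatch(doc.content).includes(needle)
        ? { ...doc, score: doc.score * multiplier }
        : doc
    )
    .sort((a, b) => b.score - a.score);
}
```

Because this runs after the token-based search, fuzzy matches stay in the list; documents containing the literal query simply move up.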

You're welcome to take a look at it 👐

antoniak-piotr commented 1 week ago

@scambier, did you think to use some string metric algorithm, like Damerau-Levenshtein distance or something similar? Here you have a few more suggestions. What do you think about it?
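For reference, a compact sketch of such a metric. This is the optimal string alignment variant (often called restricted Damerau-Levenshtein), which counts insertions, deletions, substitutions, and swaps of adjacent characters, but never edits the same substring twice. The function name `osaDistance` is chosen for this example only.

```typescript
// Optimal string alignment distance between two strings.
// d[i][j] holds the distance between the first i chars of a
// and the first j chars of b.
function osaDistance(a: string, b: string): number {
  const d: number[][] = [];
  for (let i = 0; i <= a.length; i++) {
    d.push(new Array<number>(b.length + 1).fill(0));
    d[i][0] = i; // delete all i characters
  }
  for (let j = 0; j <= b.length; j++) {
    d[0][j] = j; // insert all j characters
  }
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      d[i][j] = Math.min(
        d[i - 1][j] + 1, // deletion
        d[i][j - 1] + 1, // insertion
        d[i - 1][j - 1] + cost // substitution or match
      );
      // Adjacent transposition, e.g. "or" <-> "ro".
      if (
        i > 1 &&
        j > 1 &&
        a[i - 1] === b[j - 2] &&
        a[i - 2] === b[j - 1]
      ) {
        d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
      }
    }
  }
  return d[a.length][b.length];
}

console.log(osaDistance("theorem", "theroem")); // 1 (one swap)
console.log(osaDistance("kitten", "sitting")); // 3
```

A distance like this helps with typos, but on its own it would not put "3.1. Setup" above "Chapter 3" for the query `3.1.`, so it would complement an exact-match boost rather than replace it.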

BTW. This may be useful as well.