marsara9 / lemmy-search

An enhanced search engine just for Lemmy/Fediverse
https://www.search-lemmy.com
GNU Affero General Public License v3.0

[0.4.0] Exclude words #40

Closed Generator closed 1 year ago

Generator commented 1 year ago

Is your feature request related to a problem? Please describe. If I search for "cats", I want to exclude posts that contain the word "dogs".

Examples:

  1. Cats are awesome
  2. Cats play with dogs

So post 2 should be excluded from the results.

Describe the solution you'd like: Add a search filter such as -dogs to exclude posts containing that word.

Additional context: Similar to other search engines, e.g. https://help.duckduckgo.com/results/syntax/
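
To make the request concrete, here is a minimal sketch of the desired behavior in Python. It is only an illustration: `filter_posts` is a hypothetical helper, posts are assumed to be plain strings, and matching is done on whitespace-separated words. It is not how lemmy-search actually stores or ranks posts.

```python
def filter_posts(posts, query):
    """Keep posts that contain every plain term and none of the -excluded terms."""
    include, exclude = [], []
    for token in query.lower().split():
        # A leading "-" marks a term that must NOT appear in the post.
        if token.startswith("-") and len(token) > 1:
            exclude.append(token[1:])
        else:
            include.append(token)

    results = []
    for post in posts:
        words = set(post.lower().split())
        has_all = all(term in words for term in include)
        has_excluded = any(term in words for term in exclude)
        if has_all and not has_excluded:
            results.append(post)
    return results


posts = ["Cats are awesome", "Cats play with dogs"]
print(filter_posts(posts, "cats -dogs"))  # prints ['Cats are awesome']
```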

marsara9 commented 1 year ago

Going to try and include this in the next large release.

Note that while I should be able to support excluding certain words, I won't be able to support many of the other advanced queries without a complete overhaul of the database.

For example cats +dogs has no meaning to the current database structure and how I handle page ranking.

I also don't store much of the original text, so "cats and dogs" doesn't work either. That just gets parsed as cats and dogs (without the quotes).

Ultimately, I'm not sure I will ever support many of these other advanced queries.

Generator commented 1 year ago

Excluding words is more important than the others.

marsara9 commented 1 year ago

Found a way to easily restructure the database to provide more advanced queries.

With 0.4.0 you can:

  1. "use quotes"
  2. either OR multiple-terms
  3. negate -terms
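
As a rough illustration of the three query forms above, the following Python sketch splits a query into required groups and excluded terms. The names `ParsedQuery` and `parse_query` are hypothetical, and this is an assumption about how such syntax could be parsed, not the parser used by lemmy-search.

```python
import re
from dataclasses import dataclass, field

# Matches a quoted phrase (optionally prefixed with "-") or a run of non-space characters.
TOKEN_RE = re.compile(r'(-?)"([^"]*)"|(\S+)')


@dataclass
class ParsedQuery:
    # Each inner list holds alternatives joined by OR;
    # the groups themselves are combined with an implicit AND.
    required: list = field(default_factory=list)
    excluded: list = field(default_factory=list)


def parse_query(query):
    parsed = ParsedQuery()
    join_with_previous = False
    for match in TOKEN_RE.finditer(query):
        negated, phrase, word = match.groups()
        if phrase is not None:
            term, is_negated = phrase.lower(), negated == "-"
        elif word == "OR":
            # OR attaches the next term to the previous group, if there is one.
            join_with_previous = bool(parsed.required)
            continue
        else:
            is_negated = word.startswith("-") and len(word) > 1
            term = (word[1:] if is_negated else word).lower()
        if not term:
            continue
        if is_negated:
            parsed.excluded.append(term)
        elif join_with_previous:
            parsed.required[-1].append(term)
        else:
            parsed.required.append([term])
        join_with_previous = False
    return parsed


print(parse_query('"use quotes" either OR multiple-terms -negate'))
```

In this sketch a hyphen only negates when it starts a term, so `multiple-terms` stays a single required term.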

With this though, the default behavior has changed. Primarily:

  1. common terms are now automatically removed
  2. the default behavior is now an implicit AND but order doesn't matter.
  3. Searches now also match variants of your original terms, e.g. suffixes such as "-ing", "-s", and "-er" are handled automatically (a.k.a. stemming).
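
The three default behaviors above can be sketched roughly as follows. This is a toy illustration only: the stop-word list, the suffix list, and the helper names `stem`, `normalize`, and `matches` are assumptions made for the example. Real stemmers (for instance Porter or Snowball) are considerably more careful than simple suffix stripping.

```python
# Hypothetical stop-word and suffix lists, chosen only for this example.
STOP_WORDS = {"a", "an", "and", "are", "is", "of", "the", "with"}
SUFFIXES = ("ing", "er", "s")


def stem(word):
    """Naive stemming: strip one known suffix if at least 3 letters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(text):
    """Lowercase, drop punctuation and common terms, and stem what is left."""
    words = (w.strip(".,!?\"'").lower() for w in text.split())
    return {stem(w) for w in words if w and w not in STOP_WORDS}


def matches(query, document):
    """Implicit AND: every query term must be present, in any order."""
    return normalize(query) <= normalize(document)


print(matches("playing cats", "The cat plays with dogs"))  # prints True
print(matches("cats birds", "The cat plays with dogs"))    # prints False
```

Because both sides are reduced to sets of stems, word order has no effect on whether a document matches.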

Finally as a result of all of this, search speed is now SIGNIFICANTLY improved. Longer queries that used to take several seconds now return in a matter of milliseconds.