hackmdio / codimd

CodiMD - Realtime collaborative markdown notes on all platforms.
https://hackmd.io/c/codimd-documentation
GNU Affero General Public License v3.0
9.24k stars 1.05k forks

Full text search #482

Open · almereyda opened this issue 7 years ago

almereyda commented 7 years ago

Please mark this as a feature request.

As a logged-in user, I see my recently opened pads when I visit the splash page. As far as I know, this list is specific to the user's client and kept in the browser's local storage, and it is not synchronised with the HackMD server.

This issue tries to anticipate what would be needed to implement full-text search over pads.

Given these assumptions, a search service would let the frontend client, which holds the only available list of a user's pads, POST that list so the service can crawl and index those pads and then answer the client's search queries against them.
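A minimal sketch of the indexing side of such a service, assuming the client POSTs each pad's id together with its markdown text. The `Pad` type, the `PadIndex` class and its methods are hypothetical names, not part of CodiMD. It keeps an in-memory inverted index from lowercase word tokens to pad ids and answers a query by intersecting the posting sets, so a pad matches only if it contains every query term:

```typescript
// Hypothetical shape of what the client would POST: one entry per pad.
interface Pad {
  id: string;
  text: string;
}

// Split text into lowercase letter/digit tokens (Unicode-aware, needs an ES2018+ target).
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[\p{L}\p{N}]+/gu) ?? [];
}

class PadIndex {
  // term -> ids of pads containing that term
  private postings = new Map<string, Set<string>>();
  // pad id -> terms indexed for it, so re-indexing a pad can drop stale entries
  private termsByPad = new Map<string, Set<string>>();

  index(pads: Pad[]): void {
    for (const pad of pads) {
      this.remove(pad.id); // a re-POSTed pad replaces its previous version
      const terms = new Set(tokenize(pad.text));
      this.termsByPad.set(pad.id, terms);
      for (const term of terms) {
        let ids = this.postings.get(term);
        if (!ids) {
          ids = new Set<string>();
          this.postings.set(term, ids);
        }
        ids.add(pad.id);
      }
    }
  }

  remove(id: string): void {
    const terms = this.termsByPad.get(id);
    if (!terms) return;
    for (const term of terms) {
      const ids = this.postings.get(term);
      ids?.delete(id);
      if (ids && ids.size === 0) this.postings.delete(term);
    }
    this.termsByPad.delete(id);
  }

  // Ids of pads containing every query term (AND semantics), sorted for stable output.
  query(q: string): string[] {
    const terms = Array.from(new Set(tokenize(q)));
    if (terms.length === 0) return [];
    let result: Set<string> | undefined;
    for (const term of terms) {
      const ids = this.postings.get(term);
      if (!ids) return []; // one missing term means no pad can match
      result = result
        ? new Set(Array.from(result).filter((id) => ids.has(id)))
        : new Set(ids);
    }
    return Array.from(result ?? []).sort();
  }
}
```

Because the index only knows about pads that a client has POSTed, it fits the constraint above that the client's history is the only list of pads. A real service would presumably also have to check that the requesting user may read each pad before indexing or returning it.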

xorander00 commented 7 years ago

PostgreSQL has pretty good built-in full-text search (tsvector and tsquery). You can create an appropriate index on the document text (typically a GIN index) and then start querying away.
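A minimal sketch of what that could look like, assuming notes live in a `"Notes"` table with `"id"`, `"title"` and `"content"` columns (an assumption about CodiMD's schema, not checked against a particular release). The SQL is wrapped in TypeScript as a parameterized `{ text, values }` object of the kind node-postgres' `client.query()` accepts; `buildSearchQuery`, `notes_fts_idx` and the `'english'` text-search configuration are illustrative choices:

```typescript
// Assumed schema (not verified against a specific CodiMD release):
// table "Notes" with "id", "title" and "content" columns.
const DOCUMENT_EXPR =
  `to_tsvector('english', coalesce("title", '') || ' ' || coalesce("content", ''))`;

// One-off migration: an expression GIN index over the note text. The planner only
// uses it when a query repeats exactly the same expression (same config, same columns).
const createFtsIndexSql =
  `CREATE INDEX IF NOT EXISTS notes_fts_idx ON "Notes" USING GIN (${DOCUMENT_EXPR})`;

// Parameterized query in the { text, values } form node-postgres' client.query() accepts.
interface PgQuery {
  text: string;
  values: unknown[];
}

// Ranked search: the user's input is bound as $1, never concatenated into the SQL.
function buildSearchQuery(userInput: string, limit = 20): PgQuery {
  return {
    text:
      `SELECT "id", "title", ts_rank(${DOCUMENT_EXPR}, query) AS rank ` +
      `FROM "Notes", plainto_tsquery('english', $1) AS query ` +
      `WHERE ${DOCUMENT_EXPR} @@ query ` +
      `ORDER BY rank DESC LIMIT $2`,
    values: [userInput, limit],
  };
}
```

The `'english'` configuration stems words and drops English stop words. For notes written in several languages, the built-in `'simple'` configuration, which only lowercases, may be a safer default. Whichever configuration is chosen must be identical in the index and in the query, or PostgreSQL cannot use the index.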

I'd suggest looking into the ZomboDB extension for PostgreSQL, which transparently integrates PostgreSQL with Elasticsearch.

ccoenen commented 7 years ago

ZomboDB would involve Java as a dependency, would it not? At most, it should be an optional dependency. Especially for small instances, the built-in full-text search in Postgres and MySQL would probably give usable results.
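For the MySQL side, a comparable sketch under the same schema assumption (a `Notes` table with `id`, `title` and `content` columns). InnoDB supports `FULLTEXT` indexes from MySQL 5.6, and `MATCH ... AGAINST` in natural language mode yields a relevance score that can be used for ordering. `buildMysqlSearch`, `notes_fulltext` and the `{ sql, values }` shape are illustrative; that shape mirrors the options object the mysqljs `query()` call takes:

```typescript
// Assumed schema: table `Notes` with `id`, `title` and `content` columns (not verified).
const createMysqlFulltextSql =
  "ALTER TABLE `Notes` ADD FULLTEXT INDEX notes_fulltext (`title`, `content`)";

// MATCH() must name exactly the column list of one FULLTEXT index.
const MATCH_EXPR = "MATCH(`title`, `content`) AGAINST (? IN NATURAL LANGUAGE MODE)";

interface MysqlQuery {
  sql: string;
  values: unknown[];
}

function buildMysqlSearch(userInput: string, limit = 20): MysqlQuery {
  // LIMIT is inlined as a sanitized integer; the search text stays a bound parameter.
  const safeLimit = Number.isFinite(limit) ? Math.max(1, Math.floor(limit)) : 20;
  return {
    sql:
      `SELECT \`id\`, \`title\`, ${MATCH_EXPR} AS score FROM \`Notes\` ` +
      `WHERE ${MATCH_EXPR} ORDER BY score DESC LIMIT ${safeLimit}`,
    // The placeholder appears twice (SELECT and WHERE), so the input is bound twice.
    values: [userInput, userInput],
  };
}
```

By default InnoDB ignores words shorter than three characters (`innodb_ft_min_token_size`) as well as words on its built-in stopword list, which is worth knowing before judging result quality on a small instance.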

xorander00 commented 7 years ago

It's an extension for Postgres, so it wouldn't be referenced in this project. I agree that it should be optional, but I've also changed my opinion on it: while it's a nice extension, it's pretty straightforward to just use the native full-text search (FTS) support in Postgres. ZomboDB currently only supports ES 1.7.x, and I use ES 5.
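One way to keep an external search engine optional, as agreed here, is to put every backend behind a small interface and only construct the Elasticsearch one when the admin configures it. A sketch, where `SearchBackend`, `selectBackend` and the `elasticsearchUrl` setting are hypothetical names, not existing CodiMD configuration:

```typescript
// Hypothetical result shape shared by all backends.
interface SearchHit {
  noteId: string;
  title: string;
}

interface SearchBackend {
  readonly name: string;
  search(query: string, limit: number): Promise<SearchHit[]>;
}

// Hypothetical config fragment: Elasticsearch is opt-in via a non-empty URL.
interface SearchConfig {
  elasticsearchUrl?: string;
}

// Use the database's native FTS unless an Elasticsearch URL is configured.
// The ES backend comes from a factory, so it is only built when actually needed.
function selectBackend(
  config: SearchConfig,
  native: SearchBackend,
  makeElasticsearch: (url: string) => SearchBackend,
): SearchBackend {
  const url = config.elasticsearchUrl?.trim();
  return url ? makeElasticsearch(url) : native;
}
```

Taking a factory rather than a ready-made client means the Elasticsearch client library would only need to be loaded when the feature is switched on, which keeps it an optional dependency for small instances.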

If it were me, I'd make ES support an optional feature. The decision at implementation time would then be how best to keep the relevant data in sync between Postgres and ES (or to leave that to the sysadmin and simply switch the back end being queried). I'd advocate for using FTS alone; the only thing that gives me pause is every user hitting the Postgres instance with real-time search queries while typing into the search bar. The RxJS change I mentioned in #531 would help here by applying backpressure/throttle/debounce while the user types, but I'd still be cautious about hitting the server with a flood of requests, hence the ES usage.
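The debounce idea can be sketched without RxJS. This is a plain-TypeScript stand-in rather than the RxJS change from #531, and `debounce`, `searchAsYouType` and the 300 ms window are illustrative. Each keystroke cancels the pending request and schedules a new one, so only the last input in a quiet window reaches the server:

```typescript
// Delay calls to `fn` until `waitMs` milliseconds pass without a new call;
// only the most recent arguments are used.
function debounce<A extends unknown[]>(
  fn: (...args: A) => void,
  waitMs: number,
): (...args: A) => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: A) => {
    if (timer !== undefined) clearTimeout(timer); // drop the pending call
    timer = setTimeout(() => {
      timer = undefined;
      fn(...args);
    }, waitMs);
  };
}

// Usage: every keystroke reschedules the request.
const sentQueries: string[] = [];
const searchAsYouType = debounce((q: string) => {
  sentQueries.push(q); // stand-in for a fetch() to a hypothetical search endpoint
}, 300);

// Simulate typing "postgres" quickly: eight keystrokes within one window.
for (const prefix of ["p", "po", "pos", "post", "postg", "postgr", "postgre", "postgres"]) {
  searchAsYouType(prefix);
}
```

Typing "postgres" quickly therefore sends one query instead of eight. A throttle would instead guarantee a request at most every N ms during continuous typing; either way the server-side load concern above is reduced, not removed.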

romainreuillon commented 5 years ago

This feature would be awesome!

joenio commented 2 years ago

+1