Full-text search query

srid commented 3 years ago

Now that we have #324, enable access to it from the query feature. The following,

```query
some text
```

... should list all notes containing 'some text'. Inspired by Obsidian.

See https://github.com/srid/emanote/discussions/48#discussioncomment-955314

We could piggyback on https://github.com/EmaApps/emanote/issues/338 to implement this.

srid commented 2 years ago

Not having full-text search (including client-side search) is a dealbreaker for some projects, e.g. https://github.com/hercules-ci/flake-parts/issues/31#issuecomment-1141259722

applejag commented 2 years ago

Adding stork, as suggested in https://github.com/EmaApps/emanote/pull/242#issuecomment-1100888677, was surprisingly easy. Made a proof-of-concept just to play around, and it works super well :)

The client side of this is easy. The hard part, of course, is to get Haskell to talk the same language and make use of the search results during emanote gen and emanote run time.

Porting Stork to Haskell isn't realistic. However, their CLI seems capable enough.

Stork always rebuilds the index from scratch, which is bad for bigger sites. However, just to get some perspective, here are some quick benchmarks:

| Site | Indexed files | Search terms | `stork build` time |
| --- | --- | --- | --- |
| https://emanote.srid.ca/ | 30 | 5,597 | ~0.1s |
| https://input-output-hk.github.io/adrestia/ | 94 | 35,721 | ~0.8 to 1s |
| https://chenghaomou.github.io/ | 405 | 56,762 | ~2s |

The index-building times are really good, even for the bigger repos.

And if you lock in to the idea of using Stork, then adding at least statically-built search results is a great start, and would deserve a separate ticket.

My idea of how search support would be added to Emanote:

- Search support becomes opt-in.
- To enable full-text search in `query` blocks and in the web browser, the user must have the stork CLI installed when building the HTML, leaving stork as an optional external integration.
- On every page update (when using emanote run), Emanote runs stork build to rebuild the index.
- When building the static site (via emanote gen), Emanote runs stork build at the end.
- To evaluate `query` block results, Emanote runs stork search on the prebuilt index file.

Sending pages to Stork during emanote run could be done by generating a big temporary TOML file with the content embedded in it, instead of having to sync the files to .html files all of the time, as emanote run keeps it all in memory (if I understand it correctly).
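
As a rough illustration of the temporary-config idea, here is a minimal sketch in Python (not Emanote's Haskell, and not taken from the proof-of-concept). It assumes Stork's TOML config accepts a `contents` key on each file entry in place of `path`; `toml_string` and `stork_config` are hypothetical helper names, and a `filetype` key may also be needed depending on the Stork version.

```python
import json


def toml_string(value):
    """Encode a Python string as a TOML basic string.

    The escapes that json.dumps emits (\\", \\\\, \\n, \\t, \\uXXXX for
    control characters) are also valid in TOML basic strings.
    ensure_ascii=False keeps non-ASCII text raw, which avoids
    surrogate-pair escapes that TOML would reject.
    """
    return json.dumps(value, ensure_ascii=False)


def stork_config(notes):
    """Render a Stork config with each note's text embedded inline.

    `notes` is a list of dicts with "title", "url" and "contents" keys
    (an assumed in-memory representation, for illustration only).
    """
    lines = ["[input]", "files = ["]
    for note in notes:
        fields = ", ".join(
            f"{key} = {toml_string(note[key])}"
            for key in ("title", "url", "contents")
        )
        # Each file entry is a single-line TOML inline table.
        lines.append(f"    {{{fields}}},")
    lines.append("]")
    return "\n".join(lines) + "\n"
```

The resulting string could be written to a temporary file and passed to stork build, so nothing has to be synced to .html files on disk.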

Other ideas:

Access Stork's parsing and searching algos via FFI? ("I'm feeling lucky" Google results) That would also require adding a Rust compiler to Emanote's toolchain, though, which would slow down build times and could be quite clumsy.

These are just my two cents. What are your thoughts, @srid ? Maybe this was your plan all along?

srid commented 2 years ago

> And if you lock in to the idea of using Stork, then adding at least statically-built search results is a great start, and would deserve a separate ticket.

I agree, and this is what we should do first (without worrying about the query stuff).

> Adding stork, as suggested in https://github.com/EmaApps/emanote/pull/242#issuecomment-1100888677, was surprisingly easy.

Could you share how you did this? I imagine we can make emanote gen do it automatically.

By the way, the `which` library can be used to include stork as part of the Emanote install.

srid commented 2 years ago

Separate ticket opened: https://github.com/EmaApps/emanote/issues/324

Let's continue the discussion there.

applejag commented 2 years ago

> Could you share how you did this? I imagine we can make emanote gen do it automatically.

Yea sure:

srid commented 2 years ago

We have client-side full text search now, but to integrate it with the query feature we will need #338

applejag commented 2 years ago

> We have client-side full text search now, but to integrate it with the query feature we will need #338

Using the `stork search -i stork.st -q "query goes here"` CLI would suffice, and would probably be much easier to implement.

Using FFI could improve performance, as it would skip translating back and forth between JSON, so I suggest keeping it as a possible future enhancement. But the low-hanging fruit is just to use the CLI, as you already do when building the index.
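
A minimal sketch of that CLI approach, in Python for brevity rather than Haskell. The flags are the ones from the command above; the JSON shape (`results[].entry.url`) is an assumption about Stork's output and should be checked against the installed version. All function names are hypothetical.

```python
import json
import subprocess


def search_command(index_path, query):
    """Build the argv for the stork search call quoted above."""
    return ["stork", "search", "-i", index_path, "-q", query]


def result_urls(stork_json):
    """Pull the matching page URLs out of Stork's JSON output.

    Assumes a top-level "results" list whose items carry an "entry"
    object with a "url" field; a missing "results" key yields [].
    """
    data = json.loads(stork_json)
    return [result["entry"]["url"] for result in data.get("results", [])]


def search(index_path, query):
    """Run stork search and return the URLs of the matching notes."""
    completed = subprocess.run(
        search_command(index_path, query),
        check=True,           # raise if stork exits with a non-zero status
        capture_output=True,  # collect stdout instead of printing it
        text=True,
    )
    return result_urls(completed.stdout)
```

Keeping the JSON parsing separate from the process call means the parsing can be tested without stork installed, and it is the only part that would change if FFI replaced the CLI later.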

srid / emanote

Full-text search query #102