matrix-org / seshat

A Matrix message database/indexer

Support the Web Browser. #84

Open kevincox opened 3 years ago

kevincox commented 3 years ago

Is your feature request related to a problem? Please describe.

element-web can't search encrypted rooms.

Describe the solution you'd like

seshat should work in the browser so that element-web can search encrypted rooms.

Describe alternatives you've considered

Additional context

SimonBrandner commented 3 years ago

IIRC, you can't use seshat in the browser because you can't do multi-threading there. In the meantime, you can use radical-native.

kevincox commented 3 years ago

I'm not an expert on the project but I understand that there are a couple of difficulties.

If the performance is nearly the same it may even make sense to use this mode for element-web on Electron.

kevincox commented 3 years ago

@SimonBrandner How hard do you think it would be to run single-threaded in the browser? And would the performance be completely prohibitive?

SimonBrandner commented 3 years ago

No idea - I have no experience in the area, tbh

HKalbasi commented 3 years ago

I tried to build with --target wasm32-wasi and it failed in the fs2 dependency, which is a dependency of tantivy. Do you use the file system in seshat? If not, I will follow up on this issue in tantivy to conditionally compile the file system part.

phiresky commented 3 years ago

You don't need fs2 without memmap and then it compiles fine to wasm. Like this:

tantivy = {version="0.14.0", default-features=false, features=["wasm-bindgen"]}

Doesn't answer how to store the index though (except in memory)

phiresky commented 3 years ago

I just want to mention that

Not sure if there's any other prerequisites.

HKalbasi commented 3 years ago

@phiresky Thank you for your comments!

I tried to build with your suggestion on May 29, but it failed to compile because of EncryptedMmapDirectory and some similar things. From the comments in your PR I gather that there is no simple workaround for this. Can you take a quick look at this code base and tell us how hard it would be and/or what steps we should take? I'm a Matrix user and this issue really hurts me, but I'm familiar with neither seshat nor tantivy, so I can't solve it myself.

phiresky commented 3 years ago

I'm not familiar with this code base either. It's not easy I think, but it's not unsolvable.

it failed to compile because of EncryptedMmapDirectory

Yeah, seshat stores the data on disk in an encrypted format: https://github.com/matrix-org/seshat/blob/master/src/index/encrypted_dir.rs . That needs to be changed somehow, either just disable it for the web-browser based version or change it up. I don't think it's very smart in any case since (I think) it has to load the whole index into RAM before being able to do anything (see this line). IMO storing unencrypted would be fine but idk what the maintainer's opinion is.

I think to get it working quickly it might be easiest to disable tantivy completely for now and just use SQLite. It should be good enough for most queries, though I don't know what SQLite is currently used for exactly. It would still take some effort to make SQLite work, since the wa-sqlite IndexedDB VFS thing is somewhat experimental and I'm not sure how easy integrating it with Rust / rusqlite would be. It would be a great advancement for browser-SQLite in any case though.


In general, it might be easier to make a simplified version of the whole seshat thing directly based on IndexedDB or some JS client-side full-text search library; I'm sure there are some. I'm not actually sure why this code base seems fairly large, since the API surface should pretty much just be search(chats: id[], text: string) -> messages[]?
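To make the search(chats, text) -> messages shape concrete, here is a minimal sketch of a toy in-memory inverted index in Rust. It is only an illustration of that API surface: `ToyIndex`, `Message`, and `tokenize` are hypothetical names, not seshat or tantivy types, and the tokenizer is a naive split on non-alphanumeric characters that ignores ranking, persistence, and encryption.

```rust
use std::collections::{HashMap, HashSet};

/// A stored message. Hypothetical type, not one of seshat's.
#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub room_id: String,
    pub body: String,
}

/// Toy inverted index: lowercase token -> ids of messages containing it.
#[derive(Default)]
pub struct ToyIndex {
    messages: Vec<Message>,
    postings: HashMap<String, HashSet<usize>>,
}

/// Split on non-alphanumeric characters and lowercase each token.
fn tokenize(text: &str) -> impl Iterator<Item = String> + '_ {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
}

impl ToyIndex {
    /// Index one message under every token of its body.
    pub fn add(&mut self, room_id: &str, body: &str) {
        let id = self.messages.len();
        for token in tokenize(body) {
            self.postings.entry(token).or_default().insert(id);
        }
        self.messages.push(Message {
            room_id: room_id.to_string(),
            body: body.to_string(),
        });
    }

    /// Return messages in `rooms` that contain every token of `text`.
    /// An empty query returns no messages.
    pub fn search(&self, rooms: &[&str], text: &str) -> Vec<&Message> {
        let mut hits: Option<HashSet<usize>> = None;
        for token in tokenize(text) {
            let ids = self.postings.get(&token).cloned().unwrap_or_default();
            hits = Some(match hits {
                Some(prev) => prev.intersection(&ids).copied().collect(),
                None => ids,
            });
        }
        let mut ids: Vec<usize> = hits.unwrap_or_default().into_iter().collect();
        ids.sort_unstable(); // report matches in insertion order
        ids.into_iter()
            .map(|id| &self.messages[id])
            .filter(|m| rooms.contains(&m.room_id.as_str()))
            .collect()
    }
}
```

With messages added via `add("!a:example.org", "Hello encrypted world")`, a call like `search(&["!a:example.org"], "hello")` returns that message. A real implementation would also need storage, ranking, and language-aware tokenization, which may be part of why the actual code base is larger.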

yu-re-ka commented 3 years ago

Just want to let you know that I am working on this.

At first I looked into full-text search with IndexedDB, which is something I did before, but it would be limited in terms of language support, because the Intl.Segmenter API is not implemented in all browsers yet.
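As a small illustration of the language-support concern (assuming the limitation meant here is word segmentation for scripts that do not separate words with spaces), a whitespace-based tokenizer treats such text as a single token. The function name `whitespace_tokens` is hypothetical:

```rust
/// Naive tokenizer: split on Unicode whitespace only.
pub fn whitespace_tokens(text: &str) -> Vec<&str> {
    text.split_whitespace().collect()
}

/// Count the tokens a whitespace tokenizer would index for `text`.
pub fn token_count(text: &str) -> usize {
    whitespace_tokens(text).len()
}
```

For "search encrypted rooms" this yields three tokens, but a Japanese sentence written without spaces yields one, so a query for a single word inside it would not match an index built this way. Locale-aware segmentation, such as what Intl.Segmenter provides, is the kind of thing needed to close that gap.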

So instead I have changed tantivy to remove threading (to remove the need for SharedArrayBuffer support and a secure context) and it can now write to and search an in-memory index in the web browser. The next step is to make tantivy write to IndexedDB, then rebuild the rest of seshat in JS using IndexedDB instead of SQLite.

poljar commented 3 years ago

There are some experimental bindings for IndexedDB over here: https://github.com/poljar/indexeddb-rs.

Though the next problem you'll face is that IndexedDB is async while the Directory trait in Tantivy is blocking. You might get away with using block_on to bridge that gap.
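To show what bridging async and blocking code with block_on means, here is a minimal std-only sketch. It is not tantivy's Directory trait or the indexeddb-rs API: `async_read` and `blocking_read` are hypothetical stand-ins, and the `block_on` below is a toy executor that parks the current thread until the future is ready.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Wakes the blocked thread by unparking it.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Minimal `block_on`: poll the future, parking the thread while it is pending.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(),
        }
    }
}

/// Hypothetical async storage call standing in for an IndexedDB binding.
/// It just echoes the key back as bytes.
pub async fn async_read(key: &str) -> Vec<u8> {
    key.as_bytes().to_vec()
}

/// Hypothetical blocking method, shaped like a synchronous directory read.
pub fn blocking_read(key: &str) -> Vec<u8> {
    block_on(async_read(key))
}
```

This works on a native target because something else can wake the parked thread. On a browser's main thread the situation is likely different: a pending IndexedDB request completes through the event loop, which cannot run while the same thread is blocked, so treat this as an illustration of the gap rather than a working browser solution.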

As for Seshat itself, it uses a writer thread as well and the storage isn't yet abstracted away. We'll likely want to rework the Rust API of Seshat to be async and convert the writer thread to an async task.
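One way to picture "abstracting the storage away" is a small trait that the rest of the code is written against, with one backend per platform. The sketch below is an assumption about what such a seam could look like, not seshat's design: `EventStore`, `MemoryStore`, and `store_and_check` are hypothetical names, and the trait is synchronous for brevity even though the comment above suggests an async API.

```rust
use std::collections::HashMap;

/// Hypothetical storage abstraction; not part of seshat's current API.
pub trait EventStore {
    fn save(&mut self, event_id: &str, body: &str);
    fn load(&self, event_id: &str) -> Option<String>;
}

/// In-memory backend, e.g. for tests or a first browser prototype.
#[derive(Default)]
pub struct MemoryStore {
    events: HashMap<String, String>,
}

impl EventStore for MemoryStore {
    fn save(&mut self, event_id: &str, body: &str) {
        self.events.insert(event_id.to_string(), body.to_string());
    }

    fn load(&self, event_id: &str) -> Option<String> {
        self.events.get(event_id).cloned()
    }
}

/// Code written against the trait does not care which backend it gets.
pub fn store_and_check<S: EventStore>(store: &mut S, id: &str, body: &str) -> bool {
    store.save(id, body);
    store.load(id).as_deref() == Some(body)
}
```

A SQLite-backed store for desktop and an IndexedDB-backed store for the browser could then implement the same trait, leaving the indexing and search logic unchanged.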