vgteam / sequenceTubeMap

displays multiple genomic sequences in the form of a tube map
MIT License
178 stars 24 forks

Get Jouni's GBZ reader into WebAssembly and demo it as a tube map backend #379

Closed: adamnovak closed this 5 months ago

adamnovak commented 9 months ago

This is the other half of https://github.com/vgteam/sequenceTubeMap/issues/367, or the other third if we count implementing the JS wrapper.

We need to be able to build https://github.com/jltsiren/gbz-base for WebAssembly and get the result into the SequenceTubeMap build process.

adamnovak commented 9 months ago

OK, I've looked at this a bit.

On Mac, you need to use Homebrew to install rustup-init, which then installs Rust with rustup:

brew install rustup-init
rustup-init
. ~/.cargo/env

Then you need to use rustup to install the WebAssembly cross-compiling Rust standard library. It looks like the Emscripten-based one is more or less dead, and the generic one has no C standard library for sqlite, so we probably want to try and target WASI:

rustup target add wasm32-wasi

Then we can get the code:

git clone https://github.com/jltsiren/gbz-base
cd gbz-base

For rusqlite, we need to go get a C compiler for WASI. In theory Clang can get away with just headers and a standard library blob, but GCC probably can't.

wget https://github.com/WebAssembly/wasi-sdk/releases/download/wasi-sdk-20/wasi-sdk-20.0-macos.tar.gz
tar -xvf wasi-sdk-20.0-macos.tar.gz
rm wasi-sdk-20.0-macos.tar.gz
export CC_wasm32_wasi=`pwd`/wasi-sdk-20.0/bin/clang

Apparently there's a wasm32-wasi-vfs feature on rusqlite, but I don't seem to need it yet.

With that set we can cargo build --release --target=wasm32-wasi and it will build sqlite, but it fails with other problems around needing file descriptors and memory mapping in simple-sds. We need to be able to build simple-sds without its memory-mapping code for this to work. So that is probably subtask 1.
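
One way to let a crate build without its memory-mapping code is to gate that code on the target with `cfg` and fall back to plain reads. The following is only a minimal sketch of that idea, not the actual simple-sds change; `mmap_available` and `load_words` are hypothetical names, and the little-endian 64-bit word layout is an assumption.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Compiled only for non-WASM targets; real mmap-backed code would sit
/// behind the same attribute. (Hypothetical helper.)
#[cfg(not(target_arch = "wasm32"))]
pub fn mmap_available() -> bool {
    true
}

/// WASM builds get this version instead, so nothing mmap-related is compiled.
#[cfg(target_arch = "wasm32")]
pub fn mmap_available() -> bool {
    false
}

/// Portable fallback: read the whole file into memory as 64-bit words.
/// Assumes the file is a sequence of little-endian u64 values.
pub fn load_words(path: &Path) -> io::Result<Vec<u64>> {
    let mut bytes = Vec::new();
    File::open(path)?.read_to_end(&mut bytes)?;
    if bytes.len() % 8 != 0 {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "file length is not a multiple of 8 bytes",
        ));
    }
    let mut words = Vec::with_capacity(bytes.len() / 8);
    for chunk in bytes.chunks_exact(8) {
        let mut word = [0u8; 8];
        word.copy_from_slice(chunk);
        words.push(u64::from_le_bytes(word));
    }
    Ok(words)
}
```

Callers can check `mmap_available()` and use `load_words` when it returns false. A Cargo feature flag would be another way to get the same effect.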

adamnovak commented 9 months ago

I think I have a fix for simple-sds. But my resulting binaries don't run because of a missing builtin implementation: https://github.com/WebAssembly/wasi-sdk/issues/361

adamnovak commented 9 months ago

OK, with 486ac7bf140a1cc8dcc4de86dbbc3e7439e3b6e0 which uses my wasm-buildable simple-sds at https://github.com/adamnovak/simple-sds/commit/8c4736fb273cd6e9bc423dad52e819410259a107, I can build with:

wget https://github.com/WebAssembly/wasi-sdk/releases/download/wasi-sdk-20/wasi-sdk-20.0-macos.tar.gz
tar -xvf wasi-sdk-20.0-macos.tar.gz
rm wasi-sdk-20.0-macos.tar.gz

CC_wasm32_wasi=`pwd`/wasi-sdk-20.0/bin/clang LIBSQLITE3_FLAGS="-DLONGDOUBLE_TYPE=double -DSQLITE_THREADSAFE=0" cargo build --release --target=wasm32-wasi

Then (assuming I also build for the host), I can make a database:

wget https://github.com/vgteam/sequenceTubeMap/raw/e70b93c291fd308e1ad718ef4104a9865214b046/exampleData/x.gbz
target/release/gbz2db --overwrite x.gbz

And with a WASM runner that supports WASI (brew install wasmtime) I can access the database:

wasmtime --dir . target/wasm32-wasi/release/query.wasm --sample "_gbwt_ref" --contig x --interval 1..10 --context 100 --distinct x.gbz.db

H   VN:Z:1.1    RS:Z:_gbwt_ref
S   1   CTTATTTG
S   2   T
S   3   C
S   4   A
...

I get what looks to be the right GFA file out.

@jltsiren is right that there is trouble with usize/u64 for reading GBZ itself though. Even with this tiny GBZ that doesn't have more than 4 billion of anything, the WASM build of the importer can't read it:

wasmtime --dir . target/wasm32-wasi/release/gbz2db.wasm x.gbz -o x.wasm.gbz.db

Loading GBZ graph x.gbz
Error: "Bit length / word length mismatch"
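
For context, `usize` is 32 bits on wasm32-wasi, so code that treats `usize` and `u64` as interchangeable when reading serialized data can break even on small inputs. Below is a sketch of reading a 64-bit length header in a way that does not depend on pointer width. It is not the simple-sds code; the function names are hypothetical, and the little-endian 64-bit element layout is an assumption.

```rust
use std::convert::TryFrom;
use std::io::{self, Read};

/// Read one little-endian u64, independent of the target's pointer width.
fn read_u64<R: Read>(reader: &mut R) -> io::Result<u64> {
    let mut buf = [0u8; 8];
    reader.read_exact(&mut buf)?;
    Ok(u64::from_le_bytes(buf))
}

/// Read a length stored as u64 and convert it to usize, failing cleanly
/// when the value does not fit (possible on a 32-bit target like wasm32).
fn read_len<R: Read>(reader: &mut R) -> io::Result<usize> {
    let value = read_u64(reader)?;
    usize::try_from(value).map_err(|_| {
        io::Error::new(
            io::ErrorKind::InvalidData,
            "length does not fit in usize on this target",
        )
    })
}

/// Number of 64-bit words needed to hold `bit_len` bits. The word size is
/// fixed at 64 here rather than derived from the size of usize.
fn words_for_bits(bit_len: usize) -> usize {
    let full = bit_len / 64;
    if bit_len % 64 == 0 {
        full
    } else {
        full + 1
    }
}
```

The point of the sketch is that the on-disk element size (64 bits) is kept separate from the in-memory index type (`usize`), so the same file parses on both 64-bit hosts and wasm32.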

adamnovak commented 9 months ago

I had wanted to use the wasm-bindgen crate, which can bind Rust classes over to JS so you can use them there: https://rustwasm.github.io/docs/wasm-bindgen/reference/attributes/on-rust-exports/constructor.html

Unfortunately, you can't have it in the same project as rusqlite, because rusqlite can only build for wasm32-wasi and wasm-bindgen can only build for wasm32-unknown-unknown: https://github.com/rusqlite/rusqlite/issues/827#issuecomment-1856334163

Also, wasm-bindgen emits whole ES modules, with some JS glue that you import, whereas to use rusqlite we'd need to present a filesystem we control to the WASI syscall implementations from whatever WASI shim we use, meaning we'd need more control over the WASM load step than it seems like you get with wasm-bindgen.

So I think I'm going to have to write the core local implementation of each server-side function in Rust, and then export it as a C ABI function from the Rust code, which will be visible as a WASM export that JS can find and call. That should work, though I might need to do some !!fun!! things for strings?
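
As a rough illustration of the string handling involved, here is a sketch of a C ABI export that takes a UTF-8 string as a pointer plus length and returns a newly allocated reply buffer. All three function names (`demo_alloc`, `demo_free`, `demo_echo_region`) are hypothetical and not part of gbz-base; the reply format is made up for the example. The `#[unsafe(no_mangle)]` spelling assumes Rust 1.82 or newer; older toolchains write `#[no_mangle]`.

```rust
use std::ptr;
use std::slice;
use std::str;

/// Allocate `len` bytes that the caller (e.g. JS) can fill with UTF-8 text.
#[unsafe(no_mangle)]
pub extern "C" fn demo_alloc(len: usize) -> *mut u8 {
    let boxed: Box<[u8]> = vec![0u8; len].into_boxed_slice();
    Box::into_raw(boxed) as *mut u8
}

/// Free a buffer returned by `demo_alloc` or `demo_echo_region`.
/// `len` must be the exact length of that buffer.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn demo_free(ptr: *mut u8, len: usize) {
    if ptr.is_null() {
        return;
    }
    unsafe { drop(Box::from_raw(ptr::slice_from_raw_parts_mut(ptr, len))) };
}

/// Take a region string as (pointer, length), write the reply length to
/// `out_len`, and return a pointer to the reply bytes (null on error).
/// The reply is a toy JSON object; the input is not JSON-escaped.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn demo_echo_region(
    ptr: *const u8,
    len: usize,
    out_len: *mut usize,
) -> *mut u8 {
    if ptr.is_null() || out_len.is_null() {
        return ptr::null_mut();
    }
    let bytes = unsafe { slice::from_raw_parts(ptr, len) };
    let region = match str::from_utf8(bytes) {
        Ok(text) => text,
        Err(_) => return ptr::null_mut(),
    };
    let reply = format!("{{\"region\":\"{}\"}}", region)
        .into_bytes()
        .into_boxed_slice();
    unsafe { *out_len = reply.len() };
    Box::into_raw(reply) as *mut u8
}
```

On the JS side, the caller would write the UTF-8 bytes into the module's exported memory at the pointer from `demo_alloc`, call the export, decode the reply bytes, and then call `demo_free` on both buffers. On wasm32 the pointers and lengths are all 32-bit integers.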

adamnovak commented 6 months ago

I have this working now in 9a7d5ffcaf47cf28d71560e1ba13e0c27f4acb9f. I'm using a WASI build and just invoking the CLI query command and getting standard output, which it conveniently fills with JSON I can understand.

I'm running the WASM in a web worker, and fetching bits of the data it needs synchronously with FileReaderSync against slices of a Blob sent over from the page.