Closed adamnovak closed 4 months ago
I could make memory mapping a feature that is enabled by default but can be disabled. However, the bigger issue is that simple-sds leans heavily on the assumption that `usize` and `u64` are the same. Many low-level things will probably break with 32-bit integers.
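To make the `usize`/`u64` assumption concrete, here is a minimal sketch of what an explicit, checked conversion could look like at a serialization boundary. The function names `checked_len` and `to_serialized` are hypothetical and are not part of simple-sds; the sketch only assumes that lengths are stored as 64-bit integers on disk.

```rust
use std::convert::TryFrom;

/// Converts a serialized 64-bit length into a `usize`.
/// Fails instead of silently truncating on targets where `usize`
/// is narrower than 64 bits (for example, wasm32).
/// Hypothetical helper, not a simple-sds function.
fn checked_len(serialized: u64) -> Result<usize, String> {
    usize::try_from(serialized).map_err(|_| {
        format!(
            "length {} does not fit in a {}-bit usize",
            serialized,
            usize::BITS
        )
    })
}

/// Widens a `usize` for serialization.
/// This is lossless on both 32-bit and 64-bit targets.
/// Hypothetical helper, not a simple-sds function.
fn to_serialized(value: usize) -> u64 {
    value as u64
}
```

On a 64-bit build `checked_len` always succeeds, so code written this way would behave the same as today there and only start reporting errors on 32-bit targets.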
Additionally, Rust uses `usize` for array indexing, which means that it's difficult to use arrays larger than 2^32 in a 32-bit environment. We can't import GBZs with more than ~4.29 Gbp of sequence, such as human graphs built with PGGB or full Minigraph–Cactus graphs. We also can't import GBZs where the run-length encoded BWT is larger than 4 GB, such as 1000GP graphs (and possibly final HPRC graphs with 700+ haplotypes).
We might be able to get away with the max size limitations. I was thinking we'd convert from GBZ to database outside the browser, so all we really need is to be able to properly decode the blobs in the database files.
And if we want to use the databases in the browser, and if we need simple-sds to decode the blobs in the databases, then I don't know if there's an alternative to painstakingly unwinding the assumption that `usize` is `u64` in the code that actually implements the data structures.
I managed to get simple-sds to build for `wasm32-wasi` with liberal use of `#[cfg(not(target_family = "wasm"))]`. Hopefully once I can get the full `gbz-base` binaries to link and load right I can start identifying places where the two builds can't agree on serialized representations.
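As a rough illustration of the `#[cfg(not(target_family = "wasm"))]` approach, the sketch below selects a loading strategy at compile time. The `LoadStrategy` enum and `preferred_strategy` function are hypothetical names made up for this example; they are not taken from simple-sds or gbz-base.

```rust
/// Hypothetical ways of getting a structure into memory.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum LoadStrategy {
    /// Memory-map the file (not available on WebAssembly targets).
    MemoryMap,
    /// Read the whole file into a heap buffer.
    ReadIntoMemory,
}

/// Compiled on every target except the wasm family.
#[cfg(not(target_family = "wasm"))]
fn preferred_strategy() -> LoadStrategy {
    LoadStrategy::MemoryMap
}

/// Compiled only on the wasm family, where memory mapping is gated out.
#[cfg(target_family = "wasm")]
fn preferred_strategy() -> LoadStrategy {
    LoadStrategy::ReadIntoMemory
}
```

Only one of the two `preferred_strategy` definitions exists in any given build, so callers do not need their own `cfg` attributes.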
I don't think gbz-base will need anything from simple-sds once the database has been built. The sizes and identifiers of individual objects should fit in 32 bits, because we often do that in vg as well. The blobs are encoded either using `gbwt::support::ByteCode` / `gbwt::support::RLE`, which don't care about the size of `usize` as long as the numbers fit in it, or an internal encoding that packs three bases in a byte.
I think PR #18 also resolved this.
For https://github.com/vgteam/sequenceTubeMap/issues/379 I'm trying to get `gbz-base` to build for WebAssembly. But it doesn't at the moment, because simple-sds can't. Here are the first 8 errors it throws up:

I think I need to:
- Find any `std::os::unix`-isms that can be replaced with things just in `std::os`, and make any other ones optional somehow.
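One possible shape for making a Unix-only call optional is sketched below. It uses positioned reads purely as an example, since I am not claiming this is one of the calls simple-sds actually makes; the wrapper name `read_exact_at` is hypothetical. On Unix it uses `std::os::unix::fs::FileExt`, and elsewhere it falls back to a portable seek followed by a read.

```rust
use std::fs::File;
use std::io;

/// Reads exactly `buf.len()` bytes starting at byte `offset`.
/// Unix version: positioned read that does not move the file cursor.
#[cfg(unix)]
fn read_exact_at(file: &mut File, buf: &mut [u8], offset: u64) -> io::Result<()> {
    use std::os::unix::fs::FileExt;
    file.read_exact_at(buf, offset)
}

/// Portable fallback: seek, then read.
/// Unlike the Unix version, this moves the file cursor.
#[cfg(not(unix))]
fn read_exact_at(file: &mut File, buf: &mut [u8], offset: u64) -> io::Result<()> {
    use std::io::{Read, Seek, SeekFrom};
    file.seek(SeekFrom::Start(offset))?;
    file.read_exact(buf)
}
```

Both versions take `&mut File` so that callers compile unchanged on either target, even though the Unix version would work with a shared reference.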