I've been successfully using https://crates.io/crates/bincode for that purpose in my experimental datetime impl - https://github.com/zbraniecki/unic-datetime
I'm not sure what the pros/cons of either method are, but the numbers in my measurements use bincode vs. JSON (both via serde).
Since those two are very much alike, I hope we'll be able to load either JSON or bincode resources provided with the library.
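For illustration, here's a minimal round-trip sketch (`DateSymbols` is a made-up struct for this example, not the actual unic-datetime type):

```rust
use serde::{Deserialize, Serialize};

// Hypothetical data struct standing in for real CLDR data.
#[derive(Serialize, Deserialize, Debug, PartialEq)]
struct DateSymbols {
    months: Vec<String>,
}

fn main() {
    let symbols = DateSymbols {
        months: vec!["stycznia".into(), "lutego".into()],
    };

    // The same serde-derived struct serializes to either format.
    let json = serde_json::to_string(&symbols).unwrap();
    let bin = bincode::serialize(&symbols).unwrap();

    let back: DateSymbols = bincode::deserialize(&bin).unwrap();
    assert_eq!(symbols, back);
    println!("json: {} bytes, bincode: {} bytes", json.len(), bin.len());
}
```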
Separately, we could also make it possible to produce an `.rs` file that can store the structs. Here's an example of Polish datetime CLDR data generated from JSON into a Rust file: https://github.com/zbraniecki/unic-datetime/blob/master/src/data/generated/pl.rs
What are the pros and cons of using something like bincode or CBOR versus a `.rs` file of structs?
Pros:

- unic-datetime benchmark

Cons:
My thinking was that this would be useful for scenarios where we want some basic data to fall back on when dynamically loaded data fails, the data changes very rarely, and the crate may be useful with that little chunk of data available at all times.
Example: a list of RTL languages for directionality.
Is there a way to auto-generate `.rs` files from a JSON file conforming to Serde? Maybe using a proc macro?
I don't know of one, but it should be relatively easy to write one.
I must admit the method I used is very ugly, and I wouldn't recommend it. I basically `println!` the Rust file.
If I were to redo it, I'd likely use `quote` to produce the Rust, or write my own serde serializer that outputs Rust structs.
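Something in this spirit (a hypothetical sketch, not code from unic-datetime):

```rust
use quote::quote;

// Build the token stream for a static data item and render it as a string
// that a build script could write out to a generated .rs file.
fn generate_months(months: &[&str]) -> String {
    let tokens = quote! {
        pub static MONTHS: &[&str] = &[#(#months),*];
    };
    tokens.to_string()
}

fn main() {
    println!("{}", generate_months(&["stycznia", "lutego", "marca"]));
}
```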
@Manishearth - do you know of anyone who's done it before? Any libs?
For specific structs you should probably write your own deserializer. I don't think there's anything for that in general, but it can be written.
Actually, I'd just parse it into a hashmap and operate on that. No need to write a deserializer, those are tricky.
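A minimal sketch of that approach (the keys here mirror the CLDR plurals JSON format):

```rust
use std::collections::HashMap;

fn main() {
    let json = r#"{"pluralRule-count-one": "i = 1 and v = 0", "pluralRule-count-other": ""}"#;
    // Deserialize into a generic map instead of a purpose-built struct.
    let map: HashMap<String, String> = serde_json::from_str(json).unwrap();
    assert_eq!(map["pluralRule-count-one"], "i = 1 and v = 0");
}
```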
I'm surprised that there isn't apparently an existing crate to generate build-time Rust structures in static memory from a JSON file.
I guess we should either:
I agree. It would be sweet to have a serde plugin that can output Rust. I think the challenge is that the Rust it outputs may be project-dependent, but I'm comfortable delaying that experiment until we have some serde output for CBOR/bincode.
I've written some manual serde deserializers, so I may dive into it one day :)
I'm unassigning myself. For the time being, we can use a JSON string or CBOR/bincode. In the medium term, we should have the auto-generated Rust code, and we should evaluate to what degree that is faster than deserializing JSON/CBOR/bincode at runtime.
Could be an intern project, or could be something good for the Mozilla team, since it is more closely related to the Rust ecosystem and likely wants to be its own project.
Moved to backlog.
@zbraniecki saw this post on Reddit: https://www.reddit.com/r/rust/comments/gud6c1/introducing_treebuf/
As part of #116 I started looking into performance of various resource types based on https://github.com/unicode-cldr/cldr-core/blob/master/supplemental/plurals.json and https://github.com/unicode-cldr/cldr-core/blob/master/supplemental/ordinals.json .
The branch I'm working on supports parsing PluralRules at runtime (I expect to add the ability to load inlined data as well), either from inlined strings or from JSON, Bincode, and CBOR.
I used my harness from intl-measurements, with runtime timing (not criterion), to emulate the cost of loading data to format just 31 numbers in 10 locales. The data is not at all optimized, so these numbers are only useful for comparing JSON/Bincode/CBOR; they are not representative of the performance I expect out of PluralRules when the PR is ready.
I also compared the size of resource files for cardinals.
| Type | Time | Size |
|---|---|---|
| Inline String | 39344 ns | ? |
| Bincode | 137845 ns | 57 KB |
| CBOR | 172867 ns | 59 KB |
| JSON | 201755 ns | 69 KB |
My conclusion is that CBOR is slower than Bincode at deserialization, and Bincode gives us smaller resource files (although only slightly).
Agreed that Bincode looks nicer based on this initial data.
What are the runtime numbers for, exactly? Are you parsing all of the cardinal rules for all locales? 137845 ns is faster than CBOR and JSON, but still really slow in absolute terms.
Here's the code: https://github.com/zbraniecki/intl-measurements/blob/master/unic/pluralrules/src/main.rs#L52-L61
So, it's 10x loading, parsing, and resolving for 31 sample numbers.
OK, thanks for that.
A couple of questions:

1. Is it correct to say that the runtime for the actual `.select()` function should not be affected by which strategy is used for data serialization?
2. It would be good to get data for `PluralRules::create` by itself, including loading from a static string, so that we can better break down these numbers into what part of the stack they originate from.
> Is it correct to say that the runtime for the actual `.select()` function should not be affected by which strategy is used for data serialization?
Yes.
> It would be good to get data for `PluralRules::create` by itself, including loading from a static string, so that we can better break down these numbers into what part of the stack they originate from.
The inline string data point is for when there's no I/O and no parsing of `Resource`.
I measured just selecting for 10x31: it takes ~8000 ns. Parsing the rule string for 10x31 is ~30000 ns.
So, subtracting that shared piece from my numbers, loading the data into this structure (https://github.com/zbraniecki/icu4x/blob/pluralrules/components/pluralrules/src/data/json.rs#L12-L26) takes:
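- Bincode: ~100,000 ns
- CBOR: ~135,000 ns
- JSON: ~164,000 ns

(rough figures, obtained by subtracting the shared ~38,000 ns of rule parsing and selection from the totals in the table above)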
Also, as I said in my earlier post, I have a number of low-hanging fruit for perf improvements in parsing and selecting. The ~30000 ns for 10 locales x 2 rules (cardinal/ordinal) (~1500 ns per rule) and the ~8000 ns for 10x31x2 selections (~13 ns per selection) are both still subject to further optimization.
The manual parser I wrote there is ~3 times faster than the nom-based parser we used in https://github.com/zbraniecki/pluralrules, but it can still be faster :)
My comment here is mostly about the conclusion on JSON/Bincode/CBOR loading/deserialization using serde/bincode. It's very consistent with my results for DateTime comparing JSON/Bincode/Inline in https://github.com/zbraniecki/intl-measurements/ (~30 µs slower).
Moving the PluralRules-specific stuff to #130
I found two crates that may be able to generate Rust: https://lib.rs/crates/serde-generate and https://github.com/udoprog/genco
We should use StaticDataProvider in example code so that the examples aren't dependent on the filesystem machinery.
Consider whether rkyv can be used here: https://davidkoloski.me/rkyv/rkyv.html
Here's what I'm currently thinking.
Sticking with a serialization/deserialization framework seems like the most robust and future-proof option. Rust code generation is nice, but there are problems with it, such as wanting to share data across multiple ICU4X instances.
I'm thinking we should take a two-layered approach to the static (filesystem-less) serialization.
Here's what I mean. On the filesystem, the resource paths are encoded into the filename. However, the schema defining what's inside of those files is not known until the data is requested. I think we should do the same thing with static data: the main data structure maps from resource paths to opaque buffers, and then when the data is requested, we map from the opaque buffer to the specific data struct.
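A rough sketch of the shape I have in mind (the names here are hypothetical, not an actual ICU4X API):

```rust
use serde::de::DeserializeOwned;
use std::collections::HashMap;

// Layer 1: resource paths map to opaque buffers; no schema knowledge here.
struct StaticBlob {
    buffers: HashMap<&'static str, &'static [u8]>,
}

impl StaticBlob {
    // Layer 2: interpret the opaque buffer as a concrete data struct
    // only when the data is requested.
    fn load<T: DeserializeOwned>(&self, resource_path: &str) -> Option<T> {
        let bytes = self.buffers.get(resource_path)?;
        bincode::deserialize(bytes).ok()
    }
}
```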
Advantages of this model:
Disadvantages:
Zibi linked the following benchmark comparison between deserialization frameworks: https://davidkoloski.me/blog/rkyv-is-faster-than/
Postcard (https://docs.rs/postcard/0.5.2/postcard/) looks quite promising, as well as rkyv.
@Manishearth said he'll take on investigating postcard as a replacement for bincode, and rkyv as an option for static data.
I did some experiments with Rkyv, Bincode, and Postcard on LiteMap, mainly focusing on code size. I compressed the example data in language_names_lite_map.rs into each of the three binary serialization formats under consideration.
Bincode example:

```rust
const BYTES: [u8; 278] = [ ... ];
let map: LiteMap<&str, &str> = bincode::deserialize(&BYTES).unwrap();
```
Postcard example:

```rust
const POSTCARD: [u8; 117] = [ ... ];
let map: LiteMap<&str, &str> = postcard::from_bytes(&POSTCARD).unwrap();
```
Rkyv example:

```rust
const BYTES: Aligned<[u8; 280]> = Aligned([ ... ]);
let archived = unsafe { archived_root::<LiteMap<String, String>>(&BYTES.0) };
let deserialized = archived.deserialize(&mut AllocDeserializer).unwrap();
```
Comparison:
|  | Bincode | Postcard | Rkyv |
|---|---|---|---|
| Data Size | 278 B | 117 B | 280 B |
| Deserializer Code Size | 24449 B | 25923 B | 8052 B |
So Postcard wins on data size, and Rkyv on code size. Most of the code size in both Bincode and Postcard is Serde error handling stuff.
Here are some caveats of Rkyv. UPDATE: I did some more experimenting and am updating the caveats that I had posted earlier.
The crucial architectural design difference between Serde and Rkyv is in how they deal with deserialization types. In Serde, the data struct has a lifetime parameter, and zero-copy deserialized data gets referenced from the data struct. However, in Rkyv, zero-copy deserialized data does not go into the main data struct; it goes into a special intermediate struct with the same fields as the main struct but wrapped with `RelPtr`.

Rkyv supports serializing data structs that have a lifetime parameter, but doesn't appear to support deserializing into them; you must use the `Archive` type with `RelPtr` instead of the real data struct.
I don't know what this would mean for the data provider pipeline, which up to this point has always assumed that the original data struct type is the one being passed around. The only solution I can think of right now is to make data struct types generic in formatters, such that something like DateTimeFormat can own either the plain data struct or the Rkyv version of the data struct.
Have you managed to use the release of rkyv from 8 hours ago (https://github.com/djkoloski/rkyv/releases/tag/v0.5.0)? If not, would it be hard to repro with it?
Also, seems like the author is seeking feedback for their roadmap - https://github.com/djkoloski/rkyv/issues/67
Yes, actually I do believe I was using version 0.5 in my testing. I picked whatever was the latest version on crates.io when I started, which was probably 30 minutes after it was pushed. :smiley:
Thanks for the link; I'll try to organize my thoughts into suggestions on Rkyv.
It's a new day and I'm having trouble reproducing the results I posted last night. Here is what I'm getting this morning for deserializer code size:
| Bincode | Postcard | Rkyv |
|---|---|---|
| 24083 B | 11274 B | 6913 B |
Will continue investigating.
Some more data:
|  | Bincode | Postcard | Rkyv (Deserialize) | Rkyv (Archive) |
|---|---|---|---|---|
| Code Size | 24083 B | 11274 B | 6913 B | 463 B |
| Data Size | 278 B | 117 B | 280 B | 280 B |
| Max Heap Size | 352 B | 352 B | 629 B | 0 B |
| Instruction Count | 121228 Ir | 129691 Ir | 126651 Ir | 2984 Ir |
Looking inside the WebAssembly bytecode:
More background on the tests: the Bincode and Postcard tests deserialize into `LiteMap<&str, &str>`, while the Rkyv tests use `LiteMap<String, String>`.
You can see the code here:
https://github.com/sffc/omnicu/tree/static-testing/utils/litemap/benches/bin
@zbraniecki I posted some thoughts in https://github.com/djkoloski/rkyv/issues/67. Feel free to add more as you see fit. Thanks for the suggestion!
@sffc posted the following in Slack:
Okay, so my conclusion from my Rkyv investigation is that I think we could make Rkyv work, and it would give us quite compelling results in code size and performance. The downsides are:

1. Data size is mostly on par with Bincode, but rather bloated when compared to Postcard
2. It requires rethinking a number of the assumptions that went into the current design of the data provider
How do you suggest we proceed? Note that because of (2), I don't see a path toward a "proof of concept" of Rkyv in data provider without essentially writing the final solution.
Conclusion:
FYI, I wrote up the following example of how we could structure data structs if we went the Rkyv route.
```rust
#[derive(serde::Serialize, serde::Deserialize)]
#[derive(rkyv::Serialize, rkyv::Archive)]
pub struct DataStruct {
    message: String,
}

// `ArchivedDataStruct` is the type generated by `#[derive(rkyv::Archive)]`.
// A shared trait lets formatters accept either representation.
trait DataStructTrait {
    fn message(&self) -> &str;
}

impl DataStructTrait for DataStruct {
    fn message(&self) -> &str {
        &self.message
    }
}

impl DataStructTrait for ArchivedDataStruct {
    fn message(&self) -> &str {
        self.message.as_str()
    }
}

fn demo(data: &(impl DataStructTrait + ?Sized)) {
    println!("{}", data.message());
}
```
We can also write our own proc macro that generates these accessors, if we need. This might make it easier to incrementally add rkyv without breaking all the other code.
Okay, so taking a step back. There are two main data types we want to support efficiently: lists of integers and lists of strings.
After discussing this at length with @Manishearth, here's a possible direction that satisfies our needs.
Neither Serde nor Rkyv is perfect with integer lists out of the box. Serde only supports zero-copy for lists of `u8`, choosing not to concern itself with alignment and endianness. Rkyv uses padding to align integers properly (wasting space), and declares endianness out of scope, meaning that we would need to handle it externally, such as by shipping separate big-endian and little-endian data bundles.
Our idea (mainly @Manishearth's) to serialize a list of u32s is as follows (a sketch follows the list):

1. Define a new type `UnalignedU32`, represented as `[u8; 4]` under the hood, where the bytes are stored in a constant order (probably little-endian).
2. Implement conversions between `UnalignedU32` and normal `u32`.
3. To serialize a `&[u32]`, cast it to a `&[UnalignedU32]` and then serialize it as a byte stream. To deserialize, cast the `&[u8]` to a `&[UnalignedU32]`, which can now be passed to anything that accepts `&[T] where T: Into<u32> + Copy`.
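A minimal sketch of (1)-(3), assuming little-endian order (the `cast_bytes` helper is illustrative, not a settled API):

```rust
// A u32 stored as little-endian bytes with no alignment requirement.
#[repr(transparent)]
#[derive(Copy, Clone)]
pub struct UnalignedU32([u8; 4]);

impl From<u32> for UnalignedU32 {
    fn from(n: u32) -> Self {
        UnalignedU32(n.to_le_bytes())
    }
}

impl From<UnalignedU32> for u32 {
    fn from(n: UnalignedU32) -> u32 {
        u32::from_le_bytes(n.0)
    }
}

// Reinterpret a raw byte buffer as &[UnalignedU32] without copying. Sound
// because UnalignedU32 is a repr(transparent) wrapper over [u8; 4]:
// size 4, alignment 1.
fn cast_bytes(bytes: &[u8]) -> &[UnalignedU32] {
    assert_eq!(bytes.len() % 4, 0);
    unsafe { std::slice::from_raw_parts(bytes.as_ptr().cast(), bytes.len() / 4) }
}

fn main() {
    let bytes = [1u8, 0, 0, 0, 42, 0, 0, 0]; // two little-endian u32s
    let values = cast_bytes(&bytes);
    assert_eq!(u32::from(values[0]), 1);
    assert_eq!(u32::from(values[1]), 42);
}
```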
It's unclear how we would support lists of TinyStr4 (or 8, 16) in this model. Some options:

1. Define an `UnalignedTinyStr4` along the same lines.
2. Avoid `TinyStr4` in these contexts and use `[u8; N]` instead, where `N` is the desired string length.

Serde does not really have any good out-of-the-box solution for zero-copy lists of strings. Rkyv uses an `ArchivedVec<ArchivedString>` (I believe), which allows random access into the vector followed by a relative pointer lookup to the start of the string.
The basic model we came up with for zero-copy lists of strings is as follows:
1. Replace `&[&str]` (or `Vec<String>`) with `ByteSliceVec<T> where T: TryFrom<&[u8]>`.
2. Represent `ByteSliceVec` as a `&[u8]` where the first `N` bytes are offsets and the remaining bytes contain the lengths and byte slices (the indices and lengths could be either 1-byte or 2-byte, supporting lists of length 256 or 65536); see the toy sketch below.
3. On access, `ByteSliceVec<T>` will fetch the appropriate `&[u8]` and then pass it through `TryFrom`. If desired, we could instead pre-validate the strings in the `Deserialize` impl of `ByteSliceVec<T>` and then use `unsafe` to convert the `&[u8]` to `&str` during data access to speed things up.

In the future, this model could be generalized to support interning/pooling of strings like ICU resource bundles are doing.
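A toy sketch of the 1-byte-index variant (the exact framing, a count byte followed by end offsets, is an assumption, not a settled format):

```rust
// Toy layout: [count, end_0, end_1, ..., data bytes...], with 1-byte
// end offsets measured from the start of the data region.
fn get(buf: &[u8], i: usize) -> Option<&[u8]> {
    let count = *buf.first()? as usize;
    if i >= count {
        return None;
    }
    let offsets = buf.get(1..1 + count)?;
    let data = buf.get(1 + count..)?;
    let start = if i == 0 { 0 } else { offsets[i - 1] as usize };
    let end = offsets[i] as usize;
    data.get(start..end)
}

fn main() {
    // Two strings, "pl" and "en": count = 2, end offsets [2, 4], then bytes.
    let buf = [2u8, 2, 4, b'p', b'l', b'e', b'n'];
    assert_eq!(get(&buf, 0), Some(&b"pl"[..]));
    assert_eq!(get(&buf, 1), Some(&b"en"[..]));
    assert_eq!(get(&buf, 2), None);
}
```

The real `ByteSliceVec<T>` would then pass the returned `&[u8]` through `TryFrom` to produce `T`.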
When performing zero-copy deserialization of types that are not valid for all possible bit patterns (such as chars and enums), it is necessary to validate the data before it makes it into the application. There are two opportunities to validate the data:

1. At deserialization time, validating the whole buffer up front.
2. At data access time, using `TryFrom<&[u8]>` to verify that the entries are all valid.

I lean toward (1), because it's okay to incur a little bit of extra cost during deserialization if we can win inside tight format loops.
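To illustrate option (2), a sketch of an access-time check for `char`, which is not valid for all bit patterns:

```rust
// `char::from_u32` returns None for surrogates and out-of-range values,
// so invalid data can never reach the application.
fn char_at(data: &[u8], i: usize) -> Option<char> {
    let bytes: [u8; 4] = data.get(i * 4..i * 4 + 4)?.try_into().ok()?;
    char::from_u32(u32::from_le_bytes(bytes))
}

fn main() {
    let data = [0x41, 0, 0, 0, 0x00, 0xD8, 0, 0]; // 'A', then an invalid surrogate
    assert_eq!(char_at(&data, 0), Some('A'));
    assert_eq!(char_at(&data, 1), None);
}
```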
The key insight from above is that we can introduce new types that handle alignment and endianness issues while still being representable in memory as a `&[u8]`, compatible with the Serde data model.
If we had a data struct like

```rust
#[derive(serde::Serialize, serde::Deserialize)]
pub struct DataStruct<'s> {
    pub numbers: Vec<u32>,
    #[serde(borrow)]
    pub strings: Vec<Cow<'s, str>>,
}
```

this would become something like

```rust
#[derive(serde::Serialize, serde::Deserialize)]
pub struct DataStruct<'s> {
    #[serde(borrow)]
    pub numbers: Cow<'s, [UnalignedU32]>,
    #[serde(borrow)]
    pub strings: ByteSliceVec<'s, str>,
}
```
Advantages:
Disadvantages:
What all this means is that we would basically be building a handful of helper types for Serde to extend support for zero-copy deserialization into new types. We could put these in a new crate. Crate names could include:

- `serde_zero_copy_utils`
- `serde_unaligned`
I'm happy to work on these new unaligned types in parallel with the rest of the work in this thread. Now that we have an idea for a direction on how we plan to support this, I feel much more comfortable moving forward with the status quo serializers (Bincode and JSON), and then we can take our time to benchmark the improvements that these new types bring to the table.
Here are some benchmarks that we could plug into to test the new types:
Don't mean to butt in, but I'll be adding support for endian-agnostic archives as part of the next rkyv milestone. You'll still have to make sure your own types are endian-agnostic but it should make it much easier to do so.
> It's unclear how we would support lists of TinyStr4 (or 8, 16) in this model. Some options:
I think we can actually design a generic type that can handle any flat type, if const generics are strong enough. Otherwise, we can basically do the same thing for `u32` and `TinyStr4`.
@djkoloski yep, another issue is that we need these types to also be patchable, and that's trickier with rkyv's entirely separate data model.
This is finally done!
https://unicode-org.github.io/icu4x-docs/doc/icu_provider_blob/index.html
When we don't want to perform file or network I/O, it may be necessary to build the data into a binary format that can be built into the Rust static memory space.
We can use a solution like CBOR or bincode.
However, these solutions require parsing the data structure into Rust objects at runtime. That should be pretty quick, and string buffers could point into static memory, but a pure Rust static structure would likely be the fastest. That said, I have not yet done code size or performance testing.