smklein opened 1 year ago
Interesting. What would we lose by doing this?
I think possibly the steno library's visibility into the underlying types. If steno consumed a byte-array interface, it might be more difficult for Steno itself to be responsible for logging "what data was passed", but that would give callers the ability to use a lighter-weight serialization framework.
This is admittedly all predicated on the theory that:
- these Serialize / Deserialize derives are a significant contributor to compile time, and
- a lighter-weight, less-generic interface would actually reduce that cost.

I do think both of those points should be validated before investing a ton of effort here, but this does align with the recommendations from https://matklad.github.io/2021/09/04/fast-rust-builds.html#Keeping-Instantiations-In-Check :
Now that we understand the pitfalls of monomorphization, a rule of thumb becomes obvious: do not put generic code at the boundaries between the crates. When designing a large system, architect it as a set of components where each of the components does something concrete and has non-generic interface.
If you do need generic interface for better type-safety and ergonomics, make sure that the interface layer is thin, and that it immediately delegates to a non-generic implementation.
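To make that rule concrete, here's a tiny generic-shim sketch (generic example code, not from steno or omicron):

```rust
use std::fs;
use std::io;
use std::path::Path;

// Thin generic layer, kept for ergonomics: it is small, so instantiating it
// once per caller type is cheap.
pub fn read_config(path: impl AsRef<Path>) -> io::Result<String> {
    read_config_impl(path.as_ref())
}

// The bulk of the work lives in a non-generic function that is compiled
// exactly once, no matter how many types the shim is instantiated with.
fn read_config_impl(path: &Path) -> io::Result<String> {
    fs::read_to_string(path)
}
```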
I think Steno is already agnostic to the underlying types, right? I think you could replace serde_json::Value with Bytes and Serialize/Deserialize with some trait that goes to/from bytes and that'd be fine, but if I'm understanding right, that wouldn't really help because what's needed is to eliminate the type parameters in the crate-level interface. Is that right? Is the idea that we'd replace action functions and other interfaces that use type parameters with more fixed types? I don't see how to do this without either requiring a lot more boilerplate in every action function that'd be easy to get wrong (to parse its input and serialize its output) or losing some compile-time safety -- or both. (But I'm not sure and that's why I asked what we'd lose.)
I think Steno is already agnostic to the underlying types, right? I think you could replace serde_json::Value with Bytes and Serialize/Deserialize with some trait that goes to/from bytes and that'd be fine, but if I'm understanding right, that wouldn't really help because what's needed is to eliminate the type parameters in the crate-level interface. Is that right? Is the idea that we'd replace action functions and other interfaces that use type parameters with more fixed types?
In short: yes, either by using fixed types, or by allowing clients to define their own mechanism of serialization / deserialization. For example, if we used miniserde instead of serde, we could derive much less code for the DB types, and there would be no monomorphization at call-sites.
My suspicion is that this incurs a cost in the long-tail of building the omicron-nexus binary, because the DB types are actually instantiated in the Nexus crate too -- I think this is similar to the issues seen by rust-analyzer folks:
https://github.com/awslabs/aws-lambda-rust-runtime/issues/481#issuecomment-1127228757
I also strongly agree with the follow-up opinion posted there:
I don't see how to do this without either requiring a lot more boilerplate in every action function that'd be easy to get wrong (to parse its input and serialize its output) or losing some compile-time safety -- or both. (But I'm not sure and that's why I asked what we'd lose.)
I agree that it's not as simple as "just using Serde" but I think we could define something pretty similar.
For the most part, I don't think steno cares much about the values being passed as "inputs" and "outputs", beyond the following:
- the ability to serialize them (currently via serde_json::to_string)
- the ability to deserialize them (currently via serde_json::from_str or serde_json::from_value)

But this doesn't actually mean that the caller needs to provide a type that can necessarily be serialized to / from JSON! It could just be a type that can be serialized / deserialized from a "string", or a sequence of bytes, or whatever.
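As a sketch of that idea (the SagaData trait and its methods are hypothetical, not steno's actual API), steno's requirements could be captured as a pair of byte-oriented conversions, with serde_json as just one possible implementation:

```rust
// Hypothetical trait capturing the only two things steno needs from the
// values flowing through a saga: a way into bytes and a way back out.
pub trait SagaData: Sized {
    type Error: std::fmt::Display;
    fn to_bytes(&self) -> Result<Vec<u8>, Self::Error>;
    fn from_bytes(bytes: &[u8]) -> Result<Self, Self::Error>;
}

// Callers who are happy with serde_json could keep it via a blanket impl
// (shown for illustration; a real design would need to leave room for
// other implementations too).
impl<T> SagaData for T
where
    T: serde::Serialize + serde::de::DeserializeOwned,
{
    type Error = serde_json::Error;
    fn to_bytes(&self) -> Result<Vec<u8>, Self::Error> {
        serde_json::to_vec(self)
    }
    fn from_bytes(bytes: &[u8]) -> Result<Self, Self::Error> {
        serde_json::from_slice(bytes)
    }
}
```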
I do think that Steno is already performing correct behavior -- I'm just trying to keep an eye out for spots where monomorphization (specifically in Nexus) could be biting us in terms of compile-time cost, and this seems like it could be one of them.
One more (similar) data point showing that heavy usage of serde_derive can bite consumers from a compile-time point of view: https://github.com/serde-rs/serde/issues/1146#issuecomment-907622649
My recollection is that the reason Steno looks this way is not because it's going to peek inside the structures you gave it, but so that Steno can take responsibility for the serialization and deserialization steps. That's in turn to eliminate boilerplate copied and pasted into every action / undo-action function, especially if that boilerplate might be easy to get wrong. It also ensures that when these steps fail, we get a uniform error, no matter which action/undo-action failed, and we can include as much context as makes sense with it. The hope was that that would make it easy to add metrics, scrape logs, etc. for these failures.

So the thing I'm worried about here is just that by eliminating these type parameters, we move responsibility from one common hunk of code inside Steno to each individual action/undo action, and that people start writing this by hand and it winds up being (1) incorrect sometimes where it couldn't be today and (2) less debuggable, if people don't bother to annotate these errors with the same context that Steno does (or could). It could be we can achieve the best of both worlds -- e.g., maybe a macro similar to Dropshot's endpoint could be used to essentially inline this code. Then it would live in one place, but reliably inserted everywhere it needs to go.
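To sketch what that macro could look like (everything here is invented for illustration -- the names, the error type, and the byte-oriented signature are not steno's real interface):

```rust
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
struct DiskParams {
    id: u64,
}

#[derive(Serialize, Deserialize)]
struct DiskRecord {
    id: u64,
}

// The macro owns the (de)serialization boilerplate and the error context,
// so every action gets identical behavior without hand-writing it.
macro_rules! saga_action {
    (fn $name:ident($params:ident: $ty:ty) -> $out:ty $body:block) => {
        fn $name(raw: &[u8]) -> Result<Vec<u8>, String> {
            let $params: $ty = serde_json::from_slice(raw)
                .map_err(|e| format!("deserializing input of {}: {}", stringify!($name), e))?;
            let out: $out = $body;
            serde_json::to_vec(&out)
                .map_err(|e| format!("serializing output of {}: {}", stringify!($name), e))
        }
    };
}

saga_action! {
    fn create_disk(params: DiskParams) -> DiskRecord {
        DiskRecord { id: params.id }
    }
}
```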
I guess it boils down to how big an effect this is. Do we have any idea? Or is the thrust of this issue "collect data to see if this is worth pursuing"?
This is the proxy I've evaluated so far:
https://gist.github.com/smklein/8fd491c4f80609473b664051c2e46c11
Based on https://github.com/dtolnay/cargo-llvm-lines#multicrate-projects
From that data, I run the following:
cat llvm-lines-lib-fat-lto.txt | awk '{$1=$1};1' | rg -v TOTAL | rg "$PATTERN" | cut -d' ' -f 1 | awk '{sum += $1} END {print "Sum: " sum}'
To get a really rough proxy for "how much is a crate contributing to total lines of LLVM".
This is still not an accurate assessment of "total compile time", but I don't really know of existing tools that perform that time-based analysis, rather than this space-based analysis.
From that script, I saw the following (and be aware, these probably overlap -- they're not necessarily distinct buckets):
diesel: 1,782,634
serde: 1,328,463
tokio: 1,158,851
dropshot: 605,179
serde_json: 391,020
nexus_db_queries: 347,291
nexus_db_model: 243,271
hyper: 139,807
schemars: 112,289
oximeter: 89,277
steno: 43,920
oso: 20,466
To be clear: other than this communication with steno, the "db model" crates should not need to depend on serde at all -- but it's not really trivial to separate how many of the generated serde-related lines of LLVM IR are related to the API from how many are related to serializing DB structures en route to steno.
So: I think the top priority for improving compile times should be optimizing our usage of Diesel, but I think we're generating a similar total amount of LLVM IR from our usage of serde + dropshot.
It does seem like the total amount of code generated from steno is relatively small, but it imposes the serde dependency in the nexus_db_model crate that otherwise doesn't need to be there. Searching for serde.*nexus_db_model yields 86,039 lines of LLVM IR.
So while that's not nothing, it's also admittedly not huge. I suppose we'd be better off focusing more on Diesel and Dropshot for the moment.
My recollection is that the reason Steno looks this way is not because it's going to peek inside the structures you gave it, but so that Steno can take responsibility for the serialization and deserialization steps. That's in turn to eliminate boilerplate copied and pasted into every action / undo-action function, especially if that boilerplate might be easy to get wrong. It also ensures that when these steps fail, we get a uniform error, no matter which action/undo-action failed, and we can include as much context as makes sense with it. The hope was that that would make it easy to add metrics, scrape logs, etc. for these failures.
Totally, and I think that these are reasonable goals.
So the thing I'm worried about here is just that by eliminating these type parameters, we move responsibility from one common hunk of code inside Steno to each individual action/undo action, and that people start writing this by hand and it winds up being (1) incorrect sometimes where it couldn't be today and (2) less debuggable, if people don't bother to annotate these errors with the same context that Steno does (or could). It could be we can achieve the best of both worlds -- e.g., maybe a macro similar to Dropshot's endpoint could be used to essentially inline this code. Then it would live in one place, but reliably inserted everywhere it needs to go.
I don't think removing the dependency on Serde necessitates callers doing manual conversions -- the structs which currently derive(Serialize, Deserialize) could instead derive a different, lighter-weight serializer. https://github.com/dtolnay/miniserde and https://crates.io/crates/nanoserde are examples that ultimately give us a trait-based implementation (so steno could still take impl Serialize + Deserialize as input/output, if we wanted -- we'd just feature-flag on "which Serialize/Deserialize").
https://blog.logrocket.com/rust-serialization-whats-ready-for-production-today/ has an overview of a bunch -- I think they aptly categorize serde as "elegant, flexible, fast to run, and slow to compile". But in our usage, we don't actually need flexible, because we're only really serializing to one format within steno.
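For a sense of what the miniserde flavor looks like in practice (struct and fields invented for the example; usage modeled on miniserde's README):

```rust
use miniserde::{json, Deserialize, Serialize};

// miniserde's derives are also named Serialize/Deserialize, but they expand
// to much less code than serde's, and serialization runs through trait
// objects internally rather than monomorphized generics.
#[derive(Serialize, Deserialize, Debug)]
struct DiskCreateParams {
    name: String,
    size_bytes: u64,
}

fn main() -> miniserde::Result<()> {
    let params = DiskCreateParams {
        name: "disk0".to_owned(),
        size_bytes: 1 << 30,
    };
    let encoded = json::to_string(&params);
    let decoded: DiskCreateParams = json::from_str(&encoded)?;
    println!("{:?}", decoded);
    Ok(())
}
```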
I don't think removing the dependency on Serde necessitates callers doing manual conversions -- the structs which currently derive(Serialize, Deserialize) could instead derive a different, lighter-weight serializer.
I think I've misunderstood then. I thought the problem here was all the type parameters causing extra monomorphization. If that were the case, I think you'd have to remove them, and that would mean clients doing the serialization from their custom types to some non-parametrized type (like Bytes or String). If it's really just using serde here, that might be an easier change.
I also can't tell to what degree llvm lines corresponds with compile time. I wonder if a cheap next step would be to fork steno as needed and write two little steno consumers: one against the current steno and one against the new one. We could load them up with different sagas and actions (maybe a comparable number to Nexus) that just do nothing. It seems like the difference in compile times there ought to be an upper bound on what we could see from doing this in Nexus.
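To sketch the shape of that experiment (the action signature and counts here are stand-ins, not steno's real interface), a macro could stamp out the do-nothing actions so both toy consumer crates stay trivial to maintain:

```rust
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
struct NoopParams;

// Stamp out N trivial actions; build the same crate against the current
// steno and against the fork, then compare `cargo build` wall-clock times.
macro_rules! noop_actions {
    ($($name:ident),* $(,)?) => {
        $(
            pub async fn $name(params: NoopParams) -> Result<NoopParams, String> {
                Ok(params) // does nothing; exists only to exercise codegen
            }
        )*
    };
}

// Extend this list to roughly the number of actions Nexus defines today.
noop_actions!(action_000, action_001, action_002);
```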
Here's my idea -- I'd like to validate this with additional data, and I'll see if I can sketch out some numbers for potential improvement.
A lot of the structs in nexus/db-model derive serde's Serialize and Deserialize traits, in addition to their family of Diesel traits. These traits are needed in two spots:
- when saga data is stored to / loaded from the saga log as serde_json::Value
- in Steno's interfaces for action inputs and outputs, which require Serialize and Deserialize to be implemented

This "makes sense", in the sense that Steno needs to serialize and deserialize this data. There is a minor quirk here, however -- Steno doesn't really need to parse this information, yet it forces these Serialization/Deserialization derives onto all structs that could be stored / retrieved.
For example: if Steno acted on a "byte array" as input / output, we could cut down on a fair bit of monomorphized code (and therefore compile-time), and might be able to use a lighter serialization framework, like https://github.com/dtolnay/miniserde
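As a before/after sketch of that (signatures invented for illustration, not steno's actual API):

```rust
// Today's shape: monomorphized once per (input, output) type pair, in every
// crate that instantiates it.
fn execute_generic<I, O>(_input: I) -> O
where
    I: serde::Serialize,
    O: serde::de::DeserializeOwned,
{
    todo!()
}

// Byte-array shape: compiled exactly once; callers convert at the boundary
// using whatever (lighter-weight) framework they prefer.
fn execute_bytes(_input: Vec<u8>) -> Vec<u8> {
    todo!()
}
```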