Closed XAMPPRocky closed 10 months ago
Looks like I've got some reading to do :eyes: These are basically ULIDs are they?
I hadn't seen ULID before, but looking at the project it seems very similar, UUID has more variety in what kind of timestamp you can use.
Today I ran across a project that collects all UUIDv6 prototypes. It lacks a Rust implementation :) If you add UUIDv6 support to your library, it would be great
https://github.com/uuid6/prototypes/
As I understand it, UUIDv6 values can be sorted, and the sorted list is in generation-time order, which makes UUIDv6 much more desirable as a primary key or similar in databases.
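To illustrate why v6 sorts by time where v1 does not, here is a minimal sketch that reorders the 60-bit timestamp of a v1 UUID into the v6 layout. The function name `v1_to_v6` is hypothetical and is not part of the uuid crate; the field offsets follow the layout described in the UUID specification.

```rust
/// Reorder a v1 UUID (as a big-endian u128) into the v6 field layout.
/// Hypothetical helper for illustration only.
fn v1_to_v6(v1: u128) -> u128 {
    // v1 stores the timestamp low bits first: time_low | time_mid | ver + time_hi.
    let time_low = (v1 >> 96) & 0xFFFF_FFFF;
    let time_mid = (v1 >> 80) & 0xFFFF;
    let time_hi = (v1 >> 64) & 0x0FFF;

    // Reassemble the 60-bit count of 100 ns ticks since the Gregorian epoch.
    let ts = (time_hi << 48) | (time_mid << 32) | time_low;

    // Clock sequence, variant and node are carried over unchanged.
    let rest = v1 & 0xFFFF_FFFF_FFFF_FFFF;

    // v6 stores the same timestamp most significant bits first.
    let high32 = (ts >> 28) & 0xFFFF_FFFF;
    let mid16 = (ts >> 12) & 0xFFFF;
    let low12 = ts & 0x0FFF;

    (high32 << 96) | (mid16 << 80) | (0x6u128 << 76) | (low12 << 64) | rest
}
```

Because the most significant timestamp bits come first in v6, comparing two v6 values as integers (or as big-endian bytes) compares their timestamps first.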
Thanks @XAMPPRocky and @VladimirMarkelov. I'll spend some time unpicking the spec to see if it would make sense to share our v1 timestamp APIs with v6 or whether there are changes we should make to unify them.
Ok, based on a really quick reading it looks like our Timestamp and ClockSequence should already be suitable at least for v6 and v7 UUIDs. It can already convert between the original Gregorian format and a standard Unix epoch. I haven't looked at v8 just yet.
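As a rough illustration of the Gregorian/Unix conversion mentioned here, the sketch below converts between Unix seconds plus nanoseconds and the count of 100-nanosecond ticks since 1582-10-15 used by v1 and v6. The function names are hypothetical and this is not the crate's internal code; the offset constant is the well-known number of ticks between the two epochs.

```rust
/// Number of 100 ns ticks between 1582-10-15 and 1970-01-01.
const TICKS_BETWEEN_EPOCHS: u64 = 0x01B2_1DD2_1381_4000;

/// Unix (seconds, subsecond nanos) to Gregorian ticks.
/// Returns None on arithmetic overflow.
fn unix_to_gregorian_ticks(seconds: u64, subsec_nanos: u32) -> Option<u64> {
    seconds
        .checked_mul(10_000_000)?
        .checked_add(u64::from(subsec_nanos) / 100)?
        .checked_add(TICKS_BETWEEN_EPOCHS)
}

/// Gregorian ticks to Unix (seconds, subsecond nanos).
/// Returns None for instants before the Unix epoch.
fn gregorian_ticks_to_unix(ticks: u64) -> Option<(u64, u32)> {
    let since_unix = ticks.checked_sub(TICKS_BETWEEN_EPOCHS)?;
    let seconds = since_unix / 10_000_000;
    // The remainder is below 10^7, so the nanosecond value fits in a u32.
    let nanos = (since_unix % 10_000_000) * 100;
    Some((seconds, nanos as u32))
}
```

Note that the round trip loses anything finer than 100 ns, which is the resolution of the Gregorian tick format.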
Do we have any experience so far from similar implementations on what a good ratio of subsecond precision / clock sequence / random data we should target for v7 UUIDs? We already have access to 32 bits (possibly low precision though) of subsecond data and 16 bits for a clock sequence. If we used all that I think it would leave us with 38 bits for random data, which seems a little low.
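The 38-bit figure can be reproduced with a small bit-budget calculation. This sketch assumes the 36-bit seconds field that the draft's reference implementation used at the time, plus the fixed 4 version bits and 2 variant bits; the helper name `random_bits` is hypothetical.

```rust
const TOTAL_BITS: u32 = 128;
const VERSION_BITS: u32 = 4;
const VARIANT_BITS: u32 = 2;

/// Bits left over for random data once the fixed fields and the chosen
/// timestamp/sequence widths are accounted for. Returns None if the
/// requested widths do not fit in a UUID.
fn random_bits(seconds_bits: u32, subsec_bits: u32, sequence_bits: u32) -> Option<u32> {
    let used = VERSION_BITS + VARIANT_BITS + seconds_bits + subsec_bits + sequence_bits;
    TOTAL_BITS.checked_sub(used)
}
```

With 36 seconds bits, 32 subsecond bits and a 16-bit clock sequence, `random_bits(36, 32, 16)` gives 38, matching the estimate above.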
This will be great to have!
Do we have any experience so far from similar implementations on what a good ratio of subsecond precision / clock sequence / random data we should target for v7 UUIDs?
Would it be possible to make it user-configurable?
Hey @coreh :wave: Are you currently looking at using these new UUID variants? If so do you have any constraints or experiences to share?
We could look at making this configurable for sure, but whether we do that unconditionally (in something like Uuid::new_v7) or as an advanced API (in the Builder) would depend on how important they are to tweak. It might also be possible to set a precision on the clock sequence so it tells the UUID how many bits it wants.
Yeah, I'm currently looking at using them for a hobby project (very early, in concept stage). My use case is to have a bunch of (untrusted and sandboxed) WASM applications running in a mesh network of hosts. Resources are identified by UUIDs, "owned" by the different applications, and are shared and accessible (via message passing) throughout the entire network. Applications can also be "migrated" between hosts.
My idea is to use the highest number of bits possible from the subsec_seq_node portion to uniquely identify applications for message routing (so that nodes can query other nodes for who's running that application) and to avoid giving applications the ability to "bruteforce" or "scan" for the presence of other applications in the mesh network. I suspect most apps will create very few resources (say < 100,000) so this arrangement will probably work fine. However, I also wanted to keep the door open for some resource-heavy apps (that might need to create millions or billions of resources, quickly) to request a different allocation scheme with a shorter node portion in subsec_seq_node and more time/pseudorandom bits.
I also suspect (but I'm not entirely sure) the big endian, lexicographical sorting aspect of UUIDv7 will help with compression over the wire, especially when serializing/deserializing resource state when applications are migrated between nodes.
Ah I see 🤔 I was hoping we could use the same clock infrastructure we have for v1 UUIDs for this too (we already expose APIs for working with Unix epoch dates). Maybe they could determine the precision somehow so we can fill the rest with random data.
I saw the RFC got pushed back to next year. I don’t think I’ve ever looked at an IETF RFC that was in draft before so am not sure how normal that is 🙂
FWIW I've not been involved with IETF, but just from my experience implementing the RFCs, there's plenty of tech in production that lives in the draft or proposed RFC space for a long time, it more represents how much change there could be. For example; HTTP/2 (RFC 7540) and QUIC (RFC 9000) are still considered proposed standards by the IETF despite everyone using them.
The new UUID formats are being discussed in this github repository: https://github.com/uuid6/uuid6-ietf-draft
HTTP/2 (RFC 7540) and QUIC (RFC 9000) are still considered proposed standards by the IETF despite everyone using them.
Wow! Ok there you go.
The new UUID formats are being discussed in this github repository
Ah thanks @fabiolimace! :eyes:
I'll sketch this out on a branch and we can decide whether or not we want to ship them under a v6-draft / v7-draft / v8-draft feature so people can use them, with the caveat that they may change based on the draft spec.
So we currently have access to 96 bits of timestamp (64 bits of seconds and 32 bits of nanos) through Timestamp::from_unix. I'm thinking, if it's possible, we should flip the internal representation of Timestamp to keep all that information (so it'll become 16 bits bigger), and then introduce a TimestampPrecision type that's added to ClockSequence and can be used to truncate a timestamp.
Maybe something like:
pub struct Precision(TimestampPrecision, SequencePrecision);

pub struct TimestampPrecision(u8);

impl TimestampPrecision {
    pub const SECONDS: Self = Self(0);
    pub const MILLIS: Self = Self(3);
    ..
    pub fn from_places(places: u8) -> Self { .. }
}

pub struct SequencePrecision(u8);

impl SequencePrecision {
    pub const V1: Self = Self(14);
    pub fn from_bits(bits: u8) -> Self { .. }
}

pub trait ClockSequence {
    fn precision(&self) -> Precision { Precision(TimestampPrecision::MILLIS, SequencePrecision::V1) }
    fn generate_sequence(&self, seconds: u64, subsec_nanos: u32) -> u16;
}

impl Uuid {
    // Without precision we'll assume 96(?)bits of timestamp and node are for the timestamp
    // This can produce some jittery results beyond the actual precision of the timestamp stored
    fn to_timestamp_with_precision(&self, precision: Precision) -> Timestamp { .. }
}

impl Uuid {
    // Any truncated portion of the timestamp to the end of the UUID will be filled with random data
    // The precision used to generate the timestamp will determine when it truncates
    pub fn new_v7(ts: Timestamp) -> Result<Self, crate::Error> { .. }
}
The idea being we can do this without any breakage to v1, but also without introducing a lot of new, very similar infrastructure for v7 and v8.
What do you all think?
I see the reference Python implementation has these defaults for v7:
sec_bits = 36 # unixts at second precision
subsec_bits = 30 # Enough to represent NS
sequence_bits = 8 # Enough for 256 UUIDs per NS
We could probably use the same.
We're working towards a 1.0 release of uuid now. I'm confident we'll be able to support v6-v8 UUIDs without any disruptive breakage so I don't think this needs to block that.
I think the sketch I made above is a bit unnecessary. All we really need is a way to retain all the precision in Timestamp for a Unix timestamp, which we can do.
Is anyone working on this? I could make a PR for v7.
Hi @malobre :wave: There isn't any active work on this right now. I haven't been following the RFC too closely, but was under the impression that things were changing in it so figured I'd check back in with it later.
If you'd like to work on some v7 support we can work out how to ship it, either in uuid as an unstable API, or externally, so we can keep up with the RFC as it evolves if necessary.
I have forked this repo and started to play with things on the v7 branch.
I will be putting any public-facing additions behind the v7 feature and uuid_unstable cfg.
Working with v1::Timestamp is a little awkward for UUIDv7, i.e. we want milliseconds since the Unix epoch, but the public API (e.g. Timestamp::from_unix taking seconds & nanos instead of ms) and internal representation are tailored for UUIDv1 (number of 100-nanosecond intervals since the Gregorian epoch).
It makes sense to share the struct between v1 & v6, but for v7 not so much.
Should I define a separate v7::Timestamp struct or should I try to make it work with the current v1::Timestamp?
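To make the mismatch concrete, here is a minimal sketch of turning a seconds-plus-nanos pair (the shape Timestamp::from_unix accepts) into the 48-bit millisecond count that v7 wants. The function name `unix_millis_48` is hypothetical and is not an API of the uuid crate.

```rust
/// Largest value that fits in the 48-bit v7 timestamp field.
const MAX_48_BIT: u64 = (1 << 48) - 1;

/// Convert Unix (seconds, subsecond nanos) into whole milliseconds,
/// returning None if the result does not fit in 48 bits.
fn unix_millis_48(seconds: u64, subsec_nanos: u32) -> Option<u64> {
    let millis = seconds
        .checked_mul(1_000)?
        .checked_add(u64::from(subsec_nanos) / 1_000_000)?;
    if millis <= MAX_48_BIT {
        Some(millis)
    } else {
        None
    }
}
```

Everything below one millisecond is simply truncated here, which is one reason a timestamp type built around 100-nanosecond Gregorian ticks is an awkward fit.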
Super excited to have uuid v7 support. @malobre does your fork currently work?
@malobre does your fork currently work?
No, I only made some groundwork as I intended to clarify some things beforehand.
In the meantime, here's a snippet for single-node, non-batched, UUIDv7 generation:
let uuid = {
    use std::time::{SystemTime, UNIX_EPOCH};

    // keep 74 random bits: 12 after the version field and 62 after the variant
    let mut buf = rand::random::<u128>() & 0xFFF3FFFFFFFFFFFFFFF;

    // 48 bits unix timestamp in ms
    buf |= SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("SystemTime before UNIX Epoch")
        .as_millis() << 80;

    // version
    buf |= 0x7 << 76;

    // variant
    buf |= 0b10 << 62;

    uuid::Uuid::from_u128(buf)
};
Edit: I thought I should really emphasize that you should NOT use the above snippet if you intend to generate UUIDs in a distributed fashion or in batches.
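For anyone who wants to check the bit layout of the snippet above without pulling in a clock or a random number generator, here is a deterministic sketch using the same masks and shifts. The helper names are hypothetical; the timestamp and random bits are passed in by the caller.

```rust
/// Keeps the 12 bits after the version field and the 62 bits after the variant.
const RANDOM_MASK: u128 = 0xFFF3FFFFFFFFFFFFFFF;

/// Assemble a v7 UUID from a millisecond timestamp and caller-supplied random bits.
fn v7_from_parts(unix_ms: u64, random: u128) -> u128 {
    let mut buf = random & RANDOM_MASK;
    // 48-bit Unix timestamp in milliseconds, in the top bits.
    buf |= (u128::from(unix_ms) & 0xFFFF_FFFF_FFFF) << 80;
    // Version 7.
    buf |= 0x7u128 << 76;
    // Variant 0b10.
    buf |= 0b10u128 << 62;
    buf
}

/// Read the millisecond timestamp back out of a v7 UUID.
fn v7_unix_ms(uuid: u128) -> u64 {
    (uuid >> 80) as u64
}

/// Read the 4-bit version field.
fn version(uuid: u128) -> u8 {
    ((uuid >> 76) & 0xF) as u8
}

/// Read the 2-bit variant field.
fn variant_bits(uuid: u128) -> u8 {
    ((uuid >> 62) & 0b11) as u8
}
```

Feeding in the timestamp 0x017F22E279B0 and the random bits 0xCC3_18C4_DC0C_0C07_398F yields 017F22E2-79B0-7CC3-98C4-DC0C0C07398F, which is the v7 example value given in the appendix of the published RFC.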
FYI, I stumbled upon a rust implementation: https://github.com/LiosK/uuid7-rs
I believe the main challenge will be integrating UUIDv7 with the existing Timestamp struct rather than the implementation itself.
The Timestamp type is pretty opaque, so I think we should be able to come up with a scheme that can support the various shaped timestamps in the new formats. If need be we could also come up with an alternative type and provide conversions between them.
FYI, I am not currently able to spend much time on this (or open-source in general). If someone wants to take a stab at this, please do :-)
@KodrAus - Regarding your sketch for *Precision, it seems to me that the only Uuid type that needs to support an arbitrary precision for time and sequence is v8, and I have doubts about that.
It seems to me that introducing Time and Sequence Precisions adds a fair amount of complexity for only 2 choices of precision. V6 is already implemented, and V7 is always fixed at milliseconds, so there isn't much of a reason to supply a variable-precision timestamp. In addition, I don't think we can manage construction generically for the different uuid types, considering that v1 and v6 use the funky ticks offset.
For v8, to be spec-compliant, we don't actually need to provide any functionality for v8 other than accept a 16 byte buffer that we distribute into Bytes at the correct offsets.
I started implementing V8 helper types (e.g. construct a new v8 Uuid from now()) but, frankly, there are an awful lot of variables that would be hard to account for:
I guess my point is that V8 is left blank for a reason, and if we took a guess at what people wanted, we'd probably be overlapping quite a bit with the v7 use case.
from_buffer([u8; 16]) and maybe from_fields(u64, u16, u64) and leave the rest up to the user. If you're cool with this, I'm happy to implement it.
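A minimal sketch of what accepting a 16-byte buffer for v8 could look like: the caller's bytes are kept as they are, and only the version and variant bits are overwritten. The function name `v8_from_buffer` is hypothetical and this is not the crate's implementation.

```rust
/// Stamp the version (8) and variant (0b10) bits onto caller-provided bytes.
/// All other bits are left exactly as supplied.
fn v8_from_buffer(mut bytes: [u8; 16]) -> [u8; 16] {
    // High nibble of byte 6 holds the version.
    bytes[6] = (bytes[6] & 0x0F) | 0x80;
    // Top two bits of byte 8 hold the variant.
    bytes[8] = (bytes[8] & 0x3F) | 0x80;
    bytes
}
```

This leaves 122 bits whose meaning is entirely up to the caller.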
If we want to add a really fancy UuidV8Builder which can take all of the above into consideration (e.g.
let foo = Builder::new()
    .set_random_bits()
    .set_seconds_bits()
    .set_frac_bits()
    .set_frac_divisor()
    .set_hostname()
    .ignore_unused_bits()
    .build();
Oh, also, the v7 spec sort of punts on the counter/sequence and how to do monotonicity. It says that 1000 UUIDs created in a batch should sort in creation order. It doesn't say how, but it does offer suggestions.
I think the best way would be to place a 10-bit (or so) counter in the most significant bits of the random portion. So instead of 74 bits of randomness, it'd be 64 (or, to fit more neatly into the bit segments, 12 and 62).
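One way the counter idea could be sketched, using the 12/62 split mentioned above: a 12-bit counter sits directly after the version field and resets whenever the millisecond advances. The type `V7Generator` and its methods are hypothetical, and the random bits are supplied by the caller to keep the example deterministic.

```rust
/// Hypothetical generator: a 12-bit counter follows the version field, so
/// UUIDs created within the same millisecond still sort in creation order.
struct V7Generator {
    last_ms: u64,
    counter: u16,
}

impl V7Generator {
    const COUNTER_MAX: u16 = 0x0FFF;

    fn new() -> Self {
        Self { last_ms: 0, counter: 0 }
    }

    /// Returns None when the counter is exhausted for the current millisecond.
    /// If the clock stands still or goes backwards, the last timestamp is
    /// reused and the counter keeps counting.
    fn generate(&mut self, unix_ms: u64, random: u64) -> Option<u128> {
        if unix_ms > self.last_ms {
            self.last_ms = unix_ms;
            self.counter = 0;
        } else if self.counter == Self::COUNTER_MAX {
            return None;
        } else {
            self.counter += 1;
        }

        let ts = u128::from(self.last_ms) & 0xFFFF_FFFF_FFFF;
        let rand_b = u128::from(random) & ((1u128 << 62) - 1);

        Some(
            (ts << 80)
                | (0x7u128 << 76)
                | (u128::from(self.counter) << 64)
                | (0b10u128 << 62)
                | rand_b,
        )
    }
}
```

A caller that gets None could wait for the next millisecond and retry; that policy is left out of the sketch.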
After implementing the changes I mentioned above, the API will change because I've separated Counter from Timestamp (which I think is the correct approach). A Timestamp shouldn't know anything about a Counter, but a Counter could/should know about a Timestamp.
Also there are some legacy things in here, such as being edition 2018 instead of 2021. Also, it uses the atomic crate but the reason for it is no longer valid, since std::sync::atomic now has those types that we need.
Also, std::time is standardized, so we can add helper functions based on SystemTime (not that this would be a breaking change, we would just add new methods)
IMO, we should bump the version to 2.0 and modernize things a bit more.
Thoughts?
Another fun side-effect of the added versions/features:
the cargo hack powerset tests fail by running the device out of space. Too many tests now with the 3 added version features.
Thanks for all the work you put in to this @rrichardson! :bow: I'm glad the RFC moved away from the original variable-precision timestamp they had for v7 to a much simpler milliseconds-since-unix-epoch. I've papered over some of the issues with Timestamp and ClockSequence for these new formats by adding short-hands for generating UUIDs from the current system time. So instead of having to call Uuid::new_v7(Timestamp::now(NullSequence)), you can call Uuid::now_v7(). The same goes for version 1 and 6 UUIDs.
Before we release this, I would like to gate it behind our RUSTFLAGS cfg for unstable things for the time being, just to be sure the formats are pretty well fixed now. As I understand it, the scope of the draft has now been extended to include rewriting RFC4122, so I don't expect things to change much, but don't know enough about that process to know when things reach their steady-state.
The first release on crates.io that supports v6-v8 UUIDs is published as 1.2.0. Until we’re confident the spec is fixed enough that our APIs won’t need to change, you need to set the --cfg uuid_unstable RUSTFLAGS as well as their feature gates to enable them. Once we are confident, the RUSTFLAGS requirement will go away. I’m expecting maybe the first draft of the new RFC with its broader scope to rewrite all of RFC4122 (I think that has a deadline around March of 2023?) could be that point if things don’t change much in the meantime.
I'll update the OP to include a link, but it looks like the working group responsible for the new UUID RFC is working here
cc @KodrAus
--cfg uuid_unstable is very much unusable for workspaces. Workspaces ignore config files from each project, therefore invalidating configs like:
[build]
rustflags = ["--cfg uuid_unstable"]
Isn't there an alternative way to allow using those features other than manually setting this?
@ConsoleTVs You can put rustflags in a top-level .cargo/config.toml to have them respected by all crates in your workspace:
[build]
rustflags = ["--cfg", "uuid_unstable"]
As soon as we're confident that these new versions have a stable standard we'll be able to remove that required --cfg.
Ah, had tried it before but didn't work. What's the difference between those two?
rustflags = ["--cfg uuid_unstable"]
rustflags = ["--cfg", "uuid_unstable"]
I think --cfg uuid_unstable will be passed to rustc as a single commandline argument, but what it really wants is two arguments: the first being --cfg, the second being the config itself. That might have been why it wasn’t working before 🙂
Indeed the first one crashes. Interesting. Your solution works, thank you!
Would anybody happen to know how we can check the status of the current IETF RFC? From memory it had some deadline around March, but I'm not familiar with how their processes work. Would asking on their mailing list be an appropriate thing to do?
@KodrAus between the issue tracker https://github.com/ietf-wg-uuidrev/rfc4122bis and the mailing list https://mailarchive.ietf.org/arch/browse/uuidrev/?q=draft-ietf-uuidrev-rfc4122bis it seems like they're just nitpicking the formatting and such. Maybe you could ping @kyzer-davis who seems to be heading it up.
Since they're currently focusing on serious issues like this I'd have to imagine that the implementation details aren't going to change any further. Are you considering removing the uuid_unstable gate?
@tgross35 As soon as the standard is fixed I'd love to get rid of the uuid_unstable gate. It seems unlikely that it will change at this point, but while there's still technically some chance I think we should play it safe.
@KodrAus today the RFC maintainer posted some updates for the new recently released draft, I'm not sure if there's anything relevant there: https://github.com/uuid6/prototypes/issues/42
Do you have any idea which version of the draft this crate supports? I'll submit a PR to the readme for that repo to add this to the implementations table, unless you prefer to DIY https://github.com/uuid6/prototypes
@tgross35 Thanks for checking in to this. I think we based our implementation off the final version of the draft before its scope widened to rewriting RFC 4122 🤔 We should keep up-to-date with whatever the current version is though.
If you'd like to submit a PR that adds uuid to the table that would be great!
Thanks @tgross35 for following up on things! 🙇 This is where things are at right now. It seems like we're getting close to having these new formats stabilized in more or less their current form.
Any progress on the stabilization of UUID v6-v8?
@photino I think it will be a few more months before the RFC becomes officially accepted. Close, but not there yet. Status at https://mailarchive.ietf.org/arch/browse/uuidrev/?q=draft-ietf-uuidrev-rfc4122bis
The RFC status has become Proposed Standard. Could we consider stabilizing the v6-v8 features?
@photino I think proposed standard is the maturity level we were blocking on, so once #717 is resolved I think we can finally remove the stability gate.
I think the test vectors (appendix A) changed for one of the new versions at some point. I'm not sure if related tests exist in this repo, but maybe they are worth pulling in?
I'm not sure where benchmarks are used, but one for v7 would be interesting to see how it compares to v4
Is your feature request related to a problem? Please describe. There is a new draft RFC for UUID formats that are more database friendly, it would be nice if these were supported in the uuid crate, since the new formats share a lot of the same internals.
Describe the solution you'd like An implementation of UUIDv6, v7, and v8 from the IETF RFC.
Is it blocking? No, it would just be cool to have.
EDIT (@KodrAus): The RFC is being worked on here