Closed skef closed 4 months ago
I think I must be using a different bikeshed, as my diffs are a lot larger. I did a fresh install before running make
.
I think I must be using a different bikeshed, as my diffs are a lot larger. I did a fresh install before running
make
.
Mine might be out of date, I'll try updating.
I think I must be using a different bikeshed, as my diffs are a lot larger. I did a fresh install before running
make
.Mine might be out of date, I'll try updating.
Ok yeah it was, just made a PR to update Overview.html to be built with the latest version: https://github.com/w3c/IFT/pull/170
@garretrieger It seems like another option that would address the (non-editorial) suggestions here would be to include the base32hex padding after all and specify that the ID characters be relative to the last non-padding character. (And then also add the point about UTF-8). What do you think?
@garretrieger It seems like another option that would address the (non-editorial) suggestions here would be to include the base32hex padding after all and specify that the ID characters be relative to the last non-padding character. (And then also add the point about UTF-8). What do you think?
The compactness of base64 is very desirable when encoding opaque binary data, especially given that there are URL length limits. Conversely in the numeric id case the padding would add several extra bytes to most of the URLs since base32 pads up to the 40 bit boundary. Ultimately we have two different use cases (file based w/ numeric ids vs dynamic with string ids) with different requirements. Each of which is best served by a different encoding so I think it makes the most sense to just have both available. We can add an explicit warning to avoid using the base64 encoding when then patches will be stored in a filesystem if that's a concern.
I'll update this once #171 is sorted
I've updated this, but also made it into a draft for the time being.
Issues:
The endian-ness of Hexidecimal is a settled issue -- bytes are encoded as big-endian. But I'm not sure it's as clear with base32hex, so maybe we we should make that explicit when talking about the integer id encodings.
The endian-ness of Hexidecimal is a settled issue -- bytes are encoded as big-endian. But I'm not sure it's as clear with base32hex, so maybe we we should make that explicit when talking about the integer id encodings.
Sounds good, reading through the base32 rfc it converts an input which is a list of bytes, so we will need to explain how a numeric ID is converted to a list of bytes. Something like: Convert the numeric ID to a big endian 64 byte unsigned integer encoding. Remove all leading bytes that are equal to 0.
I've updated this, but also made it into a draft for the time being.
Issues:
- What we're calling base64url, at least in the linked RFC, still uses "=" as padding, and underscore is already taken as an alternative, so we'll have to think about that.
"=" will be converted to % encoding during template substitution so default base64url will work, just use more characters than strictly needed to represent padding. I'm leaning towards just using it as is and letting encoders work around it if they want (for example by making sure the ID strings are always in multiples of 24 bits). An alternative would be to define one of the remaining unreserved characters ("." or "~") as the padding character.
- I'm not sure the distinction between converting to UTF-8 for base32hex and encoding the raw string for base64url is right. (Not sure it's wrong either, we should talk about it.)
I suggest we always use the raw ID string bytes for both base32hex and base64url, and remove the text later on in the specification which says ID strings are UTF8 encoded.
- Maybe if we're adding base64 we should just stick to hex for IDs after all, to remove the padding issues in those cases? I can see arguments both ways.
- Someone should check my work in the examples.
Sounds good, reading through the base32 rfc it converts an input which is a list of bytes, so we will need to explain how a numeric ID is converted to a list of bytes. Something like: Convert the numeric ID to a big endian 64 byte unsigned integer encoding. Remove all leading bytes that are equal to 0.
In that case, maybe we should have the rule for ID conversion in the URL match the rule for storage in the patches (or, at least, what that was in IFTB, I haven't checked if it's changed): If the highest ID is < 256 convert the byte, if it's > 256 convert to a network order u16 and encode that.
Sounds good, reading through the base32 rfc it converts an input which is a list of bytes, so we will need to explain how a numeric ID is converted to a list of bytes. Something like: Convert the numeric ID to a big endian 64 byte unsigned integer encoding. Remove all leading bytes that are equal to 0.
In that case, maybe we should have the rule for ID conversion in the URL match the rule for storage in the patches (or, at least, what that was in IFTB, I haven't checked if it's changed): If the highest ID is < 256 convert the byte, if it's > 256 convert to a network order u16 and encode that.
Format 2 doesn't have a specific byte width since it relies on deltas so I think probably best to just use a variable number of bytes. The current text you have "When the id is an unsigned integer only the significant bytes are encoded. (For example, when the integer is less than 256 only one byte is encoded.)" plus a mention that they are big endian ordered should be sufficient.
Sounds good, reading through the base32 rfc it converts an input which is a list of bytes, so we will need to explain how a numeric ID is converted to a list of bytes. Something like: Convert the numeric ID to a big endian 64 byte unsigned integer encoding. Remove all leading bytes that are equal to 0.
In that case, maybe we should have the rule for ID conversion in the URL match the rule for storage in the patches (or, at least, what that was in IFTB, I haven't checked if it's changed): If the highest ID is < 256 convert the byte, if it's > 256 convert to a network order u16 and encode that.
Format 2 doesn't have a specific byte width since it relies on deltas so I think probably best to just use a variable number of bytes. The current text you have "When the id is an unsigned integer only the significant bytes are encoded. (For example, when the integer is less than 256 only one byte is encoded.)" plus a mention that they are big endian ordered should be sufficient.
Would probably be good to add an example of a number that requires more than one byte in the example table.
I think the last commit is up to date with the feedback.
I think the last commit is up to date with the feedback.
Looks good, I just added a commit to remove references to ID strings being UTF-8 encoded in the format 2 section.
As discussed in #167
Not sure the advice about bytes to encode is sufficient or clear enough. That's supposed to match the advice on omitting leading hex zeros, but the issue is more complicated with base32hex because you wind up with some leading zeros anyway.
Preview | Diff