When parsing a 38-char UUID, require '{' and '}' characters

google / uuid

Go package for UUIDs based on RFC 4122 and DCE 1.1: Authentication and Security Services.

BSD 3-Clause "New" or "Revised" License

5.26k stars 362 forks source link

When parsing a 38-char UUID, require '{' and '}' characters #61

Open Carrotman42 opened 4 years ago

Carrotman42 commented 4 years ago

It was not intentional to simply ignore these characters, and examples/docs/tests all indicate that curly braces were the intended characters. I've added some test cases to ensure things work.

Fixes https://github.com/google/uuid/issues/60

Carrotman42 commented 4 years ago

Weirdly I can't seem to add reviewers/assignees. Not sure why, maybe an ACL thing. @pborman ptal

pborman commented 4 years ago

Thanks for the fix.

This is the right thing to do but is potentially a breaking change. It used to handle a UUID in quotes but will no longer. It wasn't intended for it to be able to do that but I can easily imagine someone is depending on that. Perhaps you can change the code to see if the first and last byte are the same and that the first byte is either a '{' or '" (double quote). Does that sound reasonable to you?

pborman commented 4 years ago

Wait, that only works for quotes. You do have to check the first and last independently because last I checked '{' != '}' :-)

Carrotman42 commented 4 years ago

I mean we're kinda stuck between a rock and a hard place. Today you can think of " needing to work, tomorrow someone comes along and says they need ' or [ to work. At this point it's a can of worms and we need to decide whether it's better to actually do what the docs have claimed for years, or to maintain a special-case list (both in code and docs).

Carrotman42 commented 4 years ago

If it wasn't clear: it's your call, I'm happy to implement a special case list.

pborman commented 4 years ago

I think a double quote is probably the most common I was also thinking of an array for fast checks:

var quotes = [256]byte { '{': '}', '"': '"', }

Then it is really easy to add other pairs, does not complicate the code, and does not increase run time. I agree, there are probably other cases as well. What comes to mind is < [ ( { ' ". Probably the least likely to break anything. I suppose ` could be in there too. I think any other character might be harder to justify.

ylz-at commented 3 years ago

Any update?

pborman commented 3 years ago

I left this with the comment that this will break the parsing of UUIDs in quotes, which I think is a reasonable possibility. Any solution will slow down the parsing of 38 byte UUIDs, though I am more concerned with breaking existing code.

I coded up a more flexible solution:

var match = [256]byte {
        '{': '}',
        '[': ']',
        '(': ')',
        '<': '>',
        '"': '"',
        '\'': '\'',
        '`': '`',
}

                if m := match[s[0]]; m == 0 || m != s[37] {
                        return uuid, fmt.Errorf("invalid prefix/suffix %c %c", s[0], s[37])
                }

This adds three loads (one being a table lookup) and two comparisons to every 38 byte UUID.

beiriannydd commented 3 years ago

I think this is probably a slippery slope, since the unicode quotes might legitimately be requested to be added and all of a sudden you're an array of [2^32]rune. Better to stick to what is documented to work which means folk will have to account for anything outside that when they recompile against the new library. Bump the version as it does change the behaviour. Right now the behaviour is buggy and addressing the bug is expected to break non-compliant code which has relied on the bug, no? Note the reason for the new version and you have consistent (buggy) behaviour in the same major version and fixed behaviour in the new version. Or document the current behaviour as correct (which seems relatively benign) and then everyone will at least expect the behaviour.

pborman commented 3 years ago

@beiriannydd I think you are right (though unicode won't matter because those will be more than 1 byte long). Probably the better answer for those that want to use this function to validate is to create a new function that validates and you can pass in the starting and ending string. It would strip the delimiters and then expect 36 bytes left over.