Closed agrison closed 6 years ago
Awesome! You actually beat me to the punch here by just about ten minutes; I was literally writing exactly the same thing on my end. Our implementations are a little different, though, in that yours doesn't do any error-checking. That means it would, for example, parse xvzjkpoi-mmmm-nssss-ghty-qwrtyuiopghj
as a valid (but entirely zeroed) UUID.
My approach was to populate the non-hex-digit slots in the array with -1
, then check to see if the value we get from the array is non-negative (and throw an IllegalArgumentException
otherwise). That does mean we're still doing one more check per character (and may mean we might want to keep the getHexValueForChar
method), and so the gains aren't quite as pronounced. What do you think?
Regardless, it looks like I broke the benchmarks in 2a8b37bd2199bc10eee9f969b4d16a3e3d57d248; sorry for the trouble, and thanks for the fix!
FWIW, I fixed the benchmarks in c1a79d6abece84fb4929c4a2870422b004c47fcf so that fix doesn't get held up by discussion here.
@agrison I just pushed my approach to the same problem in #4. What do you think? (EDIT: the performance is, to my surprise, almost exactly the same even with error checking).
I think the performance costs of doing some error-checking are small enough that it makes sense to go with the approach in #4. Thank you very, very much for this contribution!
…I also cherry-picked 0ddada3 into master
.
Hey, I didn't notice you were from Boston, you're working overnight.. for me 😄
You're right for the error handling, I actually had the same implementation, using Arrays.fill(HEX_VALUE, -1)
and checking in getHexValueForChar()
if it was -1
in order to throw the exception like you did. On my machine I had something like a 15% boost by bypassing this check (avoiding a method call + condition) so I decided to push the code as is and let you decide what you really wanted. But I'm now reading that you only noticed a 5% boost so you're totally right to reactivate the error checking.
If it was for a 15% boost it would have been possible to let the user chose maybe using a boolean if he wants UUID validation also, but it seems a little YAGNI.
Regarding the String
/CharSequence
I was playing with toCharArray
+ direct access to the array instead of calling charAt()
to avoid a method call and bounds checking that occurs in it, but it led to no performance improvement and then I forgot to revert that change, good catch 😄
So now we're like 13 times faster on parsing than the UUID implementation, that's awesome !
Cheers 🏆
So now we're like 13 times faster on parsing than the UUID implementation, that's awesome !
14 times after 0409864 ;)
And, yes, I'm from Boston, but I'm actually traveling right now, so my schedule is a bit shifted (ha! bit-shifted) right now.
You can improve the parsing by 4 times by using a lookup table instead of an if/else branching to compute the hex value of a char.
Then just use
HEX_VALUE[uuidSequence.charAt(X)];
instead ofgetHexValueForChar(uuidSequence.charAt(X))
.Benchmark pom project was fixed because it pointed to a
fast-uuid-parser
which is namedfast-uuid
. I also removed some tests since thegetHexValueForChar
is now deleted. Readme has been updated in order to explain how to run the benchmarks (for those who don't know jmh).Benchmarks on my machine:
Before
After
Let's find some ways to make it even faster ;-)