Closed by ploeh 5 years ago
I've been pointed to https://docs.microsoft.com/en-us/previous-versions/aa379358(v%3Dvs.80) which claims:
    typedef struct _GUID {
        unsigned long  Data1;
        unsigned short Data2;
        unsigned short Data3;
        unsigned char  Data4[8];
    } GUID, UUID;
so your code above would only be correct when the host order is little-endian.
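To make the layout concrete, here is a minimal sketch of how those four fields end up as bytes on a little-endian host. It assumes `unsigned long` is 32 bits, as it is on Windows; the function name `guidBytesLE` is hypothetical and not part of the uuid package.

```haskell
import qualified Data.ByteString.Lazy as BL
import Data.ByteString.Builder (toLazyByteString, word16LE, word32LE, word8)
import Data.Word (Word16, Word32, Word8)

-- | Serialize the GUID struct fields the way a little-endian host lays
-- them out in memory. Hypothetical helper, for illustration only.
guidBytesLE :: Word32 -> Word16 -> Word16 -> [Word8] -> BL.ByteString
guidBytesLE d1 d2 d3 d4 =
  toLazyByteString $
    word32LE d1          -- Data1: 4 bytes, least significant byte first
      <> word16LE d2     -- Data2: 2 bytes, least significant byte first
      <> word16LE d3     -- Data3: 2 bytes, least significant byte first
      <> foldMap word8 d4 -- Data4: plain byte array, order unchanged
```

For the UUID `00112233-4455-6677-8899-aabbccddeeff`, this yields `33 22 11 00 55 44 77 66` followed by `88 99 aa bb cc dd ee ff` unchanged, which matches the "first three parts reversed" observation.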
What is your use case for this serialization format? Is it for C FFI purposes or something else?
My use case is reading column data from SQL Server. For that, I'm using the odbc package. This package has, however, no particular representation of a UUID, so instead, for UNIQUEIDENTIFIER columns, you just get a ByteString. The same applies when saving data to such a column: you must supply a ByteString value.
I've noticed that when I use toByteString to convert a UUID value and then save it to the database, the bytes in the first three parts are reversed.
Other people have made corroborating observations.
The explanation could be that
"The first 4 parts are either 2 or 4 bytes long and are therefore probably stored as a native type (ie. WORD and DWORD) in little endian format. The last part is 6 bytes long and it therefore handled differently (probably an array)"
and
"since the last 8 bytes are stored as a byte array, I think this identifies the behaviour you are seeing."
When I convert the bytes using the above toMixedEndianByteString function, the value gets correctly stored in the database.
@ploeh I see; however in this case I'd advocate that it should be the database library's responsibility to know how to decode/encode the types supported by the respective database; and in fact, that's what e.g. postgresql-simple does. However, I can't bring this up myself at https://github.com/fpco/odbc/issues as I've been banned by FPComplete.
I don't mind taking the issue to odbc instead. Ultimately, I can just keep my working solution in my own code base, where it already works. I did think that I'd ask here first, though, since this might be a problem with UUID values marshalled via any Microsoft-based system.
As the Wikipedia entry suggests, this could be an issue with any UUID you receive via COM/OLE, so it's likely to be much wider than just interacting with SQL Server. I haven't tried, but it's possible one might run into similar problems when interacting with, say, Microsoft Office, Exchange, or many other older systems of that type.
As I did spend a few hours figuring all this out, I thought I'd offer the solution at the place where it'd be most generally available to other users, thereby saving others from similarly wasted time.
If you get this encoding via OLE/COM, doesn't that mean via FFI? In that case you'd typically not get it as a ByteString but rather as a Ptr, and then we should rather talk about the Storable API. I'd like to see more real-world use cases beyond ODBC to better inform how to design and add this into the uuid package.
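As a rough illustration of the Ptr-based route, the sketch below reads a GUID struct from memory field by field using host byte order. The `Guid` type and `peekGuid` are hypothetical names, not part of the uuid package; the offsets assume the packed 16-byte layout of the struct quoted earlier, and the demo values assume a little-endian host.

```haskell
import Data.Word (Word16, Word32, Word8)
import Foreign.Marshal.Array (peekArray, withArray)
import Foreign.Ptr (Ptr, plusPtr)
import Foreign.Storable (peekByteOff)

-- | Hypothetical mirror of the C GUID struct.
data Guid = Guid Word32 Word16 Word16 [Word8]
  deriving (Eq, Show)

-- | Read a GUID from memory. The multi-byte fields are read in host
-- byte order, exactly as C code on the same machine would see them.
peekGuid :: Ptr a -> IO Guid
peekGuid p = do
  d1 <- peekByteOff p 0          -- Data1 at offset 0 (4 bytes)
  d2 <- peekByteOff p 4          -- Data2 at offset 4 (2 bytes)
  d3 <- peekByteOff p 6          -- Data3 at offset 6 (2 bytes)
  d4 <- peekArray 8 (p `plusPtr` 8) -- Data4 at offset 8 (8 raw bytes)
  pure (Guid d1 d2 d3 d4)

-- | Demo input: the in-memory bytes of 00112233-4455-6677-8899-aabbccddeeff
-- as a little-endian host would store them.
sampleBytes :: [Word8]
sampleBytes =
  [ 0x33, 0x22, 0x11, 0x00, 0x55, 0x44, 0x77, 0x66
  , 0x88, 0x99, 0xaa, 0xbb, 0xcc, 0xdd, 0xee, 0xff ]

-- | Copy the sample bytes into a temporary buffer and read them back.
demo :: IO Guid
demo = withArray sampleBytes peekGuid
```

With this approach the byte order never has to be handled explicitly, because the fields are read as native words; the trade-off is that the result depends on the host it runs on.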
That's a good point; I hadn't thought that through. It's true that when interacting with the odbc package, I take advantage of the feature that already turns SQL Server's native UNIQUEIDENTIFIER into a ByteString. The code that does that, however, does get the data via a Ptr.
Microsoft tends to encode UUIDs in a mixed-endian format.
There's plenty of evidence of this. Ask me how I know 😉
It'd be useful if the uuid library also provided conversions to and from this format. I created this conversion to ByteString:
I've yet to attempt the reverse conversion, but I think it'll look similar.
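A minimal byte-level sketch of such a conversion might look like the following. This is not the snippet from the original post; the name `swapGuidEndianness` is hypothetical, and it assumes the input is the 16-byte big-endian encoding as a strict ByteString (for example, the output of `toByteString` converted from lazy to strict).

```haskell
import qualified Data.ByteString as B

-- | Convert between the big-endian (RFC 4122) byte order and the
-- Microsoft mixed-endian order by reversing the first three fields.
-- Returns Nothing unless the input is exactly 16 bytes long.
swapGuidEndianness :: B.ByteString -> Maybe B.ByteString
swapGuidEndianness bs
  | B.length bs /= 16 = Nothing
  | otherwise =
      Just (B.concat [B.reverse d1, B.reverse d2, B.reverse d3, d4])
  where
    (d1, r1) = B.splitAt 4 bs -- Data1: 4 bytes
    (d2, r2) = B.splitAt 2 r1 -- Data2: 2 bytes
    (d3, d4) = B.splitAt 2 r2 -- Data3: 2 bytes; Data4: remaining 8 bytes
```

Since the function only reverses three fixed-width fields, applying it twice restores the input, so under these assumptions the same function would also serve as the reverse conversion.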
Is there any interest in getting this into the library? If so, I'll be happy to attempt a pull request.