allow uuid_str() to take any string or blob

nalgeon / sqlean

The ultimate set of SQLite extensions

MIT License

3.76k stars 120 forks source link

allow uuid_str() to take any string or blob #99

Closed terefang closed 1 year ago

terefang commented 1 year ago

today this happens:

sqlean> select uuid_str(md5('x'));
9dd4e461-268c-8034-f5c8-564e155c67a6
sqlean> select uuid_str(sha1('x'));

sqlean> select uuid_str(sha3('x'));

sqlean> select uuid_str(sha256('x'));

sqlean> select uuid_str(sha512('x'));

sqlean>

if the string or blob is at least 16 bytes long uuid_str() could just take the first 16 bytes and ignore the rest.

in addition a shortcut could be dedicated functions like:

uuid_str_md5(data)
uuid_str_sha1(data)
uuid_str_sha3(data)
uuid_str_sha256(data)
uuid_str_sha512(data)

nalgeon commented 1 year ago

uuid_str only works with valid UUIDs. And why would you want to create a UUID from the first 16 bytes of the SHA-256 hash? What's the use case here?

terefang commented 1 year ago

like me reference UUIDv3, v5 and v8:

A UUID is generated based on an unspecified name. Names are unique identifiers for an object, resource or similar within an assigned namespace. Starting from a UUID for the namespace, a UUID is generated from the name by forming a byte sequence from the namespace UUID and the name itself and then hashing this byte sequence using MD5 or SHA1. The hash is then distributed among the available UUID bits in a defined manner.

UUIDv3 are created from the output of a MD5 hash
UUIDv5 are created from the output of a SHA1 hash
UUIDv8 are created from arbitrary byte sequences.

one could call the function more precisely:

uuidv3_str_md5(data)
uuidv5_str_sha1(data)
uuidv8_str_sha3(data)
uuidv8_str_sha256(data)
uuidv8_str_sha512(data)

terefang commented 1 year ago

my particular real-world use case is that i create "stable unique ids" out of the concatenation of various text-fields in the row, which are then easier to join and reference and can also act as safe ids in other protocols (like rest-urls).

today i have to do this outside of sqlite with scripting re-writing the csv-import.

it would be much simpler to do the csv-import and then issue UPDATE table SET XID=uuidv8_str_sha512(f1 || f2 || f3).

terefang commented 1 year ago

i have looked at the code ... a quick win would be to just check if the parameter is "at least" and not "exactly" 16 bytes.

@nalgeon would you agree ?

nalgeon commented 1 year ago

Sorry, I don't like the idea of uuid_str truncating its argument. You can always use substr(x, 1, 16) on the hashcode and then call uuid_str on the result.

terefang commented 1 year ago

would you accept a pull request for uuid_str_<HASHALGO> ?

nalgeon commented 1 year ago

No, I don't think so, sorry.

terefang commented 12 months ago

i dont understand

i have presented a use-case
i have given standard reference

i was only interested of also making other users lifes easer.

sqlean is a really useful contribution to the sqlite community and i hope you keep up the existing work.