Open jecisc opened 1 year ago
I have two possible improvements in mind:
* Rename it into UUID
I like this idea
* Merge it in Random-Core
I don't like this one so much (but mostly because I don't understand that decision). Could you explain why merging it would be good besides "having less packages"?
UUID is network specific in the way it encodes host specific information. So it has a dependency to NetNameResolver. Moving it to another package will make that package network dependent
UUIDs are certainly useful as a source of good quality random data, in many non-networking contexts.
The dependencies are concentrated in the following method:
UUIDGenerator >> computeNodeIdentifier
"Compute and return a 'node identifier', a 16 element byte array that should be unique to this image, vm, invocation, network, machine combination"
| nodeData |
nodeData := String streamContents: [ :out |
Smalltalk at: #NetNameResolver ifPresent: [ :netNameResolver |
out print: netNameResolver localHostAddress.
out print: netNameResolver localHostName].
out print: SystemVersion current.
out print: Smalltalk vm imagePath.
out print: Smalltalk commandLine options.
out print: SessionManager default currentSession hash.
out print: self hash ].
^ (MD5 hashMessage: nodeData) hash asByteArrayOfSize: 4
As you can see the NetNameResolver dependency has already been weakened. (But now that I see this again, it might not be a good idea as this method is called only on initialization of the shared generator).
It feels as if sometimes this idea of decoupling is taken too far. Maybe a better idea would be to have two versions of something like UUIDGenerator, a normal one, and one that can function in a restricted environment with less depedencies.
I agree that there are useful in non-network contexts but the usual implementation is a network specific one. The method computeNodeIdentifier with the weakening of NetNameResolver makes it almost useless because in absence of NetNameResolver the currentSession is the only one that is difference for a deployment of multiple same images. Maybe even that can be the same and it will produce the same node identifier for all images.
I was thinking, maybe the keys and values of the OS process' environment variables is a good alternative to having a signature for the current 'node':
OSPlatform current environment asDictionary.
That could simplify the code and replace the NetNameResolver access.
In code:
computeNodeIdentifier
"Compute and return a 'node identifier', a 4 element byte array that should be relatively unique to this image, vm, invocation, network, machine combination"
| nodeData |
nodeData := String streamContents: [ :out |
OSPlatform current environment keysAndValuesDo: [ :key :value |
out nextPutAll: key; nextPut: $=; nextPutAll: value; cr ].
{ SystemVersion current. Smalltalk vm imagePath. Smalltalk commandLine options.
SessionManager default currentSession hash. self hash } do: [ :each |
out print: each; cr ] ].
^ (MD5 hashMessage: nodeData) hash asByteArrayOfSize: 4
A quick test shows that starting the same image in the same directory twice yields different values.
$ ./pharo Pharo.image eval UUIDGenerator default computeNodeIdentifier
#[6 210 186 40]
Probably because at least some ENV variables are different. Would this work on Windows ?
I just tested this on windows and it yields twice the same value
Windows typically uses GUIDs (https://winaero.com/generate-new-guid-in-windows-10/) created with https://learn.microsoft.com/en-us/windows/win32/api/combaseapi/nf-combaseapi-cocreateguid
There is also one for UUIDs (https://learn.microsoft.com/en-us/windows/win32/api/rpcdce/nf-rpcdce-uuidcreate)
Linux has: https://linux.die.net/man/3/libuuid
Interesting reading related to this is https://pubs.opengroup.org/onlinepubs/9629399/apdxa.htm from OpenGroup
I just tested this on windows and it yields twice the same value
Thx. OK, but the old code probably also returned the same value twice. The shell/environment concept on windows is probably not as strong as on a Unix OS.
Windows typically uses GUIDs (https://winaero.com/generate-new-guid-in-windows-10/) created with https://learn.microsoft.com/en-us/windows/win32/api/combaseapi/nf-combaseapi-cocreateguid
There is also one for UUIDs (https://learn.microsoft.com/en-us/windows/win32/api/rpcdce/nf-rpcdce-uuidcreate)
Linux has: https://linux.die.net/man/3/libuuid
Interesting reading related to this is https://pubs.opengroup.org/onlinepubs/9629399/apdxa.htm from OpenGroup
UUIDGenerator was written to avoid using plugins, as this is now code that we can read and reason about. There were issues with the UUID plugin in the past, I can't remember the details.
We implement our own Random generator as well.
No plugins means less work for the VM guys and better portability.
I am pretty sure that we are still good here. The node identifier is just one part of a UUID, there are also a random element, a time element and a sequence element. Together they yield a UUID that is unique with high probabilty.
The current discussion is about system dependencies and a weird issue on macOS when the hostname is not set in a normal way.
WinaeroGUIDs are used to identify objects such as interfaces, ActiveX objects, virtual (shell) folders, etc. Here's how to generate a new GUID in Windows 10.
Creates a GUID, a unique 128-bit integer used for CLSIDs and interface identifiers.
The UuidCreate function creates a new UUID.
The libuuid library is used to generate unique identifiers for objects that may be accessible beyond the local system. The Linux implementation was created ...
Short Answer: We should use libuuid / Windows APi or whatever. It can be even a Plugin or a FFI call.
Long Answer: I think:
IMHO UUIDs are independent from networking concepts; however UUIDv1 is certainly network dependent.
Maybe the solution would be to add support to UUIDv4 and v5, and then use v4 by default instead of v1.
UUID is general and not Network specific. Thus I find the name missleading. Seen that name, I would wonder "Should I use it for my project that has nothing to do with network?"
Even in Pharo we use it for other things. For example the SessionManager uses it to generate its ID.
I have two possible improvements in mind:
To select the best one I'll need to check the dependencies it brings and see if it would make sense or not for Random-Core to have those dependencies.