pharo-project / pharo

Pharo is a dynamic reflective pure object-oriented language supporting live programming inspired by Smalltalk.
http://pharo.org
Other
1.19k stars 353 forks source link

Network-UUID is a bad name #13452

Open jecisc opened 1 year ago

jecisc commented 1 year ago

UUID is general and not Network specific. Thus I find the name missleading. Seen that name, I would wonder "Should I use it for my project that has nothing to do with network?"

Even in Pharo we use it for other things. For example the SessionManager uses it to generate its ID.

I have two possible improvements in mind:

To select the best one I'll need to check the dependencies it brings and see if it would make sense or not for Random-Core to have those dependencies.

guillep commented 1 year ago

I have two possible improvements in mind:

* Rename it into UUID

I like this idea

* Merge it in Random-Core

I don't like this one so much (but mostly because I don't understand that decision). Could you explain why merging it would be good besides "having less packages"?

noha commented 1 year ago

UUID is network specific in the way it encodes host specific information. So it has a dependency to NetNameResolver. Moving it to another package will make that package network dependent

svenvc commented 1 year ago

UUIDs are certainly useful as a source of good quality random data, in many non-networking contexts.

The dependencies are concentrated in the following method:

UUIDGenerator >> computeNodeIdentifier
    "Compute and return a 'node identifier', a 16 element byte array that should be unique to this image, vm, invocation, network, machine combination"

    | nodeData |
    nodeData := String streamContents: [ :out |
        Smalltalk at: #NetNameResolver ifPresent: [ :netNameResolver |
            out print: netNameResolver localHostAddress.
            out print: netNameResolver localHostName].
        out print: SystemVersion current.
        out print: Smalltalk vm imagePath.
        out print: Smalltalk commandLine options.
        out print: SessionManager default currentSession hash.
        out print: self hash ].
    ^ (MD5 hashMessage: nodeData) hash asByteArrayOfSize: 4

As you can see the NetNameResolver dependency has already been weakened. (But now that I see this again, it might not be a good idea as this method is called only on initialization of the shared generator).

It feels as if sometimes this idea of decoupling is taken too far. Maybe a better idea would be to have two versions of something like UUIDGenerator, a normal one, and one that can function in a restricted environment with less depedencies.

noha commented 1 year ago

I agree that there are useful in non-network contexts but the usual implementation is a network specific one. The method computeNodeIdentifier with the weakening of NetNameResolver makes it almost useless because in absence of NetNameResolver the currentSession is the only one that is difference for a deployment of multiple same images. Maybe even that can be the same and it will produce the same node identifier for all images.

svenvc commented 1 year ago

BTW,

https://github.com/pharo-project/pharo/issues/13420

shows a serious problem.

svenvc commented 1 year ago

I was thinking, maybe the keys and values of the OS process' environment variables is a good alternative to having a signature for the current 'node':

OSPlatform current environment asDictionary.

That could simplify the code and replace the NetNameResolver access.

svenvc commented 1 year ago

In code:

computeNodeIdentifier
    "Compute and return a 'node identifier', a 4 element byte array that should be relatively unique to this image, vm, invocation, network, machine combination"

    | nodeData |
    nodeData := String streamContents: [ :out |
        OSPlatform current environment keysAndValuesDo: [ :key :value |
          out nextPutAll: key; nextPut: $=; nextPutAll: value; cr ].
        { SystemVersion current. Smalltalk vm imagePath. Smalltalk commandLine options.
          SessionManager default currentSession hash. self hash } do: [ :each |
          out print: each; cr ] ].
    ^ (MD5 hashMessage: nodeData) hash asByteArrayOfSize: 4

A quick test shows that starting the same image in the same directory twice yields different values.

$ ./pharo Pharo.image eval UUIDGenerator default computeNodeIdentifier
#[6 210 186 40]

Probably because at least some ENV variables are different. Would this work on Windows ?

jecisc commented 1 year ago

I just tested this on windows and it yields twice the same value

astares commented 1 year ago

Windows typically uses GUIDs (https://winaero.com/generate-new-guid-in-windows-10/) created with https://learn.microsoft.com/en-us/windows/win32/api/combaseapi/nf-combaseapi-cocreateguid

There is also one for UUIDs (https://learn.microsoft.com/en-us/windows/win32/api/rpcdce/nf-rpcdce-uuidcreate)

Linux has: https://linux.die.net/man/3/libuuid

Interesting reading related to this is https://pubs.opengroup.org/onlinepubs/9629399/apdxa.htm from OpenGroup

svenvc commented 1 year ago

I just tested this on windows and it yields twice the same value

Thx. OK, but the old code probably also returned the same value twice. The shell/environment concept on windows is probably not as strong as on a Unix OS.

svenvc commented 1 year ago

Windows typically uses GUIDs (https://winaero.com/generate-new-guid-in-windows-10/) created with https://learn.microsoft.com/en-us/windows/win32/api/combaseapi/nf-combaseapi-cocreateguid

There is also one for UUIDs (https://learn.microsoft.com/en-us/windows/win32/api/rpcdce/nf-rpcdce-uuidcreate)

Linux has: https://linux.die.net/man/3/libuuid

Interesting reading related to this is https://pubs.opengroup.org/onlinepubs/9629399/apdxa.htm from OpenGroup

UUIDGenerator was written to avoid using plugins, as this is now code that we can read and reason about. There were issues with the UUID plugin in the past, I can't remember the details.

We implement our own Random generator as well.

No plugins means less work for the VM guys and better portability.

I am pretty sure that we are still good here. The node identifier is just one part of a UUID, there are also a random element, a time element and a sequence element. Together they yield a UUID that is unique with high probabilty.

The current discussion is about system dependencies and a weird issue on macOS when the hostname is not set in a normal way.

Winaero
Generate a GUID in Windows 10 (Globally Unique Identifier)
GUIDs are used to identify objects such as interfaces, ActiveX objects, virtual (shell) folders, etc. Here's how to generate a new GUID in Windows 10.
CoCreateGuid function (combaseapi.h) - Win32 apps
Creates a GUID, a unique 128-bit integer used for CLSIDs and interface identifiers.
UuidCreate function (rpcdce.h) - Win32 apps
The UuidCreate function creates a new UUID.
libuuid(3) - Linux man page
The libuuid library is used to generate unique identifiers for objects that may be accessible beyond the local system. The Linux implementation was created ...
DCE 1.1: Remote Procedure Call - Universal Unique Identifier
tesonep commented 1 year ago

Short Answer: We should use libuuid / Windows APi or whatever. It can be even a Plugin or a FFI call.

Long Answer: I think:

fcr-- commented 10 months ago

IMHO UUIDs are independent from networking concepts; however UUIDv1 is certainly network dependent.

Maybe the solution would be to add support to UUIDv4 and v5, and then use v4 by default instead of v1.