pharo-project / pharo

Pharo is a dynamic reflective pure object-oriented language supporting live programming inspired by Smalltalk.
http://pharo.org
Other
1.21k stars 356 forks source link

Symbol>>#intern: of WideString: do we need the #isOctetString check? #11306

Open MarcusDenker opened 2 years ago

MarcusDenker commented 2 years ago

Describe the problem Symbol>>#intern: for WideString check if the WideString could be represented as a ByteString (using #isOctetString).

This is not nice as it means iterating any WideString for the check, slowing down #asSymbol.

Do we need to do that check?

I copy over the discussion from https://github.com/pharo-project/pharo/pull/11291

MarcusDenker commented 2 years ago

@svenvc:

I looked at the Symbol interning code before, I found it confusing and a bit of a mess, it looks to be more complex than it needs to be, probably again due to historical reasons.

For this it is very good that you try to improve it.

Regarding the isOctetString I think the reason might be to make sure that the following is true:

'ABC' asSymbol == 'ABC' asWideString asSymbol.

Since Symbols are tested with #==, two different subclass instances, ByteSymbol and WideSymbol can never be identical, while the following is true due to the way #= is implemented

'ABC' = 'ABC' asWideString.

The question is how often will such a strange case occur (i.e. a Latin1 string that becomes a WideSymbol) and is it necessary to pay the (heavy) price for this at symbol interning time ?

MarcusDenker commented 2 years ago

@MarcusDenker

But:

'ABC'  = 'ABC' asWideString. "true"

Thus, the search for an existing symbol (using #like on the WeakSet) will find the symbol regardless if it looks for a wide string or not.

if I change the (current) #intern: to use

    aClass := aStringOrSymbol isWideString ifTrue:[WideSymbol] ifFalse:[ByteSymbol].

I get:

'ABC' asSymbol == 'ABC' asWideString asSymbol "true"
 'ABC' asWideString asSymbol == 'ABC' asSymbol  "true"

But, now asking for the symbol after having asked for the equal wide symbol returns the wide symbol (eval in one doit due to to GC):

'ABCCC' asWideString asSymbol class. 
'ABCCC'  asSymbol class "WideSymbol"

But to me this is not a problem, as the only thing that matters is identiy.

MarcusDenker commented 2 years ago

@svenvc

I think we can try the change. I would add a specific test case and maybe a comment.

MarcusDenker commented 2 years ago

After merging https://github.com/pharo-project/pharo/pull/11291, the method where the check is done is WideString >> createSymbol