squeak-smalltalk / squeak-object-memory

Issues and assets related to the Squeak object memory.
https://bugs.squeak.org
MIT License
12 stars 1 forks source link

Failed to get timestamp for methods of unicode named classes #17

Closed dram closed 2 years ago

dram commented 2 years ago

In Squeak 6.0 alpha, For classes which have unicode names, browser will be failed to get timestamp for their methods, only no timeStamp is displayed, e.g.:

image

dram commented 2 years ago

The problem is caused by CompiledMethod>>#getPreambleFrom:at:, which does not support non-ASCII characters, as already stated in the comment.

dram commented 2 years ago

Submit a patch to The Inbox: https://lists.squeakfoundation.org/pipermail/squeak-dev/2022-May/220279.html

BTW, as in Squeak 6.0 sources and changes files are always UTF-8 encoded, it seems not necessary to append language tags to chunks anymore, e.g.:

!测试类 methodsFor: 'as yet unclassified' stamp: 'xw 5/4/2022 09:08'!]lang[(3 61)6,0!
dram commented 2 years ago

Seems that there is some problem about subscribing to squeak-dev mailing list, I will reply to the message[1] here:

Merged. Please add commentary to magic numbers such as "2" the next time.

Thanks! Will do the next time.

I also noticed that @LinqLover reported an issue[2] which is related to CompiledMethod>>#timeStamp. It may be caused by my modification to CompiledMethod>>#getPreambleFrom:at:, but currently I have no clue about this problem, will do some more investigation.

Regarding to the magic number 2, endPosition argument passed into CompiledMethod>>#getPreambleFrom:at: is pointed to position before the last character of the preamble, so in order to make use of PositionableStream>>#backChunk, this position is advanced in two characters, right after the ! character.

[1] https://lists.squeakfoundation.org/pipermail/squeak-dev/2022-May/220281.html [2] https://lists.squeakfoundation.org/pipermail/squeak-dev/2022-May/220311.html

dram commented 2 years ago

The root cause of CompressedSources's incomplete preamble issue is that implementation of CompressedSourceStream is not compatible with PositionableStream>>#skip:.

Recently commited new implementation of CompiledMethod>>#getPreambleFrom:at: relies on PositionableStream>>#backChunk, which indirectly relies on PositionableStream>>#skip:.

There are two possible solutions, 1. update PositionableStream>>#skip:, 2. add override method CompressedSourceStream>>#skip:. The code is same, instead of direct access of instance variable position, accessor method is used instead, i.e.:

skip: anInteger 
    self position: self position + anInteger
dram commented 2 years ago

Submit a patch to The Inbox https://lists.squeakfoundation.org/pipermail/squeak-dev/2022-May/220335.html

LinqLover commented 2 years ago

Genious! Thank you, Xin, I confirm that with Compression-xw.63, my image can be updated again with cached source files. :-)

dram commented 2 years ago

Submit another patch related to timestamp: https://lists.squeakfoundation.org/pipermail/squeak-dev/2022-May/220396.html

marceltaeumel commented 2 years ago

Merged. Thank you!

marceltaeumel commented 2 years ago

BTW, as in Squeak 6.0 sources and changes files are always UTF-8 encoded, it seems not necessary to append language tags to chunks anymore, e.g.:

Language tags allow for language-specific text composition where Unicode's Han Unification is too generic. Yes, I think we might drop the ]lang[ tag and the concept of having a leadingChar in the future. As long as we have leadingChar in the image, it makes no sense to drop the support of ]lang[ tags.

See the class comment of Character.

dram commented 2 years ago

Thanks for pointing to comment of Character, very detailed!

I find leadingChar quite confusing in some scenarios, e.g.:

image

For the first example, one of those two 测试 strings is input from keyboard, and the other pasted from clipboard, they seems to be equal, but result is false, because of leading char.

The second example, I try to fetch some text from Web, and search for wide string patterns, also one from keyboard and the other from clipboard, which will lead to different results.

It is quite frustrated if strings in code contains leading chars, and they are used to comparing outside texts, even find ... in text editor will not work, not to mention that it's even worse if wide characters are contained in class and method names.

It seems that there are not enough time for considering remove of leadingChar in Squeak 6.0, but I still hope that at least we can disable code related to leading char in InputInterpreter.

dram commented 2 years ago

I write a proposal regarding to InputInterpreter: https://github.com/squeak-smalltalk/squeak-object-memory/issues/18