squeak-smalltalk / squeak-object-memory

Issues and assets related to the Squeak object memory.
https://bugs.squeak.org
MIT License
12 stars 1 forks source link

Weird behavior in accessors after deleting an instvar #122

Closed MariusDoe closed 1 month ago

MariusDoe commented 1 month ago

I have noticed a weird behavior that occurs when you delete an instvar from a class, but keep the accessors for it and call them.

Steps to reproduce

  1. Create a class Dummy
  2. Add an instvar myVar
  3. Add accessors myVar and myVar (^ myVar and myVar := anObject, respectively)
  4. Delete the instvar myVar
  5. Open a workspace
  6. Execute Dummy new myVar: 42.
  7. Print Dummy new myVar (note: this is a new instance, do not keep the instance from the previous step)

Expected behavior

One of the following:

Actual behavior

42 is printed in step 7

Analysis

The problem occurs because after step 4, the accessors use the bytecodes pushLitVar: and popIntoLit:, with the referenced literal being the global Undeclared associationAt: #myVar. This results in the myVar: call storing the 42 into the Association's value, which the myVar call on an unrelated instance will read, because the method references the same (global) Association in its literals.

I have traced the reason for the use of these bytecodes:

  1. Class>>compileAllFrom: is called during step 4
  2. During parsing, Parser>>variable is called
  3. It doesn't find the variable myVar in scope
  4. It calls Parser>>correctVariable:interval:
  5. Because the recompilation of all methods is not interactive, Encoder>>undeclared: is called
  6. Encoder>>global:name: is called with Undeclared associationAt: #myVar
  7. A LiteralVariableNode with the association as its key is returned
  8. Afterwards, either an AssignmentNode (for the setter) or a ReturnNode (for the getter) is created
  9. After parsing, MethodNode>>generate: is called
  10. In the setter case, AssignmentNode>>emitCodeForEffect:encoder: calls LiteralVariableNode>>emitCodeForStorePop:encoder:, which calls Encoder>>genStorePopLiteralVar:
  11. In the getter case, ReturnNode>>emitCodeForValue:encoder: (indirectly) calls LiteralVariableNode>>emitCodeForValue:encoder:, which calls Encoder>>genPushLiteralVar:
  12. Both reference the global Association, which was previously written into the literals of the Encoder
LinqLover commented 1 month ago

Thanks for the report and the deep-dive analysis @MariusDoe! I wonder however whether this could be accepted behavior ...

Expected behavior

One of the following:

  • An error is shown in step 4 (something like The instVar myVar is still referenced in these 2 methods: ...)

This might be a helpful enhancement to our tooling (similar to #123), but we can or should probably not raise an error right from the Class/ClassBuilder. Such incidents probably occur often during automatic actions such as loading packages, installing updates, etc., and I think there needs to be some automatic fallback. In some situation, the ill-defined state might even be transient until another class definition is loaded/updated.

  • An error is shown in step 6 (something like The instVar myVar does not exist)

For the same reason as above, I don't think this would be a good solution (though probably fun to implement by inserting some extra error sends through the encoder). IMHO it also sounds a bit too much like a ticking time bomb that you cannot see by looking at the source code.

  • nil is printed in step 7

I actually had assumed this would be the default behavior. :o But then again, I remember a situation where this wasn't the case but I was glad about it ...

Once I added an extension method to a class and wanted to store some extra state related to the receiver object. Because I was inadvertent/too lazy to define an extra class that holds this state (usually through a WeakIdentityKeyDictionary), I just added an instance variable to the class. Of course, when storing this extension method in a package or change set, the changed class definition is not applied, so checking out this patch in another image leads to an undeclared instance variable. Thanks to the current behavior, the extension method can however still manage the state of the object ... as long as it's a singleton. Otherwise, things might get very confusing, including possible memory leaks. But for singletons and class variables (which IIRC use the same mechanism), it's a helpful fallback.

Still, one might argue that this is too dangerous as your example with unrelated instances shows, and might prevent programmers from defining their state properly in the first place. I just realize that this would even lead to shared state between instances of different classes. Oh no.

Another alternative might be to maintain per-instance values in the Undeclared table, something like (Undeclared associationAt: 'myVar') at: anObject or (Undeclared associationAt: {'myVar'. anObject}. But this would further increase the hidden complexity.

Other opinions from the room? :-)

squeak-smalltalk-bot commented 1 month ago

On 2024-07-10, at 9:37 AM, Christoph Thiede via Squeak-dev @.***> wrote:

Thanks for the report and the deep-dive analysis @MariusDoe! I wonder however whether this could be accepted behavior ...

It certainly is; this is 'normal' in the sense that it is deliberate and expected, and indeed has been the case pretty much forever. What it does is make sure you have no nasty dangling pointers - after all, this isn't C. Doing things this way makes (almost) sure your code doesn't crash the system. Yes, you need to know about it, and to look at the Undeclared dictionary occasionally, and arguably we might change the compiler/browser to alert you more directly.

tim

tim Rowledge; @.***; http://www.rowledge.org/tim Strange OpCodes: DUL: Delete Utility Library

LinqLover commented 1 month ago

But that would not stop us from mapping the undeclared variables references to nil as Marius suggested. That is, ^undeclaredInstVar returns nil, and undeclaredInstVar := 42 becomes a no-op. Is the current behavior of global state actually desired, or was it just easier to imlpement?

Best, Christoph

squeak-smalltalk-bot commented 1 month ago

On 2024-07-10, at 10:38 AM, Christoph Thiede via Squeak-dev @.***> wrote:

Is the current behavior of global state actually desired, or was it just easier to imlpement?

Can't help with that; maybe Dan Ingalls would remember.

Diverting things to a global has some advantages; for example it keeps things kinda-sorta working and provides 'safe' time to fix the actual problem. My preference would be to leave it alone and consider changes to the compiler/browser tools to make any 'improvements' in behaviour when deleting or moving ivars.

tim

tim Rowledge; @.***; http://www.rowledge.org/tim Fractured Idiom:- VISA LA FRANCE - Don't leave chateau without it

squeak-smalltalk-bot commented 1 month ago

On Wed, Jul 10, 2024 at 10:38 AM Christoph Thiede via Squeak-dev < @.***> wrote:

But that would not stop us from mapping the undeclared variables references to nil as Marius suggested. That is, ^undeclaredInstVar returns nil, and undeclaredInstVar := 42 becomes a no-op. Is the current behavior of global state actually desired, or was it just easier to imlpement?

It is the way forward declarations are handled. It is a very simple understandable mechanism. Global variables are the values of associations in dictionaries. Fetching and storing a global variable is effectively the send of value & value: to an association whose key is the name of the variable. Any variable mention that is undeclared is substituted for with a global variable stored in the Undeclared dictionary, which sits alongside Smalltalk, the dictionary containing declared globals (global variables and classes). Whenever a global or class variable is declared the name is looked up in Undeclared and if found then the association is moved out of Undeclared into the relevant dictionary (classPool or Smalltalk), hence preserving references to the newly declared variable. This allows considerable freedom oin file-in order, the major constraint on which becomes superclasses having to precede subclasses.

All programmers should check Undeclared regularly (there is an "Undeclared removeUnreferencedKeys; inspect" item on the "do..." menu for this very purpose). And the Undeclared mechanism should be taught and conveyed. It has been an integral part of the system that predates Smalltalk-80.

Now, when an instance variable is declared things are slightly different, because the new variable is not global and hence the association is not used. Here the simplest thing that could possibly work is to leave the association in Undeclared (hence removeUnreferencedKeys in the above). Now that implementations are faster it is affordable to check for the variable being in use, and remove it automatically, but is there really a need?

What I would like to see is an indication on the main menu bar that Undeclared is non-empty, an indication that if clicked on as a button would open a browser on all methods referring to undeclared variables.

What I don't want to see is a mechanism that is well thought through, that does a great job most of the time, being derided and replaced by something whizzy just because it hasn't been communicated properly to the community.

Best, Christoph

— Reply to this email directly, view it on GitHub https://github.com/squeak-smalltalk/squeak-object-memory/issues/122#issuecomment-2221093126, or unsubscribe https://github.com/notifications/unsubscribe-auth/BFYAK6IF6XXL5JCM2LBSZCLZLVWPZAVCNFSM6AAAAABKUSVMXWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEMRRGA4TGMJSGY . You are receiving this because you commented.Message ID: @.> Squeak-dev mailing list -- @. To unsubscribe send an email to @.***

-- ,,,^..^,,, best, Eliot

LinqLover commented 1 month ago

(Note: It seems that @squeak-smalltalk-bot is still unable to mirror the entire conversation, check out https://lists.squeakfoundation.org/archives/list/squeak-dev@lists.squeakfoundation.org/thread/7Q5QUNQDFNOIWNZNYB7IG2Q4OJTVKOXI/ to read all messages)

codefrau commented 1 month ago

I would very much suggest that this discussion is ill-suited for a bug tracker (and IMHO squeak-smalltalk-bot should not be allowed to add comments since it blurs the lines between mailing list and bug tracker).

The behavior OP described is not a bug, it works exactly as designed. This issue should be closed.

One piece of information was buried in the comments above: The Transcript will indeed print a warning about each method that had a variable which became undeclared.

Everything beyond that is a design discussion that should happen on the mailing list. There could be a new ticket but again bug trackers are not for discussions.

marceltaeumel commented 1 month ago

@MariusDoe Please make a new issue to suggest GUI/Tool improvements to better inform users/programmers about this behavior. Our tool support around undeclared things is quite poor at this point.

LinqLover commented 1 month ago

Hi Vanessa, Marcel, all,

TBH I don't agree with this perspective on bugs.squeak.org. We don't use this as a real bug tracker in the sense of properly maintaining and tracking issues. From my impression, we only use GitHub Issues 1) to reduce the hurdle for people to report bugs by not requiring them to subscribe to the mailing list and 2) perhaps to gain some more visibility in the GitHub network. Thus, it functions (or ought to function) as an exclave or alternative interface to the mailing list: few people have subscribed to GitHub Issues, and many one-time reporters have not subscribed to the mailing list. If we had not merged both communication platforms, they would diverge. Either some people would miss information (such as about a bug having been fixed) or other people would have to write and read redundant messages. Also, there would be just another separate place that one had to search in order to find historical information about design decisions, implementation considerations, etc. Today, it is quite easy to find most of them in one place using Squeak Inbox Talk. So, I don't think the arguments of Fogel et al. can be applied here. This forum IS a part of the mailing list. :-)

IMHO squeak-smalltalk-bot should not be allowed to add comments

I implemented this behavior, which closely matches that of the OpenSmalltalkVM-Bot for vm-dev, earlier this year in agreement with Marcel on behalf of the previous board for the reasons mentioned above, and I gave the board a prior notice before the bot went live. I am surprised that it now faces existential critism after its prototype has been in place for a couple of years. Of course we can change anything, but I think we should at least give it a fair discussion.

The behavior OP described is not a bug, it works exactly as designed.

Does it? So far, I have only read "it is like that" and a good explanation of how it is implemented in this thread. I am not aware of any test that expects this behavior nor any relevant statement in the Blue Book or the ANSI standard, and personally I have not been convinced why it must be like that. IMHO Marius has made an important point about the confusion emerging from this behavior, and we can see that it introduces hidden shared state, which is arguably not better than eliminating the state in the first place but even might introduce memory leaks.

On another note. Personally, I don't like closing issues before achieving a consensus or at least encountering all arguments on the table. I would also be reluctant to put the burden of brainstorming solutions to random bug reporters. Beyond that, given that we do not really take advantage of any structure of the bug tracker, I can see little sense in closing one issue in favor of another one. Renaming seems easier. But if we do so, as a rule of thumb, I would never close an issue before either a fix has been implemented or another issue has been created. Let's not throw away the advantage of bug tickets -- perhaps the most used one on bugs.squeak.org -- of traceability by their open/closed state. Besides, the classification "Closed as completed" applied to this issue is wrong (it should be "closed as not planned", and btw this is perhaps redundant with the "ignored" label). :-) Generally speaking, I am grateful if any Squeak user of whatever level of expertise donates some time to us for sharing an observation about the system. We cannot (or should not) expect them to track an observation down to the originating cause or even suggest new design decisions (though we always welcome this kind of contributions).

+1 for brainstorming better tooling for undeclared instance variables (as we started above).
-0.5 for keeping the current behavior of the compiler in place (at least we should properly finish our discussion about that first).
-100 for using the bugs.squeak.org tool like this. :-)

Best, Christoph

squeak-smalltalk-bot commented 1 month ago

Been like that for as long as I've been using Smalltalk, so almost certainly been there since the earliest versions of Smalltalk-80 at the least. And really, just how would one do much different, especially with the available performance back then - remember current machines are at least a million times faster and have about a million times more memory.

On 2024-07-14, at 3:24 PM, Christoph Thiede via Squeak-dev @.***> wrote:

The behavior OP described is not a bug, it works exactly as designed.

tim

tim Rowledge; @.***; http://www.rowledge.org/tim when people are free to do as they please, they usually imitate each other

LinqLover commented 1 month ago

But this is only an appeal to tradition, right? Given that this is only a recovery mechanism, I hope no one will seriously depend on this behavior, or are you concerned about compatibility?

And really, just how would one do much different, especially with the available performance back then - remember current machines are at least a million times faster and have about a million times more memory.

I outlined two possible behaviors above: 1) Ignoring all assignments to undeclared variables and treating them as nil, 2) Discriminating the value of an undeclared variable for each instance of the class. To my understanding, 1) should be as performant as the current behavior, 2) would increase memory consumption if you have many instance for the sake of correctness but be similar fast to the current state. I'm not sure why you are mentioning performance here. :sweat_smile:

squeak-smalltalk-bot commented 1 month ago

On 2024-07-14, at 5:10 PM, Christoph Thiede via Squeak-dev @.***> wrote:

But this is only an appeal to tradition, right? Given that this is only a recovery mechanism, I hope no one will seriously depend on this behavior, or are you concerned about compatibility?

Hmm, there is certainly a bit of that; but then one might somewhat snarkily says that about almost all the ways Smalltalk works. It's been a deliberate feature of the system essentially forever, it works to prevent the removal of instvars from being a dead-pointer problem (Oh, C, how I hate...) and you need to know one simple thing to understand how to use it and fix your code. That's pretty good for a programming system.

And really, just how would one do much different, especially with the available performance back then - remember current machines are at least a million times faster and have about a million times more memory.

I outlined two possible behaviors above: 1) Ignoring all assignments to undeclared variables and treating them as nil, 2) Discriminating the value of an undeclared variable for each instance of the class. To my understanding, 1) should be as performant as the current behavior, 2) would increase memory consumption if you have many instance for the sake of correctness but be similar fast to the current state. I'm not sure why you are mentioning performance here. 😅

Available performance dramatically affects what you can realistically do in an interactive environment. You're actually young enough that I truly doubt you can really feel what an effect it could have over factors of a million - hell, I really can't and I've not just lived through but contributed to it. That can massively affect what design choices you have available, and thus what you even imagine building.

Your 1) suggestion could be a good answer to be honest. Making the system not-explodey for removed instvars by making any reference return nil is a perfectly decent choice. It would be interesting to find out who implemented the current function (decent chance it was Dan Ingalls, or Ted Kaehler)and see if they remember why the fake-global approach was chosen. It does have the virtue that there is some live value returned and perhaps that kept enough object working decently enough that it was helpful.

Making a separate faux-variable for each instance would be a terrible step onto a path that leads to pain, anguish, and.... Python. Nobody wants to go there.

I made a couple of suggestions as to how tools could provide assistance.

tim

tim Rowledge; @.***; http://www.rowledge.org/tim Strange OpCodes: SCEU: Simulate Correct Execution, Usually