Open JuliaPoo opened 1 year ago
Plugged the code from the recitation into the [visualization](https://cscircles.cemc.uwaterloo.ca/java_visualize/#code=public+class+ClassNameHere+%7B%0A+++public+static+void+main(String%5B%5D+args)+%7B%0A++++++abstract+class+A+%7B%0A+++++++++abstract+void+g()%3B%0A++++++%7D%0A%0A++++++class+B+%7B%0A+++++++++int+x+%3D+1%3B%0A%0A+++++++++void+f()+%7B%0A++++++++++++int+y+%3D+2%3B%0A%0A++++++++++++A+a+%3D+new+A()+%7B%0A+++++++++++++++void+g()+%7B%0A++++++++++++++++++x+%3D+y%3B%0A+++++++++++++++%7D%0A++++++++++++%7D%3B%0A%0A++++++++++++a.g()%3B%0A+++++++++%7D%0A++++++%7D%0A%0A++++++B+b+%3D+new+B()%3B%0A++++++b.f()%3B%0A+++%7D%0A%7D%0A%0A&mode=display&curInstr=20), for anyone who wants to try it.
Visualization vs. Recitation
Would like more elaboration on what the discrepancy is. Might be because it's late and I'm not following too well. Is it the lack of frames (in the recit) as opposed to the JVM very clearly separating frames per method?
I'll be writing up on how the JVM deals with inner classes. The simulation I linked is incomplete and hence ur screenshot of it is also incomplete. From what I gather, the notes' heap layout for the instance of abstract class `A` is accurate. The reason why it assumes such a layout, however, is never mentioned in the notes and I'm guessing we are somehow meant to infer why. (E.g., we are meant to ask ourselves and answer questions like, why is `y` duplicated in the JVM? Like why can't the method `a.g` simply access the variable `y` from the stack shown in the notes?)
Yeah, that's a good point tbh. Also I double checked SE17 (the one we're using) vs. SE19 (your reference) docs, and both are identical with regards to the JVM stack.
My thoughts, as discussed in Telegram already:
From the recit, it does seem like they're drawing the stack in
I do agree that mentioning frames (and maybe the operand stack) may have been a clearer way to illustrate the JVM stack. Would like to know prof's thoughts on this. Maybe because 2040 is separate from 2030 that it's not gone into detail so as to not confuse students? Idk. Frames could be a nice resource for folks who want to do further reading in the future.
While reading the notes, I found discrepancies between the notes' description of the Java Virtual Machine (JVM), in particular Stack vs Heap, and my own mental model of what's happening. I tried to resolve them by consulting the JVM spec and disassembling the Java bytecode. In doing so, I noticed significant discrepancies between what the notes present and the spec, and I attempt to resolve them here. I might add more if I have time and more questions.
If I made any errors, do point them out. I have no experience with the JVM prior to this.
Clarifications:
Preliminaries
The Stack: The notes VS the Spec
The Notes
The notes (Lecture 1) describe the stack as a "LIFO (Last In First Out) stack for storing activation records of method calls". In it, we see the stack containing the local variables for each nested method call. This conception aligns well with my intuition for the C-stack. However, it starts to break down when it comes to questions like
- Upon exiting a method (`return`), how does JVM know where to continue execution?
- In `int a = x.length() + y.length()`, how is the value of `a` computed?

The Spec
The Spec kinda agrees with the notes' conception of the stack: JVM does have a LIFO stack that tracks method calls. However, each element in the stack isn't a local variable: Each element is a frame.
A frame is added when JVM enters a method, and removed when it returns from a method. Note, ONE frame is added to the stack PER method entered. Each frame then contains a Local Variable Array and an Operand Stack. So at any point in time, a JVM thread manages at least TWO stacks: The stack that contains the frames (I'll be calling this the Call Stack), and the current frame's operand stack. Note that the local variables of the current frame aren't in a stack: They are in an array.
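To see the "one frame per method entered" behaviour from inside a running program, here is a minimal sketch using `StackWalker` (available since Java 9). The class and method names (`FrameDemo`, `currentFrames`, `inner`, `outer`) are made up for illustration. It only lists the frames on the Call Stack; it does not expose each frame's Local Variable Array or Operand Stack.

```java
import java.util.List;
import java.util.stream.Collectors;

class FrameDemo {
    // Returns the method names of the frames currently on this thread's
    // call stack, most recently entered method first.
    static List<String> currentFrames() {
        return StackWalker.getInstance()
                .walk(frames -> frames
                        .map(StackWalker.StackFrame::getMethodName)
                        .collect(Collectors.toList()));
    }

    // Each call below enters a new method, so each adds one frame.
    static List<String> inner() { return currentFrames(); }
    static List<String> outer() { return inner(); }

    public static void main(String[] args) {
        // The list starts with currentFrames, inner, outer, main.
        System.out.println(outer());
    }
}
```

When `inner` returns, its frame is gone: calling `currentFrames()` directly from `main` would no longer list `inner` or `outer`.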
When JVM enters a method, a frame is created and the Local Variable Array is populated with the method's arguments (E.g., calling `f(a=1, b=2)` adds `(int a, 1), (int b, 2)` key-value pairs to the Local Variable Array). The Local Variable Array is also written to during runtime (E.g., when executing `int a = 1`, an `(int a, 1)` key-value pair is added to the Local Variable Array). Strictly speaking, the spec addresses local variables by index rather than by name, and the array's length is fixed at compile time, so "key-value pair" here is just a convenient way to picture a slot. The operand stack, on the other hand, holds the intermediate values used in computing the local variables/return value (E.g., for `a = 1+3`, JVM uses the operand stack to compute `1+3 = 4` and stores the result in the Local Variable Array entry for `a`).

Aside: Instance methods have the `this` variable. Within JVM, `this`, a reference to the object the method was called on, is passed in as the first argument of the method, so the first element of the Local Variable Array in such a method is the `(this, <reference in heap>)` key-value pair.

The operand stack and local variable array can, of course, store references to objects in the heap as well, and values can be copied between the operand stack and the local variable array. Do note that these are only in the current frame. To my knowledge, there isn't a way for a current frame to directly access the operand stack/local variable array of another frame.
When JVM exits the method, the return value is passed to the previous frame somehow (idk how) and the current frame is removed. Execution continues in the previous frame.
Refer to Here for an incomplete visualisation of what I mean (The link visualises the Call Stack and the Local Variable Array but not the Operand Stack, and doesn't account for everything, like inner classes).
Aside: The spec describes JVM as both a Register Machine and a Stack Machine. The Local Variable Array is serving the function of the Registers in a Register Machine. The Operand Stack serves as the Stack in a Stack Machine. Here's a brief example of how the JVM Operand stack works:
Suppose we have the following code:
The corresponding disassembly looks like this:
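To make the mechanics concrete, here is a toy model of a single frame (not the real JVM, and not the listing referred to above). The class `ToyFrame` and its methods are hypothetical. The method names borrow the real bytecode mnemonics `iconst`, `iload`, `istore` and `iadd`, and the instruction sequence in `run` is roughly what I'd expect `javac` to emit for a static method containing `int a = 1; int b = 3; int c = a + b;`.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class ToyFrame {
    final int[] locals;                                  // Local Variable Array, size fixed up front
    final Deque<Integer> operands = new ArrayDeque<>();  // Operand Stack

    ToyFrame(int maxLocals) { locals = new int[maxLocals]; }

    void iconst(int v)    { operands.push(v); }              // push a constant
    void iload(int slot)  { operands.push(locals[slot]); }   // local -> operand stack
    void istore(int slot) { locals[slot] = operands.pop(); } // operand stack -> local
    void iadd() {                                            // pop two ints, push their sum
        int right = operands.pop();
        int left = operands.pop();
        operands.push(left + right);
    }

    // Simulates: int a = 1; int b = 3; int c = a + b;
    static ToyFrame run() {
        ToyFrame f = new ToyFrame(3);
        f.iconst(1); f.istore(0);                       // a lives in slot 0
        f.iconst(3); f.istore(1);                       // b lives in slot 1
        f.iload(0); f.iload(1); f.iadd(); f.istore(2);  // c lives in slot 2
        return f;
    }

    public static void main(String[] args) {
        System.out.println(run().locals[2]); // prints 4
    }
}
```

Note how the operand stack is empty again once the statement finishes: it only holds intermediate values, while the results end up in the local variable slots.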
Attempt to resolve
Here are the 3 main concerns I had when I saw these discrepancies:
I feel like the stack as described in the notes combines the Local Variable Array of all frames on the call stack into one big stack, with the order on the stack being the order in which the variables are created? This is a very liberal interpretation of the spec and I have no idea how to resolve this discrepancy.
Accessing variables that aren't in the Local Variable Array
Okay, so each frame's code can access the Local Variable Array, which includes method arguments. But as we know, code within a method can access way more than that. E.g.,
Within `Main.main`, the symbols `B` and `B.f` are not in `Main.main`'s Local Variable Array, and yet they can be accessed.

The solution is in the Constant Pool. Each frame holds a reference to the run-time Constant Pool of the current method's class, which contains references to all symbols that the frame should be able to access but that aren't local variables or arguments. E.g., for the example above we have:
Do note that in the Constant Pool, not all symbols have been 'initialised'. E.g., in the above, the spec totally allows `#7` (a reference to class `B`) to reference nothing UNTIL `Main.main` needs it, at which point the file `B.class` will be loaded into memory and `#7` is made to reference class `B`.
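A related effect that is easy to observe from plain Java is lazy class initialisation: a class's static initialiser only runs on first use. This is not the same thing as resolving a Constant Pool entry, so take the sketch below as an illustration of the "nothing happens until it's needed" idea rather than a direct view of `#7`. The names `LazyInitDemo`, `events` and `run` are hypothetical; `B` and `f` follow the names used above.

```java
import java.util.ArrayList;
import java.util.List;

class LazyInitDemo {
    // Records what happened, in order.
    static final List<String> events = new ArrayList<>();

    static class B {
        // Runs once, the first time B is actually used.
        static { events.add("B initialised"); }

        static void f() { events.add("B.f called"); }
    }

    static List<String> run() {
        events.add("before first use of B");
        B.f(); // first use of B: triggers its static initialiser
        return events;
    }

    public static void main(String[] args) {
        // On a fresh JVM: [before first use of B, B initialised, B.f called]
        System.out.println(run());
    }
}
```

Even though `run` mentions `B` in its body, `B`'s initialiser does not run when `run` is entered, only when execution reaches `B.f()`.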
Inner Classes: Heap layout explanation
The Notes
Consider this snippet from Recital 7 and the heap layout claimed:
This immediately raises a few questions regarding the heap object `A`:
1. Why is the `y` variable duplicated in the stack AND the heap (as part of abstract class `A`)? Couldn't `a.g` simply access `y` via the stack?
2. Why does `A` have a reference to its outer class `B.this` in the heap when it could similarly access `B.this` in the stack?
3. If `y` is duplicated, why isn't `B.this.x` duplicated?

The Spec
We can dump the heap layout when executing `a.g` to see that the notes' heap layout is essentially correct.

What's happened here is that, in `B.f`, the inner class is internally represented as `B$1 extends A`:

Here's my reasoning for the 3 questions mentioned above:
1. `y` in `B.f` lives within `B.f`'s frame in the Local Variable Array. Since the frame for `a.g` is different from `B.f`'s, `a.g` is unable to access the variable `y` from `B.f`'s frame. It hence has to be passed as an attribute into the inner class `B$1`.
2. `B.this`, a reference to the outer class instance, is similarly passed as an attribute into the inner class `B$1` by default.
3. There is no need for `B.this.x` to be duplicated as a property of `B$1` because it is already accessible via `B.this`.
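The three points above can be sketched by writing the anonymous class out by hand. This is my own approximation of what the compiler generates, not actual `javac` output: `BAnon` stands in for `B$1`, and the field names `outer` and `y` are made up (the real synthetic fields are named by the compiler). `A`, `B`, `x`, `y`, `f` and `g` follow the recitation code from the visualization link.

```java
abstract class A {
    abstract void g();
}

class B {
    int x = 1;

    void f() {
        int y = 2;
        // Stand-in for: A a = new A() { void g() { x = y; } };
        A a = new BAnon(this, y);
        a.g();
    }
}

// Hand-written stand-in for the compiler-generated class B$1.
class BAnon extends A {
    final B outer; // plays the role of the captured B.this
    final int y;   // copy of the local variable y from B.f's frame

    BAnon(B outer, int y) {
        this.outer = outer;
        this.y = y;
    }

    @Override
    void g() {
        outer.x = y; // x is reached through the outer reference, not copied
    }
}

class CaptureDemo {
    static int run() {
        B b = new B();
        b.f();
        return b.x;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints 2
    }
}
```

Because `g` runs in its own frame, the only data it can reach are its own arguments and whatever is stored in the heap object it was called on, which is why both `y` and the reference to the outer instance end up as fields.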
Subclasses: Heap layout explanation
The Notes
Consider the code
The notes show that the heap layout for `FilledCircle` is a 'merge' of both itself and its superclass:

Note that I changed the class `Color` to `int` to make things easy.

The Spec
We can similarly dump the heap to see that the notes pretty much agree with the spec:
So unlike how inner classes are dealt with (i.e., via having a reference to its outer-class as an attribute), inheritance is dealt with by simply merging the heap layout of the superclass and itself as a single entry.
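The "merged" view can be sketched with reflection: walking up the class hierarchy and collecting instance fields shows that a single `FilledCircle` object carries the fields of both classes. The field names `radius` and `color` and the helper `LayoutDemo.instanceFields` are assumptions for illustration, and the sketch says nothing about the byte-level layout or field order the JVM actually uses.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

class Circle {
    double radius; // hypothetical field
}

class FilledCircle extends Circle {
    int color; // Color replaced by int, as in the post
}

class LayoutDemo {
    // Collects the instance fields that an object of type c carries,
    // listing superclass fields before the class's own fields.
    static List<String> instanceFields(Class<?> c) {
        List<String> names = new ArrayList<>();
        if (c.getSuperclass() != null) {
            names.addAll(instanceFields(c.getSuperclass()));
        }
        for (Field f : c.getDeclaredFields()) {
            if (!Modifier.isStatic(f.getModifiers()) && !f.isSynthetic()) {
                names.add(c.getSimpleName() + "." + f.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // prints [Circle.radius, FilledCircle.color]
        System.out.println(instanceFields(FilledCircle.class));
    }
}
```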
TODO if have time and motivation
- How `return` works: How the JVM knows where to continue execution upon exiting a method

Overall Questions
Resources