mmtk / mmtk-core

Memory Management ToolKit
https://www.mmtk.io
Other
379 stars 69 forks source link

Require ObjectReference to point inside object #1195

Closed wks closed 2 months ago

wks commented 2 months ago

Require the raw address of ObjectReference to be within the address range of the object it refers to. The raw address is now used directly for side metadata access and SFT dispatching. This makes "in-object address" unnecessary, and we removed the concept of "in-object address" and related constants and methods.

Methods which use the "in-object address" for SFT dispatching or side-metadata access used to have a <VM: VMBinding> type parameter. This PR removes that type parameter.

Because ObjectReference is now both within an object an word-aligned, the algorithm for searching for VO bits from internal pointers is slightly simplified. The method is_mmtk_object now has undefined behavior for arguments that are zero or misaligned because they are obviously illegal addresses for ObjectReference, and the user should have filtered them out in the first place.

Fixes: https://github.com/mmtk/mmtk-core/issues/1170

qinsoon commented 2 months ago

This section in the porting guide should be updated: https://docs.mmtk.io/portingguide/howto/nogc.html#objectreference-vs-address. It would be better to add an example to reference the pattern in JikesRVM to deal with different language/VM-level object reference and MMTK-level object reference.

wks commented 2 months ago

binding-refs OPENJDK_BINDING_REPO=wks/mmtk-openjdk OPENJDK_BINDING_REF=feature/objref-in-object2 JIKESRVM_BINDING_REPO=wks/mmtk-jikesrvm JIKESRVM_BINDING_REF=feature/objref-in-object2 V8_BINDING_REPO=wks/mmtk-v8 V8_BINDING_REF=feature/objref-in-object2 RUBY_BINDING_REPO=wks/mmtk-ruby RUBY_BINDING_REF=feature/objref-in-object2 JULIA_BINDING_REPO=wks/mmtk-julia JULIA_BINDING_REF=feature/objref-in-object2

wks commented 2 months ago

This section in the porting guide should be updated: https://docs.mmtk.io/portingguide/howto/nogc.html#objectreference-vs-address. It would be better to add an example to reference the pattern in JikesRVM to deal with different language/VM-level object reference and MMTK-level object reference.

I added more details to that section. I also changed the "Allocation" section below to emphasize that the VM binding shall nominate an address used for ObjectReference.

wks commented 2 months ago

Related PRs:

qinsoon commented 2 months ago

With the change of this PR, ObjectReference could be different from the VM-level object reference anyway, and the object start address always satisfies the definition of ObjectReference. Why don't we just use the object start address as ObjectReference? In that case, we don't need the term 'the object start address', the bindings don't need to pick a MMTk ObjectReference, and alloc() simply returns an ObjectReference. It further simplifies our API.

It may be more costly for VMs like JikesRVM to convert between VM-level object reference and ObjectReference/'object start address'. But it would be interested to see if the cost is significant or not.

Anyway, this is just some thoughts. I will review the PRs tomorrow.

wks commented 2 months ago

... Why don't we just use the object start address as ObjectReference? ...

It may be more costly for VMs like JikesRVM to convert between VM-level object reference and ObjectReference/'object start address'. But it would be interested to see if the cost is significant or not.

...

It may be easy to compute the starting address from the ObjectReference (which usually points at the header, or at a constant offset from the header), but the other way (i.e. computing the ObjectReference, the header, or whatever needed to access in-object metadata) may not be easy, if possible at all.

One example is JikesRVM. Given the staring address, MMTk may call Scanning::scan_object(starting_address, field_visitor). Then JikesRVM needs to find the TIB, and therefore needs to decide whether the JavaHeader is at starting_address (when not having a hash in the front) or starting_address + 4 (with a 4-byte hash in the front). This is difficult, but possible. JikesRVM can reserve one bit in the JavaHeader, ensure that the bit is always 0 in the JavaHeader, but the same bit is always 1 in the hash. The GC can tell whether the first word is the JavaHeader or the hash by looking at that bit. That's complicated anyway, and may be invasive for other VMs.

Another example is SableVM. Its header is in the middle of the object. Reference and non-reference fields span towards both directions. In this case, it is impossible to find the header when given the starting address because the fields can simply contain arbitrary data that may look like the header.

So it has to be decided by the VM binding for now, to find a middle ground between a simple API and make it efficient to find object metadata.

Speaking of metadata, I think objref.to_header() may or may not be necessary, given that the offsets of each metadata spec can be arbitrary. It's convenient for JikesRVM because all metadata offsets were relative to the JikesRVM-level ObjectReference. For other VMs, it is usually a constant offset from ObjectReference for efficiency reasons. But let's discuss about that in the future.