mmtk / mmtk-core

Memory Management ToolKit
https://www.mmtk.io
Other
379 stars 69 forks source link

Allow roots to be pinned for StickyImmix nursery collections #1108

Closed k-sareen closed 7 months ago

k-sareen commented 7 months ago

This PR allows StickyImmix to properly deal with transitive pinning trace. Before this PR, StickyImmix may move objects in the transitive pinning trace, as it simply uses GenNurseryProcessEdges which allows moving. With this PR, for transitive pinning trace, StickyImmix uses GenNurseryProcessEdges<..., TRACE_KIND_TRANSITIVE_PIN>. This PR fixes https://github.com/mmtk/mmtk-core/issues/1097.

k-sareen commented 7 months ago

Again reinforcing that transitively pinning is still unsupported for the StickyImmix nursery GC (though it should be possible to support it). The main question is how to treat objects in the modbuffer.

@wks Can you try this PR out for Ruby to see if it allows objects to move during the nursery GC?

wks commented 7 months ago

Again reinforcing that transitively pinning is still unsupported for the StickyImmix nursery GC (though it should be possible to support it). The main question is how to treat objects in the modbuffer.

@wks Can you try this PR out for Ruby to see if it allows objects to move during the nursery GC?

I confirmed that with this PR (and the suggested changes I mentioned in the last comment), Ruby is able to pin object during nursery GC, and the GC is also able to move objects during GC. I am not sure whether it works perfectly because the Ruby binding itself has not been fully tested against moving GC, yet.

k-sareen commented 7 months ago

Done @wks

k-sareen commented 7 months ago

But now with the fix, transitively pinning is actually unsupported (given the issue regarding modbuffer) but MMTk will allow a VM developer to proceed regardless. This can blowup in glorious ways.

wks commented 7 months ago

But now with the fix, transitively pinning is actually unsupported (given the issue regarding modbuffer) but MMTk will allow a VM developer to proceed regardless. This can blowup in glorious ways.

I don't think it conflicts with transitive pinning. They are orthogonal.

The fields of the objects in the modbuf should be considered non-pinning roots (unless the VM has something like "potential pinning parents" like Ruby). If an object is added to the modbuf, it simply means the object has been modified before the current GC, and some of its children may point to young objects. Being in the modbuf doesn't mean the field of the object cannot be updated. But if the same young object is also reachable from pinning roots, the object will be pinned; if it's reachable from transitive pinning roots, it and its children will all be pinned.

I think the current implementation of ProcessModBuf is fine:

        if is_nursery_gc() {
            // Scan objects in the modbuf and forward pointers
            let modbuf = std::mem::take(&mut self.modbuf);
            GCWork::do_work(
                &mut ScanObjects::<E>::new(modbuf, false, WorkBucketStage::Closure), // Put into the Closure bucket
                worker,
                mmtk,
            )
        }

As long as the ScanObjects work packet is put into the Closure bucket, it will be executed after all the transitive pinning roots and their children are traced, and also after all non-transitive pinning roots are traced. It will still keep the children of objects in the modbuf alive, but whether they are pinned is determined by other (transitively pinning or non-transitively pinning) roots.

k-sareen commented 7 months ago

Hm. I see. It'd be better if Julia could actually test this PR out to ensure that nothing breaks since neither ART nor Ruby use the transitively pinning roots.

k-sareen commented 7 months ago

@qinsoon @wks The JikesRVM test failure was due to a timeout. Have we seen something like this before? I am not sure why this PR would cause a timeout.

qinsoon commented 7 months ago

@qinsoon @wks The JikesRVM test failure was due to a timeout. Have we seen something like this before? I am not sure why this PR would cause a timeout.

I think we have seen timeouts, but not frequently.

k-sareen commented 7 months ago

JikesRVM just seems like it's a flaky CI. This code doesn't even touch the GCs that JikesRVM supports. I don't know why we get the failing instruction not in RVM address error. Should I investigate further?

k-sareen commented 7 months ago

Idk why the ready-to-merge check timed out. I just re-ran it.

wks commented 7 months ago

https://github.com/mmtk/mmtk-core/actions/runs/8657791364/job/23740512154 It looks like it timed out while running GenImmix. This is different from https://github.com/mmtk/mmtk-core/issues/1103 which is caused by a race condition in MarkSweep.

It may be an unknown bug.

k-sareen commented 7 months ago

Idk why the ready-to-merge check timed out. I just re-ran it.

@qinsoon I think the ready-to-merge check is broken when we have the binding tests. They're called "extended-tests-\" instead of "\-binding-test".

k-sareen commented 7 months ago

https://github.com/mmtk/mmtk-core/actions/runs/8657791364/job/23740512154 It looks like it timed out while running GenImmix. This is different from #1103 which is caused by a race condition in MarkSweep.

It may be an unknown bug.

Yikes. OpenJDK shouldn't time out since this should only change how the pinning/transitively pinning roots work. I'll investigate locally when I have the time.

OR This could be the result of the new change where we removed the coordinator. It would be ideal to be able to ssh to the machine and then inspect the deadlocked process.

wks commented 7 months ago

OR This could be the result of the new change where we removed the coordinator. ...

Yes. This is possible. By removing the coordinator, the mechanism for triggering GC has changed. There could be some corner cases that are not handled properly, but has not been discovered, yet.

k-sareen commented 7 months ago

@qinsoon I'm not sure if the ready to merge check was fixed properly. It seems like it's still waiting on something

qinsoon commented 7 months ago

@qinsoon I'm not sure if the ready to merge check was fixed properly. It seems like it's still waiting on something

Not sure what is happening there. The problem of wrong ignored action names was fixed.

qinsoon commented 7 months ago

@qinsoon I'm not sure if the ready to merge check was fixed properly. It seems like it's still waiting on something

Yeah. It was actually a mistake. https://github.com/mmtk/mmtk-core/pull/1119 should fix it. The action name for OpenJDK was changed due to the reused workflow. So the merge check was actually waiting for the OpenJDk tests to finish (as it was not ignored).