Closed by qinsoon 10 months ago
Did this happen in CI? We can set RUST_BACKTRACE to 1 when running on the CI so that we can always collect the stack trace.
Yeah. In my PR https://github.com/mmtk/mmtk-openjdk/pull/263, I changed the heap size used in CI to a reasonable value (for https://github.com/mmtk/mmtk-core/issues/424), and started to see this error. I did not look into the problem though.
This is 100% reproducible on my machine. I'll have a look at it.
The culprit is the check current_chunk != self.common.start:
    impl<VM: VMBinding> CopySpace<VM> {
        // ...
        #[cfg(feature = "vo_bit")]
        unsafe fn reset_vo_bit(&self) {
            let current_chunk = self.pr.get_current_chunk();
            if self.common.contiguous {
                // If we have allocated something into this space, we need to clear its VO bit.
                if current_chunk != self.common.start { // ERROR!
                    crate::util::metadata::vo_bit::bzero_vo_bit(
                        self.common.start,
                        current_chunk + BYTES_IN_CHUNK - self.common.start,
                    );
                }
            } else {
                for (start, size) in self.pr.iterate_allocated_regions() {
                    crate::util::metadata::vo_bit::bzero_vo_bit(start, size);
                }
            }
        }
        // ...
    }
It is intended to omit clearing VO bits for the nursery if no mutators allocated anything into the nursery between the last and the current GC. However, it does the check at chunk granularity. If mutators allocated less than one chunk of memory into the nursery, current_chunk will still be equal to self.common.start, and it will not clear any VO bits, even though some objects have been allocated.
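To illustrate the granularity problem, here is a minimal toy model, not mmtk-core code. `ToySpace`, its fields, and the 4 MiB chunk size are assumptions made up for this sketch; `current_chunk()` only approximates what `get_current_chunk()` is described as returning above.

```rust
// Toy model, not mmtk-core code: every name below is hypothetical.
const BYTES_IN_CHUNK: usize = 4 << 20; // assumed chunk size of 4 MiB

/// A contiguous bump-allocated space, tracked only by addresses.
struct ToySpace {
    start: usize,  // chunk-aligned start of the space
    cursor: usize, // end of the memory allocated so far
}

impl ToySpace {
    fn new(start: usize) -> Self {
        assert_eq!(start % BYTES_IN_CHUNK, 0, "start must be chunk-aligned");
        ToySpace { start, cursor: start }
    }

    /// Bump-allocates `bytes` bytes.
    fn alloc(&mut self, bytes: usize) {
        self.cursor += bytes;
    }

    fn bytes_allocated(&self) -> usize {
        self.cursor - self.start
    }

    /// Start of the chunk holding the most recently allocated byte,
    /// or `start` if nothing has been allocated yet.
    fn current_chunk(&self) -> usize {
        let last = if self.cursor > self.start {
            self.cursor - 1
        } else {
            self.start
        };
        last / BYTES_IN_CHUNK * BYTES_IN_CHUNK
    }

    /// Number of bytes whose VO bits the chunk-granularity check would clear.
    fn bytes_cleared_by_chunk_check(&self) -> usize {
        let current_chunk = self.current_chunk();
        if current_chunk != self.start {
            current_chunk + BYTES_IN_CHUNK - self.start
        } else {
            0 // the check believes nothing was allocated
        }
    }
}
```

In this model, allocating a single 4096-byte page leaves `current_chunk()` equal to `start`, so `bytes_cleared_by_chunk_check()` returns 0 even though `bytes_allocated()` is 4096. Only once allocation crosses into the second chunk does the check start clearing anything.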
One obvious fix is removing that if statement so that we unconditionally clear the VO bits of the entire nursery after every GC.
Another obvious fix is clearing from self.common.start to self.pr.cursor() because the cursor is the end (at page granularity) of allocation.
The third obvious fix is just using self.pr.iterate_allocated_regions() regardless of whether the space is contiguous or not. For contiguous monotonic page resources, the iterator always yields one single range from pr.start to pr.cursor (cursor is aligned up to a multiple of chunks, so if pr.start == pr.cursor, it will be a range of 0 bytes).
I think we can just use self.pr.iterate_allocated_regions() as it is the simplest solution that works.
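A rough sketch of the region-iterator approach, again as a toy model rather than mmtk-core code. `ToyNursery`, `allocated_regions`, `bzero_vo_bits`, and the one-bit-per-8-byte-word layout are all assumptions for illustration; the real `iterate_allocated_regions()` and `bzero_vo_bit` may differ in signature and behavior.

```rust
// Toy model, not mmtk-core code: every name below is hypothetical.
const BYTES_IN_WORD: usize = 8; // assumed: one VO bit per 8-byte word

/// A small contiguous nursery with a side table of VO bits.
struct ToyNursery {
    start: usize,
    cursor: usize,
    vo_bits: Vec<bool>,
}

impl ToyNursery {
    fn new(start: usize, capacity_bytes: usize) -> Self {
        ToyNursery {
            start,
            cursor: start,
            vo_bits: vec![false; capacity_bytes / BYTES_IN_WORD],
        }
    }

    /// Bump-allocates an object and sets the VO bit at its start address.
    fn alloc(&mut self, bytes: usize) -> usize {
        let addr = self.cursor;
        let size = (bytes + BYTES_IN_WORD - 1) / BYTES_IN_WORD * BYTES_IN_WORD;
        assert!(addr + size <= self.start + self.vo_bits.len() * BYTES_IN_WORD);
        self.vo_bits[(addr - self.start) / BYTES_IN_WORD] = true;
        self.cursor += size;
        addr
    }

    /// For a contiguous space: exactly one (start, size) range,
    /// which has size 0 when nothing has been allocated.
    fn allocated_regions(&self) -> impl Iterator<Item = (usize, usize)> {
        std::iter::once((self.start, self.cursor - self.start))
    }

    /// Clears the VO bits covering `size` bytes starting at `start`.
    fn bzero_vo_bits(&mut self, start: usize, size: usize) {
        let first = (start - self.start) / BYTES_IN_WORD;
        let count = size / BYTES_IN_WORD;
        for bit in &mut self.vo_bits[first..first + count] {
            *bit = false;
        }
    }

    /// Clears VO bits for every allocated region, with no chunk-level check.
    fn reset_vo_bits(&mut self) {
        let regions: Vec<(usize, usize)> = self.allocated_regions().collect();
        for (start, size) in regions {
            self.bzero_vo_bits(start, size);
        }
    }

    fn live_vo_bits(&self) -> usize {
        self.vo_bits.iter().filter(|b| **b).count()
    }
}
```

In this model, calling `reset_vo_bits()` after two small allocations clears both bits, and calling it on an empty nursery is a no-op because the single region has size 0. No special case is needed for "nothing allocated".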
Build OpenJDK with fastdebug and VO_BIT=1.
Run dacapo 2006 with GenImmix.
We will see.