qinsoon commented 9 months ago

This PR introduces different kinds of nursery size options, and by default, we use a proportion of the heap size as the min nursery. This PR should generally improve the generational plans' performance by triggering a full heap GC more promptly. This PR mitigates the issue identified in https://github.com/mmtk/mmtk-core/issues/594, but does not fully fix the problem.

qinsoon commented 9 months ago

/// Return upper bound of the nursery size (in number of bytes)
    pub fn get_max_nursery_bytes(&self) -> usize {
        self.nursery.max.unwrap_or_else(|| {
            if !vm_layout().force_use_contiguous_spaces {
                DEFAULT_MAX_NURSERY_32
            } else {
                DEFAULT_MAX_NURSERY
            }
        })
    }

This check is removed. I am not sure about the consequences. Maybe I should change the default value for max nursery, depending on the VM layout.

qinsoon commented 9 months ago

I quickly tried h2o, for which generational Immix plans were reported slower than full heap Immix in https://github.com/mmtk/mmtk-core/issues/594#issuecomment-1913851147. By changing the min nursery size, generational immix can perform better than Immix. I will do more thorough performance evaluation before making this PR ready for review.

qinsoon commented 8 months ago

The performance results for now (using 2.5x minheap from temurin-21-G1-n-5):

GenImmix

Normalized to Immix/master genix plotty

Normalized to GenImmix/master genix-normalized-to-master plotty

StickyImmix

Normalized to Immix/master stickyix plotty

Normalized to StickyImmix/master stickyix-normalized-to-master plotty

Issues I will look into

jython and tomcat shows slowdown for GenImmix
h2, jython and lusearch show slowdown for StickyImmix

Issues I am not going to look into

Gen immix and sticky immix is slower than immix (the two issues above may contribute to this issue. But if those are resolved and gen/sticky immix are still slower, I do not plan to look further into the issues.)
Immix and sticky immix ran into OOM for tomcat.
Immix ran into OOM for h2.

k-sareen commented 8 months ago

Hmm. {Gen,Sticky}Immix are still really bad in comparison to Immix for lusearch. That doesn't make any sense. Looking back at my old results, it seems like yeah, that was the case back then as well.

qinsoon commented 7 months ago

I fixed a few mistakes based on the previous evaluation. The following is the new results. I ran with 2.5x of the G1 heap size (temurin-21-G1-n-5) with 5 iterations.

The results show that the PR is an improvement for both GenImmix and StickyImmix in most benchmarks, 8% improvement for GenImmix and 3% improvement for StickyImmix (using 0.25x heap size as the min nursery). However, both plans are still performing worse than Immix (2% and 1.5% slower, respectively). There are still other performance issues with our generational plans that should be looked into.

The results also show slowdown in tomcat. The G1 min heap for tomcat is 19M, and the min heap for MMTk is 42M (GenImmix) and 59M (Immix) in my measurement. So with 2.5x of the G1 min heap, we are almost running at the min heap for Immix plans. With a proportion of the heap size as the min nursery, we always trigger full heap GCs for tomcat, while with a fixed 2M min nursery (as in master), we may occasionally do nursery GCs. However, this is a separate issue about MMTk's min heap with OpenJDK.

GenImmix

min-nursery-genix

plotty

StickyImmix

min-nursery-stickyix

plotty

qinsoon commented 7 months ago

LGTM. But we should make an issue for resolving the virtual memory issue. Also did you get the performance results in the end?

I am running the benchmarks. I will post the results once I got it.

qinsoon commented 7 months ago

GenImmix

min-nursery-genix-fix plotty

StickyImmix

min-nursery-stickyix-fix plotty

The improvement for GenImmix is 9.5% and the improvement for Stickyimmix is 3.1% (compared to the 8% and 3% improvement in the previous evaluation). However, one issue is that h2 panics in the run, as it sets the heap size as 1700M, and we cannot guarantee another 1700M virtual memory for the 100% max nursery.

If the nursery size is set explicitly, I think panicking may be a good idea when we cannot satisfy the virtual memory for the nursery. But when we use the default nursery size, we would expect it run, rather than panicking.

Performance-wise, I think this PR achieved what we want.

wks commented 7 months ago

... But when we use the default nursery size, we would expect it run, rather than panicking.

Yes. If the user doesn't specify a nursery size, we shouldn't blame the user for providing a bad value. One way to solve this is to provide another option "force_nursery_size" which defaults to false. When true, it will panic if the give nursery size cannot be satisfied; when false, MMTk core will adjust the size to the nearest possible value.

qinsoon commented 7 months ago

... But when we use the default nursery size, we would expect it run, rather than panicking.

Yes. If the user doesn't specify a nursery size, we shouldn't blame the user for providing a bad value. One way to solve this is to provide another option "force_nursery_size" which defaults to false. When true, it will panic if the give nursery size cannot be satisfied; when false, MMTk core will adjust the size to the nearest possible value.

I am thinking about adding NurserySize::Adaptive and use that as the default value. It tries to use [25%, 100%) heap size as nursery, and when it is not possible, it will be adaptive and use whatever that works (e.g. use a smaller extent for nursery virtual memory, or use discontiguous nursery).

k-sareen commented 7 months ago

The performance looks good (better than master for most things) but still not much better than Immix which is odd. But this PR does not need to solve all the issues at once.

I agree with @wks and your Adaptive idea.

qinsoon commented 7 months ago

We checked with Steve. The reason that nursery was using a contiguous space is that we can do range check for mature objects in barriers. However, this is not important for now. I changed the code to use discontiguous nursery. I am running another round of evaluation.

qinsoon commented 7 months ago

The results with a discontiguous nursery.

GenImmix

min-nursery-genix-discontig plotty

StickyImmix

min-nursery-stickyix-discontig plotty

The results are pretty similar to the previous run (with contiguous nursery), but slower. The improvement is 8.7% for GenImmix and 2.4% for StickyImmix, compared with 9.5% and 3.1% in the previous run. There is a consistent 0.8%/0.7% difference.

The PR currently uses discontiguous nursery. But I can change it to contiguous nursery, and only use discontiguous nursery if we may run out of virtual address. Just let me know if that's desired.

qinsoon commented 7 months ago

@wks @k-sareen Can you review this PR again if you think it is necessary? I would like to get issues resolved for this PR, and get it merged.

mmtk / mmtk-core

Allow nursery size to be proportional to the heap size #1087

GenImmix

StickyImmix

Issues I will look into

Issues I am not going to look into

GenImmix

StickyImmix

GenImmix

StickyImmix

GenImmix

StickyImmix