**wks** opened this issue 3 weeks ago
I tried to log RSS usage during a `fop` run with our OpenJDK binding. The large RSS issue is mostly caused by the initialization of the SFT and VMMap:

- Before MMTk initializes, the RSS is 14MB.
- After MMTk initializes `SFT_MAP`, the RSS is 526MB.
- After MMTk initializes `VM_MAP`, the RSS is 783MB.
- After the plan is created, the RSS is 785MB.
- After MMTk finishes initialization, the RSS is 801MB.
- At the end of the `fop` run, the RSS is 900MB+.
Though it is worth investigating how the RSS grows over the run, we should focus on the SFT and the VM map first: together they account for about 80% of the RSS usage.
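For reference, RSS at each of these points can be logged by reading `VmRSS` from `/proc/self/status` on Linux. A minimal sketch (an illustration, not the instrumentation actually used in the binding):

```rust
use std::fs;

/// Current resident set size in kB, parsed from the `VmRSS:` line of
/// /proc/self/status. Linux-only; returns None if the line is missing.
fn current_rss_kb() -> Option<u64> {
    let status = fs::read_to_string("/proc/self/status").ok()?;
    status
        .lines()
        .find(|line| line.starts_with("VmRSS:"))
        .and_then(|line| line.split_whitespace().nth(1))
        .and_then(|kb| kb.parse().ok())
}

fn main() {
    // Hypothetical call sites: log once before MMTk initializes, then again
    // after SFT_MAP, VM_MAP, plan creation, and the end of initialization.
    println!("RSS before MMTk init: {:?} kB", current_rss_kb());
}
```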
@qinsoon Can you also evaluate the effect of
and see whether they affect the RSS impact of `SFT_MAP` and `VM_MAP`, too? I think they affect the heap layout and may also have an effect on the memory that `SFT_MAP` and `VM_MAP` try to mmap.
I was using a compressed pointer build, so it used `SFTSparseChunkMap` and `Map32`.
Yes. I ran tests a few months ago, and I mentioned back then that we had regressed because the restricted address space uses the sparse chunk map. From memory, that, VMMap, and not returning pages back to the OS were the largest sources of RSS overhead.
Right. Without compressed pointers (using `SFTSpaceMap` and `Map64`), there is no substantial RSS increase during initialization. At the end of initialization, the RSS was 17MB. During the `fop` run, the RSS increased from 43MB (first GC) to 221MB (last GC).
In contrast, with compressed pointers (using `SFTSparseChunkMap` and `Map32`), the RSS was already 801MB after initialization, and during the run it increased from 833MB (first GC) to 1023MB (last GC).
I compared the mmap entries after running the Liquid benchmark on mmtk-ruby. I used the same binary, and used a command-line argument to control whether to use MMTk or CRuby's default GC. When using MMTk, the plan is StickyImmix, and the heap size is set to 36MB, i.e. 1.5x the minimum heap size. I printed the mmap entries and calculated how many of their pages are in RAM using the methodology described here. The data is collected at the time of `harness_end`. I tried to match mmap entries from `/proc/pid/maps` between the two executions, and the results are in the following spreadsheet:
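For reference, a minimal sketch of how per-mapping resident pages can be read on Linux by printing the `Rss:` field of each entry in `/proc/<pid>/smaps`. This is not necessarily the exact methodology linked above (which may count pages via `/proc/<pid>/pagemap`), but it yields the same kind of per-mapping numbers:

```rust
use std::{env, fs};

/// Print the resident size (Rss) of every mapping in /proc/<pid>/smaps,
/// plus a total. Run as: smaps_rss <pid>
fn main() {
    let pid = env::args().nth(1).expect("usage: smaps_rss <pid>");
    let smaps = fs::read_to_string(format!("/proc/{pid}/smaps")).expect("cannot read smaps");

    let mut current_mapping = String::new();
    let mut total_kb: u64 = 0;

    for line in smaps.lines() {
        // A mapping header looks like:
        //   559e1c2f6000-559e1c317000 rw-p 00000000 00:00 0   [heap]
        // Per-mapping fields such as "Rss:  132 kB" never have '-' in the first token.
        let first_token = line.split_whitespace().next().unwrap_or("");
        if first_token.contains('-') {
            current_mapping = line.to_string();
        } else if let Some(rest) = line.strip_prefix("Rss:") {
            let kb: u64 = rest.trim().trim_end_matches("kB").trim().parse().unwrap_or(0);
            total_kb += kb;
            println!("{kb:>8} kB  {current_mapping}");
        }
    }
    println!("{total_kb:>8} kB  TOTAL");
}
```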
Note that the overhead of `SFTMap` is trivial because the mmtk-ruby binding is currently using `SFTSpaceMap` on 64-bit systems, and its tables don't have many entries. The overhead of `Map64` is also trivial because the length of its tables (descriptor map, base address and high watermark) is `MAX_SPACE`, which is only 16.
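As a back-of-envelope check (assuming each table entry is one machine word, which may not exactly match the actual field types in mmtk-core):

```rust
// Three per-space tables of length MAX_SPACE, with entries assumed to be
// word-sized. Even if the real entries are a few words each, the total is
// still only a few kilobytes, i.e. negligible compared to the RSS figures above.
const MAX_SPACE: usize = 16;

fn main() {
    let word = std::mem::size_of::<usize>(); // 8 bytes on 64-bit
    let tables = 3; // descriptor map, base address, high watermark
    println!("~{} bytes", MAX_SPACE * tables * word); // ~384 bytes
}
```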
The mmap entries specific to MMTk include:

The mmap entries specific to CRuby's own GC include:

The main mmap entries for malloc (the `[heap]` entry plus other un-annotated entries) are:
In summary, MMTk has a larger RSS footprint in
In this execution, the MMTk heap size was set to 1.5x the minimum heap size. If we divide the RSS of the ImmixSpace by 1.5, we get 16MB, which is still larger than CRuby's default GC heap, which is 7MB. One possible explanation is that in the current implementation of mmtk-ruby, we allocate Array, String and MatchData payloads in the GC heap, while vanilla CRuby allocates them in the malloc heap. That gives the illusion that MMTk is using more GC heap, while CRuby is simply allocating those objects off-heap. If we assume that half of the heap objects are the payloads of those objects (which is still conservative for the Liquid benchmark, I think, because it uses regular expressions and strings very intensively), that leaves 8MB, which is similar to the 7MB of the default GC.
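Spelling out the arithmetic (the roughly 24MB ImmixSpace figure is the one implied by the 16MB result above):

$$
\underbrace{24\,\mathrm{MB}}_{\text{ImmixSpace RSS}} \div 1.5 = 16\,\mathrm{MB},
\qquad
16\,\mathrm{MB} \times \tfrac{1}{2} = 8\,\mathrm{MB} \approx 7\,\mathrm{MB}\ \text{(CRuby default GC)}
$$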
MMTk uses the work packet system. All work packets are allocated in the malloc heap, and the `Vec` members of work packets are also allocated in the malloc heap. This can explain the malloc heap usage in GC worker threads. If we reduce the number of GC worker threads by setting the env var `MMTK_THREADS`, the RSS footprint related to pthread stacks and thread-local malloc buffers will shrink proportionally. It is arguable that this part of the RSS footprint is not a problem because the memory is not leaked: Rust's ownership mechanism always correctly frees unused memory. But the RSS number does look worse than that of vanilla CRuby.
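As a simplified illustration (the type below is a hypothetical stand-in, not one of mmtk-core's actual work packet types): a boxed work packet and any `Vec` it owns are separate allocations from Rust's global allocator, which is the system malloc unless overridden, so they show up in the malloc heap of whichever worker thread allocated them.

```rust
// Hypothetical stand-in for a work packet; not mmtk-core's actual types.
struct ScanObjectsPacket {
    edges: Vec<usize>, // the Vec's backing buffer is a malloc allocation
}

trait Work {
    fn do_work(&mut self);
}

impl Work for ScanObjectsPacket {
    fn do_work(&mut self) {
        println!("scanning {} edges", self.edges.len());
    }
}

fn main() {
    // Both the Box and the Vec buffer come from the global allocator (malloc).
    let mut packet: Box<dyn Work> = Box::new(ScanObjectsPacket { edges: vec![0; 1024] });
    packet.do_work();
    // Dropping the packet returns the memory to malloc, but whether malloc
    // returns the pages to the OS is up to the allocator, which is why RSS
    // can stay high even though nothing is leaked.
}
```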
We observed on several VM bindings that the RSS memory consumption is higher when using MMTk than with those VMs' default GCs. We need to inspect what that memory is used for. Possibilities include (but are not limited to):

- `BlockQueue`, which caches memory blocks instead of returning the memory to the OS by unmapping (see the sketch after this list).
- `Vec`, `Box`, etc.
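For context, here is a minimal sketch of the block-caching trade-off mentioned in the first item. The type and block size are hypothetical, not mmtk-core's actual `BlockQueue`, and it uses the `libc` crate: one policy keeps freed blocks mapped and resident for fast reuse, the other keeps the address range reserved but lets the kernel reclaim the physical pages with `madvise(MADV_DONTNEED)`.

```rust
// Hypothetical block cache illustrating the trade-off; not mmtk-core's BlockQueue.
const BLOCK_BYTES: usize = 32 * 1024; // assumed block size, for illustration only

struct BlockCache {
    free_blocks: Vec<*mut u8>,
}

impl BlockCache {
    /// Policy A: keep the block cached. The pages stay resident, so the block
    /// still counts towards RSS even though the GC considers it free.
    fn release_cached(&mut self, block: *mut u8) {
        self.free_blocks.push(block);
    }

    /// Policy B: keep the virtual range reserved, but let the kernel reclaim
    /// the physical pages. RSS drops; reusing the block takes page faults.
    fn release_to_os(&mut self, block: *mut u8) {
        unsafe {
            libc::madvise(block as *mut libc::c_void, BLOCK_BYTES, libc::MADV_DONTNEED);
        }
        self.free_blocks.push(block);
    }
}

fn main() {
    let mut cache = BlockCache { free_blocks: Vec::new() };
    for policy_b in [false, true] {
        // Reserve one anonymous block with mmap.
        let block = unsafe {
            libc::mmap(
                std::ptr::null_mut(),
                BLOCK_BYTES,
                libc::PROT_READ | libc::PROT_WRITE,
                libc::MAP_PRIVATE | libc::MAP_ANONYMOUS,
                -1,
                0,
            )
        } as *mut u8;
        if policy_b {
            cache.release_to_os(block);
        } else {
            cache.release_cached(block);
        }
    }
}
```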