mmtk / mmtk-core

Memory Management ToolKit
https://www.mmtk.io

Remove Java-specific constants #922

Open wks opened 1 year ago

wks commented 1 year ago

TL;DR: src/util/constants.rs defines many Java-specific constants, such as BYTES_IN_INT and BYTES_IN_LONG. Many other places in mmtk-core use those constants. We should replace them with Rust counterparts, or more meaningful constants.

Problem

MMTk was ported from JikesRVM written in Java, and Java integer types have precisely defined sizes. The Java MMTk had constants that represent the sizes of those built-in integer types. They are: {,LOG_}{BITS,BYTES}_IN_{CHAR,SHORT,INT,LONG} and {MAX,MIN}_INT.

When porting to Rust, we copied those constants over. But they are foreign to Rust users because Rust has no short, int or long; it has i16, i32 and i64 instead. Moreover, as a language-agnostic GC framework, MMTk should not be Java-specific, so it doesn't make sense to use those constants (i.e. to depend on Java's semantics of short, int, long, etc.).
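As a rough illustration of what "Rust counterparts" could look like, the sketch below expresses a few of the sizes through the standard library. The constant names on the left are hypothetical and are not existing mmtk-core items; the sketch also assumes MAX_INT is meant to be Java's Integer.MAX_VALUE widened to usize.

```rust
use std::mem::size_of;

// Hypothetical names, for illustration only.
pub const BITS_IN_U32: u32 = u32::BITS; // in place of BITS_IN_INT
pub const BYTES_IN_U32: usize = size_of::<u32>(); // in place of BYTES_IN_INT
pub const BITS_IN_U64: u32 = u64::BITS; // in place of BITS_IN_LONG
pub const BYTES_IN_U64: usize = size_of::<u64>(); // in place of BYTES_IN_LONG

// Assumption: MAX_INT corresponds to Java's Integer.MAX_VALUE.
pub const MAX_I32_AS_USIZE: usize = i32::MAX as usize;
```

In most call sites the named constant could probably be dropped entirely in favour of writing `u32::BITS` or `size_of::<u32>()` inline.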

Which parts are using those constants

We can let the compiler tell us which parts of mmtk-core are using those constants by adding #[deprecated] to those constants.
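A minimal sketch of that approach follows. The note text and the helper function are made up for illustration; only the `#[deprecated]` attribute itself is standard Rust.

```rust
// Marking the constant as deprecated makes rustc emit a warning at every
// remaining use site, which gives a list of places to fix.
#[deprecated(note = "Java-specific; use std::mem::size_of::<u32>() instead")]
pub const BYTES_IN_INT: usize = 4;

// Hypothetical caller. Without the `allow`, compiling this function would
// produce a `deprecated` warning pointing at the use of BYTES_IN_INT.
#[allow(deprecated)]
pub fn legacy_bytes_in_int() -> usize {
    BYTES_IN_INT
}
```

Building the crate with the attribute in place (and without the `allow`) should then report each use as a warning.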

src/plan/generational/barrier.rs

In memory_region_copy_slow, there is an assertion

            debug_assert_eq!(
                dst.bytes() & (BYTES_IN_INT - 1),
                0,
                "bytes should be a multiple of 32-bit words"
            );

It was BYTES_IN_ADDRESS, but was changed to BYTES_IN_INT in https://github.com/mmtk/mmtk-core/pull/899. It may be related to CompressedOops.
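If the intent really is "a multiple of 32-bit words", the same check can be written without the Java-style constant. The helper below is a hypothetical sketch, not code from mmtk-core.

```rust
use std::mem::size_of;

/// Returns true if `bytes` is a multiple of the size of a 32-bit word.
/// The mask trick works only because size_of::<u32>() is a power of two.
pub fn is_multiple_of_u32_size(bytes: usize) -> bool {
    const MASK: usize = size_of::<u32>() - 1;
    (bytes & MASK) == 0
}
```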

src/plan/plan_constraints.rs

In PlanConstraints::default(), max_non_los_default_alloc_bytes and max_non_los_copy_bytes are set to MAX_INT. They are both thresholds for deciding whether objects should go to LOS. There are separate values for "alloc" and "copy" because when copying, object size may grow (for example, for implementing address-based hashing).

I think it is a mistake. MAX_INT is not a meaningful value for anything.

src/policy/marksweepspace/malloc_ms/global.rs

The constant MAX_OBJECT_SIZE is defined as MAX_INT, but MAX_OBJECT_SIZE is unused.

src/util/alloc/allocator.rs

The function fill_alignment_gap fills alignment gaps one VM::ALIGNMENT_VALUE at a time. But VM::ALIGNMENT_VALUE is a usize. This is obviously a bug because it will cause a usize value to be stored at unaligned addresses, which has undefined behaviour.
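One way to avoid the unaligned store is to write a marker whose width matches the fill step. The sketch below assumes the marker is meant to be 32 bits wide and that the gap is a multiple of 4 bytes; it works on a byte slice rather than raw addresses, so it is not a drop-in replacement for fill_alignment_gap, and the function name is hypothetical.

```rust
/// Fills `gap` with a 32-bit marker, one 4-byte chunk at a time.
/// Copying bytes into a slice imposes no alignment requirement on the
/// destination, so no wider-than-aligned store can happen.
/// Any trailing bytes beyond a multiple of 4 are left untouched.
pub fn fill_gap_with_u32(gap: &mut [u8], marker: u32) {
    let bytes = marker.to_ne_bytes();
    for chunk in gap.chunks_exact_mut(4) {
        chunk.copy_from_slice(&bytes);
    }
}
```

With raw pointers, `std::ptr::write_unaligned` would be the other standard option when a wider value genuinely has to be stored at a less-aligned address.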

src/util/heap/space_descriptor.rs

The constant BASE_EXPONENT is defined as BITS_IN_INT - MANTISSA_BITS. It is used in create_descriptor_from_heap_range to perform bit shifting, and is only used on 32-bit systems. I think BITS_IN_INT here should be replaced with u32::BITS.

src/util/raw_memory_freelist.rs

The constant LOG_ENTRY_BITS is defined as LOG_BITS_IN_INT. The freelist data structure from JikesRVM uses 32-bit entries to form a linked list. It should be u32::BITS.ilog2().

src/vm/mod.rs

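The constants DEFAULT_LOG_{MIN,MAX}_ALIGNMENT are defined as LOG_BYTES_IN_{INT,LONG}, respectively. They are just defaults. I think it should be OK to replace them with std::mem::size_of::<u32>().ilog2() and std::mem::size_of::<u64>().ilog2(). Note that these are logs of byte sizes (2 and 3), so u32::BITS.ilog2() and u64::BITS.ilog2(), which give the logs of the bit widths (5 and 6), would not be equivalent.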
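The difference between the log of a bit width and the log of a byte width is easy to mix up, so here is a small sketch of both. The constants reuse names mentioned above, but their types and definitions here are assumptions for illustration, not the actual mmtk-core code.

```rust
use std::mem::size_of;

// Log of the bit width of a 32-bit freelist entry: log2(32) = 5.
pub const LOG_ENTRY_BITS: u32 = u32::BITS.ilog2();

// Logs of byte widths, as used for alignment defaults:
// log2(4) = 2 and log2(8) = 3.
pub const DEFAULT_LOG_MIN_ALIGNMENT: u32 = size_of::<u32>().ilog2();
pub const DEFAULT_LOG_MAX_ALIGNMENT: u32 = size_of::<u64>().ilog2();
```

Both `ilog2` and `size_of` are usable in constant expressions on recent Rust toolchains, so these can stay as `const` items.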

qinsoon commented 1 year ago

Related PR about plan constraints: https://github.com/mmtk/mmtk-core/issues/588

qinsoon commented 1 year ago

I think it is a mistake. MAX_INT is not a meaningful value for anything.

  • If it is intended to express "no limit" in the sense that the default space can hold objects of any size, it should have been BYTES_IN_ADDRESS.
  • If it is giving a reasonable default for plans, I don't think MAX_INT is a good default value because a 2GB object is too big for almost anything other than LOS. A reasonable value should be something like "4 * block_size".

It means no limit. The code was just ported from Java MMTk.

qinsoon commented 10 months ago

https://github.com/mmtk/mmtk-core/pull/1026 made the Java-specific constants pub(crate), so no binding is using those constants right now. We can now refactor within mmtk-core and remove the constants.