mmtk / mmtk-core

Memory Management ToolKit
https://www.mmtk.io
Other
352 stars 67 forks source link

Fix `ci-common.sh` to parse features in a more robust way #856

Open wks opened 1 year ago

wks commented 1 year ago

Currently, ci-common.sh gets a list of features of mmtk-core by grepping the Cargo.toml file as text. This method is brittle w.r.t. formatting. For example, if I define a feature dependency with comment, like this:

vo_bit = [
    # VO bits for dead objects must have been cleared by the end of each GC.
    # Native MarkSweep only ensures this in eager sweeping mode.
    "eager_sweeping"
]

The current ci-common.sh will consider all lines not starting with # (including comment lines that have indents) as features. As a reault, it will attempt to execute the following command in the command line:

cargo build --features vm_space,ro_space,code_space,vo_bit,# VO bits for dead objects must have been cleared by the end of each GC.,# Native MarkSweep only ensures this in eager sweeping 'mode.,"eager_sweeping",],is_mmtk_object,object_pinning,immix_non_moving,immix_smaller_block,immix_zero_on_release,sanity,analysis,nogc_lock_free,nogc_no_zeroing,single_worker,extreme_assertions,nogc_multi_space,work_packet_stats,malloc_counted_size'

which results in error:

error: Found argument 'VO' which wasn't expected, or isn't valid in this context

As a workaround, we can rewrite such comments. But I think we should parse Cargo.toml as TOML (or let cargo metadata to output JSON instead).

Using the command line tool jq, the following line

cargo metadata --format-version 1 | jq '.packages[] | select(.name? == "mmtk") | .features | keys | .[]' -r

prints a list of features:

analysis
code_space
default
eager_sweeping
extreme_assertions
immix_non_moving
immix_smaller_block
immix_zero_on_release
is_mmtk_object
jemalloc-sys
malloc_counted_size
malloc_jemalloc
malloc_mark_sweep
malloc_mimalloc
malloc_native_mimalloc
mimalloc-sys
nogc_lock_free
nogc_multi_space
nogc_no_zeroing
object_pinning
perf_counter
pfm
ro_space
sanity
single_worker
vm_space
vo_bit
work_packet_stats

And if we add the following section to Cargo.toml:

# Cargo doesn't care what we put into package.metadata
[package.metadata.mutex_groups]
malloc = ["malloc_mimalloc", "malloc_jemalloc", "malloc_native_mimalloc"]
marksweepallocation = ["eager_sweeping", "malloc_mark_sweep"]

Then the following command

cargo metadata --format-version 1 | jq '.packages[] | select(.name? == "mmtk") | .metadata.mutex_groups'

can print out the mutural exclusive groups in a JSON format:

{
  "malloc": [
    "malloc_mimalloc",
    "malloc_jemalloc",
    "malloc_native_mimalloc"
  ],
  "marksweepallocation": [
    "eager_sweeping",
    "malloc_mark_sweep"
  ]
}

and the following can parse it into a CSV:

$ cargo metadata --format-version 1 | jq '.packages[] | select(.name? == "mmtk") | .metadata.mutex_groups | .[] | @csv' -r
"malloc_mimalloc","malloc_jemalloc","malloc_native_mimalloc"
"eager_sweeping","malloc_mark_sweep"

Of course we can always use a Python script to parse JSON and generate all combinations of possible feature sets.

wks commented 11 months ago

The PR https://github.com/mmtk/mmtk-core/pull/916 introduced a Python script to edit Cargo.toml to override the mmtk dependency, and it works.

The Python script uses the tomlkit module for modifying TOML while preserving its formatting and comments (although it is unnecessary because the testing is one-shot and automated). tomlkit is in the apt-get repo of Ubuntu 22.04 (python3-tomlkit) so we can install it as needed in the CI scripts.

If we only need to read Cargo.toml, we can use the built-in tomllib module introduced since Python 3.11 because the built-in tomllib module is read-only -- it cannot write TOML.

An alternative to writing a custom Python script is using command-line utilities to query TOML contents. jq does not support TOML natively. A recent version of fq supports TOML, but the version of fq in the apt-get repo of Ubuntu 22.04 is too old. The yq utility is a wrapper of jq and it supports TOML. It is not part of the apt-get repo of Ubuntu 22.04, but it can be installed via pip.