Automatic Meson consistency check

nnamua commented 2 years ago

Description

Starting in spring, I wrote my bachelor thesis. My task was to develop a classification of Meson objects and implement a checking tool that fuzzes Meson automatically to check against this classification.

Motivation for this were bugs like #3469 and #5618 where the user expects something which Meson doesn't provide. Furthermore it should clarify inconsistencies or inaccuracies in the documention. For example the allowed input types of the message() function which are implemented but not documented. My supervisor (@gerion0) has already created a meta issue (#6068) for this.

The result of this work can be found here. This tool was/is able to find the following already reported bugs/feature requests: #1550, #2657, #3175, #3206, #3469, #3667, #4280, #4892, #4893, #4891, #5401, #5706, #5935, #6073, #6476, #6567, #6675, #6702, #6960, #7585, #8039, #8295, #8353, #8373, #8433, #9108, #9175

It also reports a bunch of bugs (Meson crashes with a Python trace in this case). I have reported these bugs in the following issues (grouped by their last line in the Python trace): #9647, #9648 ,#9649, #9650, #9651, #9652, #9653, #9654, #9655, #9656, #9657, #9658, #9659, #9660, #9661, #9662, #9663

Additionally, my tool reports a lot of inconsistencies which I want to report as "feature request". For example one result is that an entity of files() does not have a full_path() function. The following fails:

project('test', 'c')

f = files('foo.c', 'bar.c')

f[0].full_path()

All those "inconsistencies" together are quite a lot. Therefore I have not created issues for that but like to ask how I should report them.

My plan is not only to report those bugs, but to bring the work upstream.

Recently, I discovered this pull request #8960 which seems to solve a very similar problem. The pull request is merged. However, both approaches are somewhat different. Therefore, I want to provide my solution as well. Maybe it can be merged with the other approach to make Meson even better.

The differences (as I experience them) are:

Different formats
- The upstream classificaiton is written in yaml. It must be parsed explicitely and is used for generating a documentation.
- My classification is written in Python. This has the advantage that Python can parse and use the classification on its own. The inheritance relation is retrieved automatically. The tooling is already present.
Different purposes
- The upstream classification should provide a structured and better documentation. As far as I know, it is not used in Meson directly.
- My classification should serve as an input for tools that can find bugs automatically on that base. It could even be used directly in Meson (if wanted) to provide better error messages and a cleaner interface. Of course, it can be used for generating a documentation, too.
Different level of information
- The upstream classification provides a lot of information about the usage of certain parameters. It also provides types and inheritance information.
- My classification does not provide usage information but focuses on all kinds of inheritance information:
  1. Return types can inherit from other super types.
  2. Argument lists can "inherit" from other methods. Some functions accept the same argument names, which were not always consistent. Currently, this seems to be a subset of the information provided by the yaml files.

So to summarize. Both approaches are similar. My approach should automatically retrieve bugs and Python does the parsing, the upstream approach should generate an advanced documentation and the parsing has to be done manually.

For the automatic checking, I need to interact with Meson. I have not found a way for Meson to report object types directly. So my approach for this is:

My tool has a nice structured classification.
The Meson interpreter has an internal (maybe not so structured) classification.
We build test cases on base of the structured classification and check if the interpreter can handle them.

Ideally, both approaches can be combined, so Meson uses the yaml files directly (Or is this already happening?).

Do you think my approach provides an additional benefit? Is it worth to combine both approaches (a bug finding tool that use the yaml files instead of the Python files should be doable)?

eli-schwartz commented 2 years ago

Your fuzzing tool which just generated 17 autogenerated bug reports, has submitted reproduction cases that consist of meson.build samples which use fully entropic variable names, which may be hard to distill into clear (human-readable) reproducers.

In at least one case (#9648) the ticket has 15 comments! And the reproducer is 500 lines long, and it looks like the comments may be too. I simply wouldn't bother to even look at that one.

Avoiding crashes sounds like a nice idea. It would be good to target it in some coordinated way rather than having a raft of issues for each possible route. Perhaps with a tracking meta-bug that links to a collection of test cases in another git repo, which could be run, reproduced, conveniently distilled, etc.... and declare success and close the meta bug once running the collection as s series of unittests, no longer produces any python tracebacks.

I wonder how many of these issues could be solved just by enforcing more consistent typed_kwargs and typed_pos_args, work that has been ongoing for a while now? It's a known problem that originally, all functions just accepted "literally anything" as values, and many still do.

eli-schwartz commented 2 years ago

Now that I come to think about it, the python tracebacks don't come with some really useful information about what happened that would help to report better issue descriptions. So, thank you for inspiring me to submit #9664.

Your 500-line reproducer now reports this traceback:

Traceback (most recent call last):
  File "/home/eschwartz/git/meson/mesonbuild/mesonmain.py", line 138, in run
    return options.run_func(options)
  File "/home/eschwartz/git/meson/mesonbuild/msetup.py", line 294, in run
    app.generate()
  File "/home/eschwartz/git/meson/mesonbuild/msetup.py", line 185, in generate
    self._generate(env)
  File "/home/eschwartz/git/meson/mesonbuild/msetup.py", line 229, in _generate
    intr.run()
  File "/home/eschwartz/git/meson/mesonbuild/interpreter/interpreter.py", line 2492, in run
    super().run()
  File "/home/eschwartz/git/meson/mesonbuild/interpreterbase/interpreterbase.py", line 148, in run
    self.evaluate_codeblock(self.ast, start=1)
  File "/home/eschwartz/git/meson/mesonbuild/interpreterbase/interpreterbase.py", line 173, in evaluate_codeblock
    raise e
  File "/home/eschwartz/git/meson/mesonbuild/interpreterbase/interpreterbase.py", line 166, in evaluate_codeblock
    self.evaluate_statement(cur)
  File "/home/eschwartz/git/meson/mesonbuild/interpreterbase/interpreterbase.py", line 179, in evaluate_statement
    return self.function_call(cur)
  File "/home/eschwartz/git/meson/mesonbuild/interpreterbase/interpreterbase.py", line 454, in function_call
    res = func(node, func_args, kwargs)
  File "/home/eschwartz/git/meson/mesonbuild/interpreterbase/decorators.py", line 639, in wrapped
    return f(*wrapped_args, **wrapped_kwargs)
  File "/home/eschwartz/git/meson/mesonbuild/interpreterbase/decorators.py", line 115, in wrapped
    return f(*wrapped_args, **wrapped_kwargs)
  File "/home/eschwartz/git/meson/mesonbuild/interpreter/interpreter.py", line 1580, in func_shared_module
    return self.build_target(node, args, kwargs, build.SharedModule)
  File "/home/eschwartz/git/meson/mesonbuild/interpreter/interpreter.py", line 2694, in build_target
    target = targetclass(name, self.subdir, self.subproject, for_machine, sources, objs, self.environment, kwargs)
  File "/home/eschwartz/git/meson/mesonbuild/build.py", line 2265, in __init__
    super().__init__(name, subdir, subproject, for_machine, sources, objects, environment, kwargs)
  File "/home/eschwartz/git/meson/mesonbuild/build.py", line 1954, in __init__
    super().__init__(name, subdir, subproject, for_machine, sources, objects, environment, kwargs)
  File "/home/eschwartz/git/meson/mesonbuild/build.py", line 723, in __init__
    self.process_kwargs(kwargs, environment)
  File "/home/eschwartz/git/meson/mesonbuild/build.py", line 2181, in process_kwargs
    self.vs_module_defs = File.from_built_file(path.subdir, path.get_filename())
AttributeError: 'CustomTargetIndex' object has no attribute 'subdir'

meson.build:111:0: ERROR: Unhandled python exception

    This is a Meson bug and should be reported!

The last two lines of text are new, and indicate that no lines past line 111 in that test case are relevant (and that any lines before that line, which aren't variables directly used in that line, are relevant either).

In addition to helping distill your fuzzer's test cases down into something much more useful, this will probably be extremely helpful to users reporting bugs in the wild -- and the reassurance that "This is a Meson bug and should be reported!" will help them feel less terribly confused by what happened, and encourage them to reach out to us. So it's a big win all around.

gerion0 commented 2 years ago

Nice work! I guess the next step is to rerun our tool to generate better bug reports?

In general, the main aspect of our tool is not to find such tracebacks. The main aspect is to find such "consistency bugs" which are a lot more than the reported ones.

eli-schwartz commented 2 years ago

Ideally, both approaches can be combined, so Meson uses the yaml files directly (Or is this already happening?).

I'm not sure it's feasible to have the yaml files be installed as part of meson and parsed by the interpreter to validate functions. If nothing else, that would add a dependency on a yaml parser (which is not in the python stdlib).

What does seem like an extraordinarily useful goal, however, is to add a unittest that parses the yaml and compares it to the python-based classification (typed_* decorators).

If you are interested in helping design a mechanism for that, that would be fantastic. We could then have the CI error out whenever the documentation disagrees with the implementation, for example I noticed in https://github.com/mesonbuild/meson/pull/9515#discussion_r758039594 that run_command(..., capture: true) was never documented when it was added.

It sounds like you already did part of the work by writing the necessary code to extract all the info from typed_* decorators.

eli-schwartz commented 2 years ago

My "better tracebacks" PR is now merged to master, BTW.

gerion0 commented 2 years ago

Ideally, both approaches can be combined, so Meson uses the yaml files directly (Or is this already happening?).

I'm not sure it's feasible to have the yaml files be installed as part of meson and parsed by the interpreter to validate functions. If nothing else, that would add a dependency on a yaml parser (which is not in the python stdlib).

Is it possible to do it the other way around? What I mean is to extend Meson (by an command line argument or with an extra script) so that it generates the yaml files or the documentation out of the KwargInfo objects. I'm not that deep in the Meson code: Do the typed_* decorators support inheritance information?

eli-schwartz commented 2 years ago

I think the bigger problem there is that the yaml docs support extended descriptions, notes, and warnings, which probably do not belong in the source code.

...

The KwargInfo objects just get stored as lists like compiler._COMMON_KWS = [...].

I'm not entirely sure the inheritance information in the yaml means much beyond "this was grouped for convenience", though.

mensinda commented 2 years ago

For the YAML docs, inheritance for functions is mostly done for convenience. However, object level inheritance tries to actually model (more or less successfully) the Meson objects used internally. Additionally, I would recommend that external tools use the generate reference_manual.json file attached to each meson release. The YAML files can also be used, but there are no guarantees on format stability (and there are multiple files compared to only one JSON).

My plan was also to use the YAML docs to at least verify the new @typed_*_args annotations in a unit test (once I get around to implementing it).

bonzini commented 2 years ago

You can use delta to reduce the testcases. It is easy to setup and used routinely by compiler developers.

nnamua commented 2 years ago

It sounds like you already did part of the work by writing the necessary code to extract all the info from typed_* decorators.

Unfortunately no, the typed_* decorators are currently not used for checking types. Many of the decorator usages were added after I had already begun with the project.

bonzini commented 2 years ago

@nnamua regarding the "delta" tool, it will work much better if you make the tool emit each keyword argument on a separate line.

nnamua commented 2 years ago

@bonzini Thanks, I'll have a look and see what I can do!

eli-schwartz commented 2 years ago

Update: I just tested a bunch of the reports, and a number of them are no longer reproducible. In several cases, this is due to the vcs_tag() function no longer crashing on a variety of non-string inputs.

I also found an easy and obvious cause of one of the remaining crashes, and submitted a PR at #10368 to fix it.

Volker-Weissmann commented 1 year ago

Starting in spring, I wrote my bachelor thesis.

Could you post your thesis? I would like to read it.

nnamua commented 1 year ago

Starting in spring, I wrote my bachelor thesis.

Could you post your thesis? I would like to read it.

I can send it to you via email, if you contact me at the public address on my profile.

mesonbuild / meson

Automatic Meson consistency check #9646

Description