google / emboss

Emboss is a tool for generating code that reads and writes binary data structures.
Apache License 2.0

Keep build_info.json up to date and in sync with bazel build #172

Open studgeek opened 1 month ago

studgeek commented 1 month ago

https://github.com/google/emboss/pull/171 introduces a build_info.json file that lists build info for downstream, non-bazel projects.

We should implement a way to ensure build_info.json stays in sync with emboss repo changes.

A few possible approaches to consider (there are probably others):

1) Have the bazel build also use build_info.json, so there is one source of truth.
2) Have the bazel build/tests verify that build_info.json matches what is in bazel, so the bazel build verifies the info is correct.
3) Have the bazel build generate build_info.json. This can be tricky since it requires the build to change the source files.
4) Have the bazel build generate the expected build_info.json and test that it matches the checked-in file (a mix of 2 and 3).

jasongraffius commented 1 month ago

I think this could be tricky because of Bazel's hermetic guarantees, for instance:

So I don't think this can be done solely as a part of a Bazel build.

However, there may be some invocation of bazel query alongside a script to compare/verify similarity. I would worry that such a script could be brittle, though, if it has to make assumptions about the structure of the repo to make the comparison.
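A rough sketch of what such a script might look like. It assumes build_info.json can be reduced to a flat set of repo-relative file paths (its real schema is not shown in this thread), and the helper names are made up for illustration. The `kind("source file", deps(...))` expression is standard `bazel query` syntax, but how labels are printed can vary between Bazel versions, which is one place the brittleness concern shows up.

```python
import subprocess


def label_to_path(label):
    """Convert a main-repo label like //pkg/sub:file.py to pkg/sub/file.py."""
    package, _, name = label[2:].partition(":")
    return f"{package}/{name}" if package else name


def query_source_files(target_pattern):
    """Ask Bazel for every source file the given targets depend on.

    Must be run inside a Bazel workspace. target_pattern is whatever the
    caller wants checked, e.g. a hypothetical "//compiler/...".
    """
    out = subprocess.run(
        ["bazel", "query", f'kind("source file", deps({target_pattern}))'],
        check=True, capture_output=True, text=True,
    ).stdout
    # Assumption: main-repo labels start with "//"; anything else
    # (external repositories) is skipped.
    return {label_to_path(l) for l in out.split() if l.startswith("//")}


def compare(listed, queried):
    """Return (missing_from_json, stale_in_json) as sorted lists."""
    return sorted(queried - listed), sorted(listed - queried)
```

The script would exit non-zero if either list returned by `compare` is non-empty.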

jasongraffius commented 1 month ago

Actually, it may be possible to use Bazel aspects for this. I'll have to look into it and experiment.

studgeek commented 1 month ago

My thought was that some variant of approach 4 should work even with the restrictions you listed. With a lot of hand waving, there are essentially two steps.

The first is to generate the build info into an output file. Bazel aspects sound like they could be great for this. Or perhaps a Python action that takes the source files and generates the output file. For the latter, the source files could be in a variable so the same list is used for normal usage and for generating the output file.

The second is a Python or C++ test that takes build_info.json and the generated output file as deps and compares their contents.
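The comparison step could be as small as the sketch below. It assumes both files are plain JSON and that key order and whitespace should not count as differences; the function names are hypothetical, not anything that exists in the emboss repo.

```python
import difflib
import json


def canonical(data):
    """Serialize parsed JSON with sorted keys so formatting is ignored."""
    return json.dumps(data, indent=2, sort_keys=True).splitlines()


def build_info_diff(checked_in_text, generated_text):
    """Return unified-diff lines; an empty list means the files agree."""
    checked_in = canonical(json.loads(checked_in_text))
    generated = canonical(json.loads(generated_text))
    return list(difflib.unified_diff(
        checked_in, generated,
        fromfile="build_info.json", tofile="generated", lineterm=""))
```

A test would read the two files, call `build_info_diff`, and fail with the diff as its message when the result is non-empty, so the failure output shows exactly what to update.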

jasongraffius commented 1 month ago

Yep yep, generating the build graph was the part I was hung up on, but it looks like it is likely doable with aspects.

reventlov commented 1 month ago

We do something similar to option 4 for grammar.md: generate_grammar_md.py can be used to generate grammar.md, and docs_are_up_to_date_test.py checks that the generated file matches the checked-in file. I'm half planning to do something similar for the LR(1) parser tables, which take a noticeable amount of time to build at compiler startup.

I think that option 1 is out for the reasons Jason listed, option 2 is possible, but requires manual work, and option 3 is not directly possible because Bazel won't generate files into the source tree.

There is an option 5: have something parse the BUILD files outside of Bazel, and option 6: have a new top-level source of truth, from which the BUILD files are generated (with a test to ensure that the checked-in BUILD files match the new files).

EricRahm commented 2 weeks ago

6) is probably more in line with what boringssl does, and I'm kind of leaning towards that. Essentially it takes a top-level build.json file, runs it through a script that cleans it up, and outputs some files to a gen/ directory, including sources.{json,bzl,cmake,gni}. The script is run outside of its Bazel build. The BUILD.bazel file then has access to the generated source list via load(":gen/sources.bzl",...)
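To make the shape of that concrete, here is a minimal sketch of such a generator. The input schema (a mapping from group name to a list of paths) and the `*_SRCS` variable naming are assumptions for illustration. This is not boringssl's actual script or file format.

```python
import json


def render_sources_json(build):
    """Render the cleaned-up source lists as JSON (sorted, de-duplicated)."""
    cleaned = {name: sorted(set(files)) for name, files in sorted(build.items())}
    return json.dumps(cleaned, indent=2) + "\n"


def render_sources_bzl(build):
    """Render the same lists as Starlark variables for load() in a BUILD file."""
    lines = ["# Generated file; do not edit by hand.", ""]
    for name, files in sorted(build.items()):
        lines.append(f"{name.upper()}_SRCS = [")
        lines.extend(f'    "{path}",' for path in sorted(set(files)))
        lines.append("]")
        lines.append("")
    return "\n".join(lines)
```

A wrapper would read the top-level JSON file, write both renderings under gen/, and a test could re-run the renderers and compare against the checked-in gen/ files, in the same spirit as the grammar.md check mentioned above.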