google / emboss

Emboss is a tool for generating code that reads and writes binary data structures.
Apache License 2.0

`embossc` unnecessarily spawns 2 subprocesses #122

Closed EricRahm closed 3 months ago

EricRahm commented 3 months ago

As part of a series of end-to-end performance investigations (see #118, #119), I found that the `embossc` script spawns one subprocess for the front-end step and a second one for the back-end step. This adds the small overhead of launching the subprocesses, but more importantly it requires serializing the IR to an intermediate JSON representation, piping it between processes, and deserializing it again.

For a larger `.emb` file, a PoC showed a 16% reduction in overhead when the subprocesses were removed. I'm proposing going from:

  front_end_status = subprocess.run(front_end_args,
                                    stdout=subprocess.PIPE,
                                    env=subprocess_environment)

  if front_end_status.returncode != 0:
    return front_end_status.returncode

  back_end_status = subprocess.run(
      [
          sys.executable,
          os.path.join(base_path, "compiler", "back_end", "cpp",
                       "emboss_codegen_cpp.py"),
      ],
      input=front_end_status.stdout,
      stdout=subprocess.PIPE,
      env=subprocess_environment,
  )

to something like:

  ir, _, errors = glue.parse_emboss_file(
      flags.input_file[0],
      emboss_front_end._find_in_dirs_and_read(flags.import_dirs))
  if not errors:
    header, errors = header_generator.generate_header(ir)