Design for generator improvements

jpakkane commented 4 years ago

This is basically a fleshed out #3342. Generators seem to not fulfill requirements that people have for them, so let's see if we can make them better. One possible way of making them better is that if you do this:

g = generator(...)
l = g.process(..., unique_id: 'somename')

It would output the files under ${builddir}/meson-gen/somename (and subprojects under the respective subproject dir) instead of the target's private directory. Then when you use the output in multiple targets, they all use the same generated files.

Why like this? Mainly for backwards compatibility. People have projects that use the target private dir for header lookups and suddenly generating the headers somewhere else would break things. The user must specify a unique id for each such generator (using the same name multiple times would be a hard error). The reason for this is that I could not come up with a way to generate a reliable set of state to hash to create a unique id automatically.

This would make it easier to pass outputs of one generator to another, since the paths are always known and static whereas currently we would have to delay specifying the output dir until the result is used in a target.

The UI is a bit crap and unintuitive, so obviously it would need polishing.
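
As a rough illustration of the directory rule described above, here is a minimal Python sketch. It is not Meson code and not how Meson implements anything. The class name `GenOutputRegistry`, the method `output_dir`, and the subproject path `subprojects/foo` are hypothetical. The exact subproject layout and the scope of uniqueness are assumptions; only the `meson-gen/<unique_id>` location and the hard error on reuse come from the proposal.

```python
from pathlib import PurePosixPath


class GenOutputRegistry:
    """Hypothetical model of the proposed unique_id -> output dir mapping."""

    def __init__(self, builddir):
        self.builddir = PurePosixPath(builddir)
        self._seen = set()

    def output_dir(self, unique_id, subproject_dir=""):
        # Assumption: an id only has to be unique within one (sub)project.
        key = (subproject_dir, unique_id)
        if key in self._seen:
            # The proposal makes reusing an id a hard error.
            raise ValueError(f"unique_id {unique_id!r} used more than once")
        self._seen.add(key)
        # Assumption: a subproject gets its own meson-gen dir under its build subdir.
        return self.builddir / subproject_dir / "meson-gen" / unique_id


registry = GenOutputRegistry("build")
shared_dir = registry.output_dir("somename")
sub_dir = registry.output_dir("somename", subproject_dir="subprojects/foo")
```

Because the directory depends only on the id and not on the consuming target, every target that uses the result would see the same path.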

germandiagogomez commented 4 years ago

FWIW I myself started using custom_target in what should be a generator in my opinion.

It looks weird to me that a generator generates the same files again and again when different targets use the output. I ended up using custom_target but what I want is a compiler for idl files (capnproto in my case).
Since this is related to improving code generation, I also hit a related issue when generating file output (but this is in a custom_target, and I am not sure whether it affects generators). The issue is here: https://github.com/mesonbuild/meson/issues/6385

bonzini commented 4 years ago

In QEMU I think we have one use case for reusing the unique name. We generate files from many directories, and we would like to have them available from source files in the easiest way. With Makefiles we basically use -I$(@D), so that we can have:

chardev/trace.h generated from chardev/meson.build, included as `#include "trace.h"` within chardev/
hw/trace.h from hw/meson.build, included as `#include "trace.h"` within hw/
and so on (there are about 60 of these)

It would be nice if you could have this as something like

trace_h = generator(...)
include_dir = trace_h.private_dir_include(for_source_dir: true)

and trace_h.process('trace.h', unique_name: 'trace_h') in each directory.

Another possibility is, instead of having the unique_name, to have something like

trace_h = generator(..., name: 'trace_h')

and this would cause all instances of g.process to put the files in trace_h@gen/. But perhaps this is a different use case than what you are thinking about, because then having different unique_names would require different generators too.

bonzini commented 4 years ago

Going back to your original proposal, I wonder if all that is needed is a "custom_target template" (suggesting that it could be created with custom_target_template). This would accept the same arguments as custom_target except input and the initial named argument; in addition, output would also support substitution of @BASENAME@, @PLAINNAME@, and possibly @PRIVATE_DIR@. It would then return a factory object similar to generators, so that later on you could do:

tgt = template_obj.process(name, input: inputs)

with exactly the same effect as invoking tgt = custom_target(...).
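
To show what such a template object would roughly do, here is a small Python sketch. It is only a model, not Meson's implementation. `CustomTargetTemplate`, `substitute`, the `tool` command, and the input `idl/message.capnp` are hypothetical names chosen for illustration. The substitution follows the usual meaning of the placeholders: @PLAINNAME@ is the input file name without its directory, and @BASENAME@ is that name with the last extension removed.

```python
import posixpath


def substitute(template, input_path):
    """Expand @PLAINNAME@ and @BASENAME@ for one input file."""
    plainname = posixpath.basename(input_path)   # file name without directory
    basename = posixpath.splitext(plainname)[0]  # file name without last extension
    return template.replace("@PLAINNAME@", plainname).replace("@BASENAME@", basename)


class CustomTargetTemplate:
    """Hypothetical stand-in for the object custom_target_template() would return."""

    def __init__(self, output, command):
        self.output = output
        self.command = list(command)

    def process(self, name, input_path):
        # Collect the arguments an equivalent custom_target() call would receive.
        # @INPUT@ and @OUTPUT@ stay in the command for the build system to expand.
        return {
            "name": name,
            "input": input_path,
            "output": substitute(self.output, input_path),
            "command": list(self.command),
        }


template_obj = CustomTargetTemplate(output="@BASENAME@.h",
                                    command=["tool", "@INPUT@", "@OUTPUT@"])
tgt = template_obj.process("message_h", "idl/message.capnp")
```

The point of the design is that the template holds everything except the name and the input, so each `process` call yields an ordinary, individually named target.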

germandiagogomez commented 4 years ago

@bonzini just my two cents here and I want to say first that I am not sure I fully understand your use case, but more on a philosophical way of looking at things:

Shouldn't generators be "compilers" to generate a lot of files?
Shouldn't custom_target just be a one-off rule to generate something?

@jpakkane I think we should think of the purpose of each of these functions and why they exist. My best guess of what they should be (not sure if they are intended to be like this):

Generators: generate many files, such as when you compile capnproto or protocol buffers sources to use its output in other targets. Currently I am using custom_target because generators have a private directory per target. Since I have capnproto sources compiled and reused in two targets, this would rebuild the idl files with a generator. With a custom_target it does not.
custom_target: a specific target to be built that is not supposed to be repeatable.

Namely, I tend to see a generator more like getting a bunch of files and processing them and a custom_target as a single rule. I also think (but maybe I am missing some use cases) that when you do this:

foreach something : some_sequence
    custom_target(....)
endforeach

What you probably wanted is a generator and not a bunch of custom_targets. Right now this is the part of the API that looks most confusing to me. I ended up using custom_target, but only because of the limitations generator has, not because I wanted to use it in the first place. Are they actually the same thing? Are they fundamentally different?

bonzini commented 4 years ago

I had two different proposals:

My first comment was about having something that behaves like a single target when it is used for e.g. private_dir_include, but as you say produces many files. Because it behaves like a single target, you avoid a proliferation of -I flags. However, I would like to have an option of generating those many files by invoking the same program many times from all over the source tree. This would let you work around the "no subdirectory in output file names" limitation.
My second comment was essentially asking what generators currently provide over being just a template for custom targets. However, generator() has backwards-compatibility requirements, hence my proposal of a custom_target_template() that is inspired by generator() but simply produces custom targets.

bjfiedler commented 4 years ago

I think I would profit from a custom_target_generator. Additionally, I'd love to see two things in this new kind of generator:

substitution for the depfile and output parameters that inserts the custom_target's name
evaluation of the command line when the .process() method is invoked instead of at the time of the generator definition, or some kind of "currying"

My meson files contain a lot of sections with a specific pattern. I have compressed it here to a MWE. It basically should process some source files with a custom program (in this case cp) generating some intermediate files and then process the intermediate files again with a custom program (in this case sed) to generate the end result (v_targets). In my real use case cp is a clang -emit-llvm and the sed a transpiler (configurable to different variants) that works on the IR code. After that another set of working steps comes: Compile and link the IR code to executables in different architectures and run statistics.

In the MWE I first write the solution with custom_targets (which works, but results in a lot of copy-pasted code) and after that the solution with generators that would be desirable but not possible at the moment.

project('testproj', 'c',
  version : '0.1',
  default_options : ['warning_level=3'])

#### global configurations (done at the main meson file)
## just some dummy data for MWE
flags_x = ['--expression=s/^/x/']
flags_y = ['--expression=s/a/a__y/']
flags_z = ['--expression=s/b/b_z_z_b/']

cp = find_program('cp')
my_generator_cmd = [cp, '@INPUT@', '@OUTPUT@']
my_generatorflags = []

sed = find_program('sed')
other_generator = [sed, '@INPUT@']

## define some generator as generic recipes to invoke cp and sed
## Later, they are specialized with extra flags.

my_generator_gen = generator(cp,
                             arguments:['@INPUT@', '@OUTPUT@']
                                       + my_generatorflags,
                             depfile: '@PLAINNAME@.dep',
                             output: '@PLAINNAME@',
                            )
other_generator_gen = generator(sed,
                                arguments: ['@EXTRA_ARGS@', '@INPUT@'],
                                # depfile and output do not conflict
                                # here because a generator rebuilds
                                # on each use
                                depfile: '@PLAINNAME@.dep',
                                output: '@PLAINNAME@',
                                capture: true)

#### config dependent settings. Usually this happens inside a subdir() with
#### conditional evaluation based on get_option
## Ideally, I need two subdir runs: One to collect information (set the right flags) and one to define the generators (based on the configuration).
my_generatorflags += ['--preserve']

#### here the actual processing starts

src_files = ['a', 'b', 'c']
variants = ['x', 'y', 'z']

targets = []
g_targets = []
v_targets = []
g_v_targets = []
foreach src: src_files
  name = src + '.gen'

  # first, use a custom_target
  t = custom_target(name,
                    input: src,
                    output: name, # something like @CUSTOM_TARGET_NAME@ would be nice here
                    depfile: name + '.dep', # here, too
                    command: my_generator_cmd + my_generatorflags) # here my_generatorflags resolves to '--preserve'
  targets += t

  # now, achieve the same thing with a generator. However, my_generatorflags does _not_ resolve to '--preserve' here.
  g_t = my_generator_gen.process(src)
  g_targets += g_t

  foreach variant: variants
    v_name = name + '.' + variant

    # again, use a custom_target
    v = custom_target(v_name,
                      input: t, # here, a custom_target output is used as (another) custom_target input
                      output: v_name,
                      depfile: v_name + '.dep',
                      capture:true,
                      command: other_generator + get_variable('flags_'+variant))
    v_targets += v

    # Now, again try to use generators instead of custom_targets
    ## not possible since a generator does not accept GeneratedListHolder as input
    # g_v_targets = other_generator_gen.process(g_t, extra_args: get_variable('flags_' + variant))

    ## not possible since generator doesn't accept CustomTargetHolder as input
    ## as reported in #3667
    # g_v_targets = other_generator_gen.process(t, extra_args: get_variable('flags_' + variant))

  endforeach
endforeach

all_variants = custom_target('some evaluation',
                             output: 'all_combined',
                             command: ['cat', '@INPUT@'],
                             # capture: true,
                             build_always_stale: true,
                             input: v_targets+g_v_targets)

Problems I'd like to show:

I have to repeat name and v_name for the custom_target's name, output and depfile.
- Since I have multiple such (looped custom_target) sections, a generator would reduce the amount of copy-pasted code.
I can't use generators since they don't accept the previous targets as input (see comments).
The result generated by the g_t target misses the update of my_generatorflags.
- The reason I can't just use extra_args is that I have not one such flags variable but four, which are incrementally updated in various subdirectories. Using extra_args would result in an expression like:
```
my_generator_gen.process(src, extra_args: flags1 + flags2 + flags3 + flags4)
```
  which I then have to copy-paste into a bunch of different invocations. Ideally, generators would support some kind of "currying". Something like:
```
my_generator_gen = my_generator_gen.update(extra_args: some_currently_received_flags)
```
  where the @EXTRA_ARGS@ variable is then replaced with some_currently_received_flags + @EXTRA_ARGS@.
- A real-world example of such an update is adding an include path for a library that is conditionally activated in a subdir depending on get_option or something similar.
- It would be nice to specify all variables in the top-level meson file. However, I can't add all the variables there without making a complete mess of it.
- Another workaround might be to re-specify the generator in each meson file, but that means copy-pasted code, too.
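
The "currying" idea can be modelled outside Meson with a few lines of Python. This is only a sketch under assumptions: `CurriedGenerator` and its `update` method are hypothetical and no such method exists on Meson generators today. The ordering (already bound flags first, then flags passed per call) follows the "some_currently_received_flags + @EXTRA_ARGS@" description above.

```python
class CurriedGenerator:
    """Hypothetical model of a generator whose extra args can be pre-bound."""

    def __init__(self, arguments, bound_extra_args=()):
        self.arguments = tuple(arguments)
        self.bound_extra_args = tuple(bound_extra_args)

    def update(self, extra_args):
        # Return a new object; the generator it was derived from is unchanged.
        return CurriedGenerator(self.arguments,
                                self.bound_extra_args + tuple(extra_args))

    def process(self, src, extra_args=()):
        # Bound flags come first, followed by the flags given for this call.
        extra = self.bound_extra_args + tuple(extra_args)
        cmd = []
        for arg in self.arguments:
            if arg == "@EXTRA_ARGS@":
                cmd.extend(extra)
            else:
                cmd.append(arg.replace("@INPUT@", src))
        return cmd


gen = CurriedGenerator(["@EXTRA_ARGS@", "@INPUT@"])
gen_x = gen.update(["--expression=s/^/x/"])
cmd = gen_x.process("a.gen", extra_args=["--expression=s/a/a__y/"])
plain_cmd = gen.process("a.gen")
```

Each subdirectory could then derive its own generator with `update` and pass it on, instead of repeating the full `extra_args` expression at every call site.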

ailin-nemui commented 3 years ago

It would be very nice if this improved generator could also record dependencies that are implicit or passed in some form in the EXTRA_ARGS, without having to write a Python wrapper script that outputs a depfile. In the code below, meson/ninja is unaware that changing either typemap or any of the explicitly referenced typemap files requires re-running the generator.

https://github.com/irssi/irssi/blob/bf41bfa2f7cb52a44f420441696071eedf233160/src/perl/textui/meson.build#L2-L15

deepbluev7 commented 1 year ago

I have now fallen into the trap of using a generator only to notice that it doesn't allow me to install the results. Example number 1:

assemble_bootrom = generator(rgbasm,
  output  : '@BASENAME@.o',
  arguments : ['-o', '@OUTPUT@', '@INPUT@'])

link_bootrom = generator(rgblink,
  output  : '@BASENAME@.bootrom',
  arguments : ['-o', '@OUTPUT@', '@INPUT@'])

truncate = generator(dd,
  output  : '@BASENAME@.bin',
  arguments : ['count=1', 'of=@OUTPUT@', 'if=@INPUT@', 'bs=@EXTRA_ARGS@'])

gbc_rom_sources = [
  'BootROMs/agb_boot.asm',
  'BootROMs/cgb0_boot.asm',
  'BootROMs/cgb_boot.asm',
  'BootROMs/cgb_boot_fast.asm',
  'BootROMs/mgb_boot.asm',
  ]
gb_rom_sources = [
  'BootROMs/dmg_boot.asm',
  'BootROMs/sgb2_boot.asm',
  'BootROMs/sgb_boot.asm',
  ]

gb_roms = truncate.process(link_bootrom.process(assemble_bootrom.process(gb_rom_sources)), extra_args: '256')
gbc_roms = truncate.process(link_bootrom.process(assemble_bootrom.process(gbc_rom_sources)), extra_args: '2304')

build_target('roms', gb_roms, gbc_roms, build_by_default: true, install....) # does not work

Now that is solvable using custom targets, but I basically need to copy and paste the loop 3 times (one is for the boot logo, which needs assembling later too).

Another example was just generating manpages in multiple directories as well as a summary manpage.

Specifically I always default to trying to define how the specific processing steps work separately from the input files. Generators look to be the much better fit for that than doing multiple for loops with slightly different parameters. However you can't do anything with a generated_list. You can't install it, you can only use it as sources for a normal C compiler. You can't even pass them to a custom target just to copy them. Basically I just want to be able to define my weird compilers, that are usually built as part of the project or from a subproject already, and then use that to define targets.
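
The "define the steps separately from the inputs" style can be pictured with a short Python model of the output names alone. This is an illustration, not Meson code: the helper `stage` is hypothetical, it only mimics how an `@BASENAME@` output pattern would rename files through the three-step chain from the example, and it runs no tools.

```python
import posixpath


def stage(output_template):
    """Build a step that maps input paths to output names via @BASENAME@."""
    def run(inputs):
        outputs = []
        for path in inputs:
            # @BASENAME@ is the file name without directory and last extension.
            basename = posixpath.splitext(posixpath.basename(path))[0]
            outputs.append(output_template.replace("@BASENAME@", basename))
        return outputs
    return run


# The three steps are defined once, independent of any input list.
assemble = stage("@BASENAME@.o")
link = stage("@BASENAME@.bootrom")
truncate = stage("@BASENAME@.bin")

gb_rom_sources = [
    "BootROMs/dmg_boot.asm",
    "BootROMs/sgb2_boot.asm",
    "BootROMs/sgb_boot.asm",
]
gb_roms = truncate(link(assemble(gb_rom_sources)))
```

The final names are fully determined by the step definitions and the source list, which is why being able to install or otherwise reuse the resulting list would be enough for this use case.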

I ran into this 3 times in the last year alone, I really want ANY solution to this. Bonus points if I can define my own extra arguments, like the final binary size in the above example.

robtaylor commented 11 months ago

Yep, hitting a very similar situation to @deepbluev7 when generating pdfs from rst (chaining rst2latex, pdflatex). @jpakkane did you have any more thoughts around this?

robtaylor commented 11 months ago

To make it a bit more complicated, I then have an HTML page generation step that needs those PDFs as input, and it's frustrating that custom_target can't take array[InternalDependency].

mesonbuild / meson

Design for generator improvements #6526