mesonbuild / meson

The Meson Build System
http://mesonbuild.com
Apache License 2.0
5.57k stars 1.62k forks source link

custom_target: Ambiguous target when input and output is the same file #6070

Open gerion0 opened 5 years ago

gerion0 commented 5 years ago

Some commands (in my case runlib, other examples: sed -i, strip) process a file just changing an existing file without writing to a new output file. Some of them allow to specify writing in a new file, others do not.

In the latter case custom_target is not usable anymore because the filename/target is ambiguous. The dirty workaround is to provide a script or call bash -c that does an additional mv afterwards. However, can this be done in meson?

Possible solutions:

  1. Provide an fake_output argument and when set meson performs an additional mv.
  2. Provide the ability to perform multiple commands in a chain within the same target, so the additional runlib target can be skipped entirely.
marc-h38 commented 5 years ago

The dirty workaround is to provide a script or call bash -c that does an additional mv afterwards.

I wouldn't call the move workaround "dirty" because it's trying to do what should have happened in the first place :-)

To be bullet-proof I'm afraid you need not just one move but two... Let's say there's a failure somewhere near the in-place operation. You fix the problem and run "ninja" again. Now how do you expect any build system to know whether ranlib has already be run on its argument? It can't, so you must clean and build from scratch.

As long as this is a single step from Meson's perspective, completely hiding the ranlib argument with two moves should be bullet-proof:

cp no_index_yet.a tmp.a
ranlib tmp.a
mv tmp.a with_index.a

However, can this be done in meson?

+1

Provide the ability to perform multiple commands in a chain within the same target, so the additional runlib target can be skipped entirely.

+1

I assume you meant combining ar and ranlib (instead of runlib) in a single step. Should work as long as you don't mind having to "ninja clean" if things go horribly wrong. In fact a number of ar tools have the ranlib feature built-in which avoids this issue completely!

sed -i is not a good example: just don't use -i

In GNU binutils, objcopy does everything strip does and not in place (and more).

Modifying files in place is just the worst idea from an automated build perspective, hopefully just something from an other age.

gerion0 commented 5 years ago

The dirty workaround is to provide a script or call bash -c that does an additional mv afterwards.

I wouldn't call the move workaround "dirty" because it's trying to do what should have happened in the first place :-)

I call it dirty because you have to use another tool (bash in this case) to chain your actual actions. Maybe bash is not present in the system. The most correct way is propably to use Python since Meson runs with Python. Of course the syntax is really cumbersome then.

As long as this is a single step from Meson's perspective, completely hiding the ranlib argument with two moves should be bullet-proof:

cp no_index_yet.a tmp.a
ranlib tmp.a
mv tmp.a with_index.a

Thanks, nice catch.

I assume you meant combining ar and ranlib (instead of runlib) in a single step.

Eh, yes.

sed -i is not a good example: just don't use -i

Partly. Without -i it writes to stdout and not another file. It is only usable because Meson has an extra flag for exactly that case.

Modifying files in place is just the worst idea from an automated build perspective, hopefully just something from an other age.

I completely agree with you. Nevertheless, such tools exist.

marc-h38 commented 4 years ago

Interesting example I just root-caused:

# Without a dummy output, objcopy re-creates a brand new _input_ file
# even when it hasn't changed. On filesystems with high-precision
# timestamps like APFS this means the build never converges which is
# confusing.
objcopy  --dump-section  .some.section=real_output  input   .dummy_dump_section_output
marc-hb commented 4 years ago

I assume you meant combining ar and ranlib (instead of runlib) in a single step.

BTW thanks for this typo, it's perfect: https://github.com/mesonbuild/meson/issues?q=runlib :-)

xclaesse commented 4 years ago

I think we should allow custom_target() with no output that internally creates a dummy file. This would basically be the same as run_target() but that can be used as dependency. I've been needing that elsewhere too, would that fix the problem here?

marc-h38 commented 4 years ago

It would except when the user wants to control the output name.

It would also be confusing: some non-experts would likely assume the change is performed in place when reading the code.

marc-h38 commented 4 years ago

After thinking a bit more, I think this is the sort of user interface I would like best:

my_stripped_exes = custom_target('stripped exes',
  output : [ 'fatstripped.exe', 'fatterstripped', ]
  input : [ 'fat.exe', 'fatter', ]
  (command_is_)inplace : true, # defaults to false
  command : ['strip', ... 
   ...

It would behave like this for each input:

cp fat.exe $tmpfile
strip $tmpfile
# most filesystems guarantee atomic moves
mv $tmpfile stripped.exe 

Thumb this comment up if you like this too.

UPDATE/PS: interestingly, meson already has a "soft" dependency on strip:

Build targets in project: 42
Found ninja-1.9.0 at /usr/local/bin/ninja
WARNING: Cross file does not specify strip binary, result will not be stripped.
WARNING: Cross file does not specify strip binary, result will not be stripped.
...
xclaesse commented 4 years ago

I would call command_is_inplace to simply inplace. That would also imply that there can be only one input and output. Maybe we could have instead inplace_input and inplace_output where both lists must have same length so inplace_input[i] will be renamed to inplace_output[i]? Dunno if there is a use-case for multiple files.

marc-h38 commented 4 years ago

I would call command_is_inplace to simply inplace.

Thanks, I edited the comment.

That would also imply that there can be only one input and output. Dunno if there is a use-case for multiple files.

Good catch, thanks! Of course there is; most examples above do accept multiple files. I made another (naive?) edition.

Maybe we could have instead inplace_input and inplace_output

I kept the existing input and output names on purpose because the proposition above is to correct and hide as much as possible the "bad" inplace behavior (bad from an incremental build perspective) and make the inputs and outputs behave exactly the command was not inplace_

xclaesse commented 4 years ago

I guess there is no use-case where some inputs are inplace and some are not?

In your example above, we have to assume that input[i] maps to output[i], right? I find that a bit "fragile" but that's probably good enough.

Just thowing (crazy) idea, what about having a dictionary:

my_stripped_exes = custom_target('stripped exes',
  input : ['some_other_file'], # optional
  output : ['some_other_output'], # optional
  inplace : {'fat1.exe': 'stripped1.exe', 'fat2.exe': 'stripped2.exe'},
  command : ['strip', ... ],
)

In that proposal, inplace: would be a dictionary where keys are added to inputs and values are added to outputs, and it does the trick replacing them with tmpfile when running the command. input: and output: can still contain normal files that are not modified inplace.

marc-h38 commented 4 years ago

I guess there is no use-case where some inputs are inplace and some are not?

If there is anything (that... crazy), I don't think meson should specialise so much for it. A wrapper should be inserted and present something saner to meson = the usual layer of indirection "fix".

In your example above, we have to assume that input[i] maps to output[i], right?

Yes.

I find that a bit "fragile" but that's probably good enough.

Right, you could have a very long list and say skip a beat by mistake. You'd have to skip the same number of beats in both lists though. After you do, I think the next build steps would most likely fail in some spectacular and obvious way.

inplace : {'fat1.exe': 'stripped1.exe', 'fat2.exe': 'stripped2.exe'},

Assuming it's mutually exclusive with regular input, output, I like it too! Also glad we're "bikeshedding" the user interface, it implies we agree on the logic itself :-)

Somewhat related and fixed in 0.53: Can't use an expression as a dict key? #5231

xclaesse commented 4 years ago

I find that a bit "fragile" but that's probably good enough.

Right, you could have a very long list and say skip a beat by mistake. You'd have to skip the same number of beats in both lists though. After you do, I think the next build steps would most likely fail in some spectacular and obvious way.

I bit scary that both lists must be sync and could be hard to discover if 2 elements are just swapped in one list but not the other. But again in practice it's probably not that bad.

inplace : {'fat1.exe': 'stripped1.exe', 'fat2.exe': 'stripped2.exe'},

Assuming it's mutually exclusive with regular input, output, I like it too!

Yes, having a key with same value than one of the input (or value/output) would be an error.

Also glad we're "bikeshedding" the user interface, it implies we agree on the logic itself :-)

In the end it's still @jpakkane who needs to approve API changes, but as far as I'm concerned the use-case seems legit to me.

marc-h38 commented 4 years ago

as far as I'm concerned the use-case seems legit to me.

BTW: https://github.com/mesonbuild/meson/issues?q=is%3Aopen+sort%3Areactions-%2B1