sosy-lab / benchexec

BenchExec: A Framework for Reliable Benchmarking and Resource Measurement
Apache License 2.0

User-Defined Extra Information about a Benchmark Run #524

Open dbeyer opened 4 years ago

dbeyer commented 4 years ago

It would be a great feature to be able to include some information in the benchmarking results that BenchExec itself is not able to collect.

For example (motivation comes from competition execution):

I envision a command-line option for benchexec that collects a string from the user and puts it into a new tag in the results XML.

Then, table-generator, in cases where this tag is filled, prints a row (e.g., below "Date of execution") in the "Benchmark Setup" table.

This would be extremely useful to support easy replicability.
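
A minimal sketch of the first half of this idea, using Python's standard `xml.etree.ElementTree`. The tag name `description`, the helper `add_user_info`, the simplified `result` root element, and the sample text are all assumptions made for illustration; none of them is taken from BenchExec itself.

```python
import xml.etree.ElementTree as ET


def add_user_info(result_root, text, tag="description"):
    """Attach user-provided text to a results tree as a child element.

    The tag name is a placeholder chosen for this sketch only.
    """
    elem = ET.SubElement(result_root, tag)
    elem.text = text
    return elem


# Simplified stand-in for a results document; a real file carries many
# more attributes and child elements.
root = ET.Element("result", {"name": "example-run"})
add_user_info(root, "Benchmark set: made-up revision for illustration")

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

The string would come from the command line in the proposal; here it is hard-coded to keep the sketch self-contained.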

dbeyer commented 4 years ago

Related to issue #418; makes that issue less important.

PhilippWendler commented 4 years ago

BenchExec already has lots of ways for users to specify names and influence what is shown in tables:

  1. The name of the benchmark-definition file (will influence name of result files, and be shown in row "Run set" if run definition has no name).
  2. The value specified with benchexec --name (will be appended to 1. and appears in the same places).
  3. The value specified in the displayName attribute of the benchmark definition (will be shown in an extra row "Benchmark").
  4. The value specified in the name attribute of the rundefinition tag (will influence name of result files and be shown in row "Run set").
  5. The value specified in the name attribute of the tasks tag (will influence name of result files and be shown in row "Run set" if the table is created only for one tasks tag).
  6. The name of the table-definition file (will influence name of table files and be used as the title of the HTML tables).
  7. The value specified with table-generator --name (will override 6.).
  8. The name of the title attribute of the union tag (will be shown in row "Run set" overriding other values there).

So, what is suggested (a way for users to add a completely new row with user-defined content to the "Benchmark Setup" table) already exists with item 3.

dbeyer commented 4 years ago

The proposal is not about adding names, but information of a different kind.

Item (3) is meant to show the name of the column (well, its name already suggests a semantics of how it should be used). All other items are also definitely not useful for the kind of information that is described in the proposal; those other items are mainly about ids, file names, and names for something.

dbeyer commented 4 years ago

Please note that the proposal has two components:

  1. Include extra information in the benchmark results XML file.
  2. Show this information appropriately in the results tables produced by table-generator.

PhilippWendler commented 4 years ago

Ok, so if all of the above items (and in particular, item 3) are too specific for the intended use case, it seems we should find a general solution that covers all kinds of information that users might want to add. After all, we don't want to add one field for one use case now, another one for a different use case next year, etc. (https://github.com/sosy-lab/benchexec/pull/526#issuecomment-557783553 already mentions that we cannot predict what other users might want to add).

Name

For the name of the new field, a first idea would be description, which should be quite general and cover all kinds of use cases and fit well together with name, which we already use.

The issue title uses "user-defined extra information"; maybe the name of the field should be inspired by that? But info(rmation) seems too general, because everything in the results XML file is information. Combinations like userinfo also do not seem to offer any advantage compared to description.

provenance as suggested in #526 seems too specific for one use case.

https://github.com/sosy-lab/benchexec/pull/526#issue-344793331 uses "lab log-book entry" as an example, but I also do not see any good names related to that.

Or do we need an even more general solution where users can add several fields with different names, as arbitrary key-value pairs? I would hope that this complexity is not necessary, because what could happen then is that users define fields with names that are also used by BenchExec itself, and this could be confusing. Users should be able to put everything into one field if we make it general enough, right?

Format

What would BenchExec expect as format of the new information? Plain text? Would it need to preserve line breaks and other whitespace verbatim? If yes, it would mean we need to check how to define the XML tag properly such that the parser does not strip this kind of information (and how to prevent our XML pretty printer from reformatting it). But I guess that if the use case is something like https://github.com/sosy-lab/benchexec/pull/526#issuecomment-557783553 it would not look good in the table if we do not preserve at least line breaks?

Of course we could also consider richer formats like Markdown, but this would complicate the handling in table-generator. And I guess it would be easy to add support for this in the future in a backwards-compatible way by adding a format attribute to the XML tag with the description.
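
To make the whitespace question concrete, here is a small sketch that writes multi-line text into an element and reads it back with Python's `xml.etree.ElementTree`. The element name `description` and the `format` attribute are hypothetical, mirroring the idea above. The sketch only covers a plain serialize/parse round trip; a pretty printer that re-indents the document would need to be checked separately.

```python
import xml.etree.ElementTree as ET

# Text with a line break, leading indentation, and an empty line.
text = "line one\n  indented line two\n\nline four after a blank line"

root = ET.Element("result")
# Hypothetical tag and attribute names, used only for this illustration.
desc = ET.SubElement(root, "description", {"format": "plain"})
desc.text = text

serialized = ET.tostring(root, encoding="unicode")
parsed = ET.fromstring(serialized)

# Compare what comes back from the parser with what was stored.
round_tripped = parsed.find("description").text
print(round_tripped == text)
```

A `format` attribute like the one shown would let a later version distinguish plain text from a richer format without changing how existing files are read.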

Additional Data

In https://github.com/sosy-lab/benchexec/pull/526#issuecomment-557783553 it was suggested that benchexec could add its own version. However, I would be skeptical of combining raw text from the user (with arbitrary formatting, content, and even language) with any fixed string generated by us.

So if we really want to do this, this might mean we need to use some structured format, but we already have the result XML as a structured format, and it seems weird to have another structured document nested inside an XML document. Furthermore, information like BenchExec's version is already present in the XML, so it would actually be redundant.

As an alternative, we could consider replacing variables like ${benchexec_version} in the text. This would probably not have any disadvantages except for the implementation effort, but I wonder whether it is worth it, given that all information that would be accessible via variables is (or should be) present as attributes in the XML anyway.
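
For a sense of the implementation effort, a sketch of such a replacement using `string.Template` from Python's standard library, which already understands the `${name}` syntax. The helper `expand_description`, the variable set, and the placeholder version string `X.Y` are assumptions for illustration only.

```python
from string import Template


def expand_description(text, variables):
    """Replace ${name} placeholders in user text.

    safe_substitute leaves unknown placeholders untouched instead of
    raising, so a typo in user text cannot abort a benchmark run.
    """
    return Template(text).safe_substitute(variables)


# Placeholder value; a real implementation would supply the actual version.
variables = {"benchexec_version": "X.Y"}

raw = "Executed with BenchExec ${benchexec_version} on ${unknown_name}."
expanded = expand_description(raw, variables)
print(expanded)
```

Leaving unknown placeholders in place is one possible design choice; raising an error instead (via `Template.substitute`) would catch typos earlier at the cost of failing the run.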

Source

How should benchexec get the information from the user? For multi-line text (assuming this is the result of the format question) it is possible to pass it as command-line parameter (with appropriate quoting), but this is rather uncommon and maybe impractical.

Should benchexec open an editor window and let the user enter something, the way that git commit does? Might be nice in some cases, but would not work for use cases where benchexec is used noninteractively, so we need another solution in any case (and this seems a little bit complex to properly implement, so we should do it in the future, if at all).

Should benchexec read the information from a file? This might be the easiest choice for now, but I fear that this means that users in practice would often forget to update that file, and thus wrong information is attached to the benchmark results, which would be worse than no information. For example, if one does git pull in the repo with the benchmarks, who would remember to update the text file with the benchmark description that happens to lie around somewhere? BenchExec maybe shouldn't encourage error-prone workflows like this.

If we do use a file as input, should the name of the file be given on the command line or somewhere in the benchmark definition? Or should even the whole text be part of the benchmark definition and copied from there? I guess this could be convenient for some use cases, but would not work well for things like the version of the benchmark set.

Any other suggestions for how to get the content?

This also raises another question: If we use for example a text file as input, what would actually be the concrete advantage of inserting that file into the results XML file? The latter file is already not fully self-contained: the tool logs and output files are stored in separate files, so for redistributing benchmark results one always needs to distribute several files (2 at least, usually more). If the user-provided description is in another file, this shouldn't complicate things compared to the current situation, right? The feature of adding user-provided extra information to the tables could still be offered by table-generator, it would just read it from a different file than from the results XML file. This would eliminate lots of the above questions, simplify the implementation, and be easier to extend in the future.

<union>

If the description is indeed added to the results XML file, there can be the case where several different description texts are available for one (group of) columns in the final table: if a <union> tag is used in the table definition to group several result XML files. How should table-generator handle this? Concatenate the descriptions? This could produce very long texts and blow up the size of the table on the summary tab. Keeping only one text and discarding the other is obviously not good. Maybe not show any text at all in this case?
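
One way to combine these options, sketched as a plain Python function: show a text only once if all grouped files share it, concatenate distinct texts while they stay short, and show nothing otherwise. The function name `merge_descriptions` and the length limit are hypothetical and only illustrate the trade-off; this is not how table-generator behaves.

```python
def merge_descriptions(descriptions, max_length=500):
    """Combine the descriptions of several grouped result files.

    Returns the shared text if all non-empty descriptions are identical,
    a concatenation of the distinct texts if it stays within max_length,
    and None (meaning: show nothing) otherwise.
    """
    distinct = []
    for text in descriptions:
        # Skip missing or empty descriptions and drop exact duplicates,
        # keeping the order of first occurrence.
        if text and text not in distinct:
            distinct.append(text)
    if not distinct:
        return None
    merged = "\n\n".join(distinct)
    return merged if len(merged) <= max_length else None


print(merge_descriptions(["same text", "same text"]))
```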

dbeyer commented 4 years ago

Name: Perhaps user- prefixed, to transport the meaning of: "if it is wrong, it's not BenchExec's fault".

Tag vs. Attribute: I think it should be a tag, not an attribute of result.

General vs. Specific: I do not see any other use cases besides the provenance currently. How about not predicting future potential use cases and just support the one important use case that we have?

Format: Plain text is sufficient.

BenchExec's own info: Right, BenchExec should put its own information elsewhere, e.g., in attributes of tag result. Or omit for now.

Source: I would prefer and find it most convenient if BenchExec reads the info from a text file. I would give the file name on the command line. But I guess I can also write something like benchexec ... -provenance-info "echo benchmark-info.txt" (not sure about the quoting right now).

The provenance info really belongs to the results, and wherever the results XML file is transported, it should be accompanied by the provenance info.

I am not sure yet how table-generator should show the info. Perhaps a tool info for the run set. Yes, perhaps the concatenation. But more important is to have the info in the results XML for now.

dbeyer commented 4 years ago

I used "User-Defined Extra Information" in the title of the issue. But this was perhaps bad. I also tend to try to generalize. But I was really only thinking about the provenance info all the time. https://github.com/sosy-lab/benchexec/pull/526#issuecomment-557783553

PhilippWendler commented 4 years ago

> Name: Perhaps user- prefixed, to transport the meaning of: "if it is wrong, it's not BenchExec's fault".

That would be another argument for using description: it already sounds like something manually written, not auto-generated by BenchExec.

user-... would be ok for the XML tag, but there the naming is not that important because people who look at the raw XML files should read the official documentation on their semantics anyway. The important part of the naming is under what label the HTML tables present this to the user, and there I would find combinations like User Description weird.

> General vs. Specific: I do not see any other use cases besides the provenance currently. How about not predicting future potential use cases and just support the one important use case that we have?

I prefer to not hardcode specific use cases in BenchExec if we have a general solution (naming it description) that is not more effort and works equally well.

> Format: Plain text is sufficient.

I assume this means that line breaks and whitespace should be preserved verbatim.

> Source: I would prefer and find it most convenient if BenchExec reads the info from a text file. I would give the file name on the command line. But I guess I can also write something like benchexec ... -provenance-info "echo benchmark-info.txt" (not sure about the quoting right now).

You mean "cat benchmark-info.txt", right?

What would the semantics of that parameter be in your suggestion?

In general, if you prefer to retrieve the content from some kind of dynamic command instead of a static text file, I see these possibilities:

  1. Passing the contents of a shell script as parameter value. benchexec would start a shell and pass it the script to evaluate, then use the standard output of the script. Quite complex to implement, requires difficult quoting, and it would not even be clear which shell benchexec should start to execute the script.
  2. Passing a single command line as parameter value. benchexec would attempt to parse and handle the command line in a way that is similar to shells (split arguments on whitespace, etc.), then execute the command and use its standard output. This would mean we need to reimplement the command-line handling that shells do ourselves and it might be unintuitive for users if we do not implement everything that common shells provide (for example, if we only implement variable expansion but do not resolve other shortcuts like ~). There is still some quoting issue.
  3. Passing the path to an executable as parameter value. benchexec would just start this executable and use its output. This would be the least implementation effort and has no quoting issues. However, it would be the least flexible way for users (for example, they could not use a script that outputs the correct information depending on some parameter value, because the specified executable would be called without any parameters).

Your given example would be directly supported in possibilities 1. and 2. In case of 3. you would need to have a script file that does the cat/echo.
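
The three possibilities can be sketched with Python's standard `subprocess` and `shlex` modules. The function names are hypothetical and this is only an illustration of the trade-offs, not a proposed implementation; the demo command uses the running Python interpreter so that it does not depend on a particular shell utility.

```python
import shlex
import subprocess
import sys


def output_of_shell_script(script):
    """Possibility 1: hand the whole string to the platform's default shell."""
    return subprocess.run(
        script, shell=True, check=True, capture_output=True, text=True
    ).stdout


def output_of_command_line(command_line):
    """Possibility 2: split arguments like a POSIX shell would.

    shlex.split handles quoting, but nothing here expands variables
    or shortcuts like ~, which is the gap mentioned above.
    """
    argv = shlex.split(command_line)
    return subprocess.run(
        argv, check=True, capture_output=True, text=True
    ).stdout


def output_of_executable(path):
    """Possibility 3: start one executable without any arguments."""
    return subprocess.run(
        [path], check=True, capture_output=True, text=True
    ).stdout


# Demo for possibility 2: a quoted command line that prints some text.
command = shlex.quote(sys.executable) + " -c \"print('provenance text')\""
text_from_command = output_of_command_line(command)
print(text_from_command)
```

The first variant leaves open which shell is started, and the third cannot pass parameters, which matches the drawbacks listed for each possibility.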

> I am not sure yet how table-generator should show the info. Perhaps a tool info for the run set.

What is a "tool info" in this context?

dbeyer commented 4 years ago

Answering from bottom upwards:

I meant "tool tip" ;-) But I have no strong opinion about this.

I need none of 1-3 above. Best for me is to provide a text file as parameter. What I wanted to say is to execute a command myself and provide the stdout as parameter. But I myself really do not need this. My scripts generate a file and I want to get the content into the XML.

I am perfectly fine with description as name of the tag. (I can still prefix my text with "Provenance" to tell more specifically what the info is about.)

I would say there are three steps of getting this done:

  1. extend the exchange format (see pull request)
  2. make BenchExec support this (it is easier for me to give the text file as parameter than to insert the XML tag myself)
  3. make tables show this info somehow

I need (1) for the competition executions right now [important and urgent]. (2) is very convenient to have [urgent but not essential]. (3) is nice to have but not as crucial as (1) [has time].

PhilippWendler commented 4 years ago

> I need none of 1-3 above. Best for me is to provide a text file as parameter.

Ok, I was just confused because your example looks like you want to give a command to BenchExec to execute. A parameter with a file name is of course much easier.
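
A sketch of what such a parameter could look like with Python's `argparse`. The option name `--description-file`, the program name `benchexec-sketch`, and the helper functions are assumptions made for this illustration, not a description of an existing benchexec option.

```python
import argparse
import tempfile
from pathlib import Path


def build_parser():
    parser = argparse.ArgumentParser(prog="benchexec-sketch")
    # Hypothetical option name, chosen only for this illustration.
    parser.add_argument(
        "--description-file",
        type=Path,
        default=None,
        help="text file whose content is attached to the results",
    )
    return parser


def read_description(path):
    """Return the file content verbatim, or None if no file was given."""
    if path is None:
        return None
    return path.read_text(encoding="utf-8")


# Demo with a temporary file standing in for a script-generated info file.
with tempfile.TemporaryDirectory() as tmp:
    info_file = Path(tmp) / "benchmark-info.txt"
    info_file.write_text("Provenance\nsecond line\n", encoding="utf-8")
    args = build_parser().parse_args(["--description-file", str(info_file)])
    description = read_description(args.description_file)

print(repr(description))
```

Reading the file at start-up keeps the quoting problems of the command-based variants out of the picture, since only a path crosses the command line.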

dbeyer commented 2 years ago

@PhilippWendler What about adding this description as tool tip to the generated tables? Users have requested such a feature in the community meeting of a competition. (Because it is inconvenient to open the XML file to see the information.)

dbeyer commented 2 years ago

BTW: The version information in the field description has proven useful in several investigation cases already.

PhilippWendler commented 2 years ago

Tool-tip would be a possibility, yes. (Although maybe not ideal for larger amounts of text.)

But due to time constraints it is unlikely that this will be worked on soon.