Eliminate default entry function

meheff commented 2 years ago

Currently the entry function may be explicitly specified or may be heuristically chosen by XLS (e.g. function "main" or a function which shares the name of the package). We should eliminate the heuristics and require that the entry is explicitly specified. This would result in less surprising behavior.

The context where this came up is in a review for a change to dead function elimination. We would have liked to eliminate dead procs if an entry function exists (the procs aren't used in this case). However, because the entry might have been chose heuristically we can't eliminate the procs because then you'd have a situation where the mere existence of a function named "main", say, would result in all procs being unconditionally eliminated from the package.

A related improvement would be to generalize the notion of an "entry" to include procs/blocks. In this case a better term might be "top".

Two possibilities here are: always require an entry/top, OR only require an entry/top if doing something which requires it (evaluation, codegen, etc).

vincent-mirian-google commented 2 years ago

Hi @meheff

The following are the tools that reference an entry/top: -ir_converter_main -benchmark_main -booleanify_main -codegen_main -delay_info_main -eval_ir_main -ir_minimizer_main -ir_stats_main -check_ir_equivalence -opt_main -smtlib_emitter_main

The tools would need to be updated to reflect that the entry/top argument is required. Moreover, a few build rules would need to be updated to have a mandatory entry/top attribute.

I agree rename it to 'top' is more suitable for some tools.

Did you want to make the changes to all the tools? For the build rules, the entry/top attribute would need to be mandatory. For the tools at the IR level, the user would need to use the get_mangled_ir_symbol function.

As we discussed, I can take this on. We can discuss offline. Thanks.

meheff commented 2 years ago

Brain dump on this:

There is a bit of complexity here because of procs. Specifically, a package might have multiple top level procs. For example, you could have proc A which takes input from the outside world, does something with it and then feeds proc B which does some more and then sends the result out to the outside world. For this we'd have two top-level procs: A and B. So I think there are several possibilities for "top"

Single Function* specified by flag --top.
Single Block*which can be specified by either by the flag --top or during the codegen process. For example, when the top function is converted to a block the top is switched from the function to the block.
A single Proc* specified by the flag --top.
One or more Proc*. In this case no --top is specified, but the top procs are those which have external channels which communicate with the outside world (send-only or receive-only channels).

One alternative for procs would be to always require the top procs to be specified by flag. This might be a bit cumbersome as it involves specifying multiple proc names and the external channels naturally identify the intended top level procs.

Given the above, it may make sense for Package::EntryFunction (to be renamed to something like Package::GetTop) to maybe return a value of type:

absl::variant<Block*, Function*, std::vector<Proc*>>

A bit gross, but in this case it may make sense to be very constrained about what is returned as opposed to returning std::vector<FunctionBase*> and verifying the results at each caller.

In regards to the interaction of dead function elimination (DFE) with top. For each case above:

(single Function*) Delete all procs and blocks as well as any function which is not the top function or called by the top function including transitively.
(single Block*) Delete all procs and functions (we assume blocks can't call functions). Also remove all blocks which are not instantiated by the top block.
(single Proc* specified by flag) This is complicated because isolating the specified proc from the larger proc network is nontrivial. Probably need to do some kind of domination analysis on the proc network to determine the connected set of procs which remain. In any case, once such a set is identified delete all other procs and replace any half-connected channels with external channels (send-only and receive-only). Then delete all functions not called by the remaining procs. Remove all blocks (probably wouldn't be any blocks anyway).
(one or more Proc* identified by external channels) delete any procs not connected to these procs. Delete all functions not called by the remaining procs. Remove all blocks.

meheff commented 2 years ago

@vincent-mirian-google To your earlier question about tackling this. I think a starting point may be to first change GetEntry to be closer to what we want and then fix the fallout.

Specifically, change GetEntry to return a FunctionBase* and eliminate the existing heuristic for selecting the entry function (which does things like look for "main", etc):

https://github.com/google/xls/blob/2e400259e423e953404b9abb49090e7d4f9c3060/xls/ir/package.cc#L157

Maybe (for now) just have it do the following:

If entry_ is set, return the respectively named FunctionBase.
If there is a single proc, return that as the entry.
Return an error.

vincent-mirian-google commented 2 years ago

This task is complete.

google / xls

Eliminate default entry function #531