Many functions take two arguments - make `hep_tables` work with these as sequences

gordonwatts commented 4 years ago

func_adl has no trouble with `x.pt() + x.eta()`, but hep_tables will never let an expression like that through: it has no way to recognize that the two sequences are the same and can therefore be used as common parameters.

This work will fix that.

First, create a routine that takes a tuple or list of ASTs from `DataFrame` expressions and finds their common expression. It should then be tested on at least the following:

[x] no arguments
[x] single argument of a sequence (df.jets.pt)
[x] two similar arguments (df.jets.pt, df.jets.eta)
[x] two arguments with different depths, but same ordinality (df.jets.pt, df.jets.eta*2.0)
[x] two identical arguments (df.jets.pt, df.jets.pt)
[x] two mismatched arguments (df.jets.pt, df.tracks.pt): this must raise an error
[x] a constant (df.jets.pt, 1)
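A minimal sketch of such a routine, written against plain Python `ast` nodes rather than hep_tables' own `DataFrame` wrappers. It assumes that in a dotted chain like `df.jets.pt` the last attribute is the column and everything before it (`df.jets`) is the sequence; `find_common_root` and its helpers are hypothetical names, not existing hep_tables functions. Constants contribute no sequence, so they combine with anything.

```python
import ast
from typing import Optional, Sequence, Set


def _dotted(node: ast.AST) -> Optional[str]:
    """Turn a pure Name/Attribute chain such as df.jets.pt into 'df.jets.pt'."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        base = _dotted(node.value)
        return None if base is None else f"{base}.{node.attr}"
    return None


def _sequence_roots(node: ast.AST) -> Set[str]:
    """Collect the sequence part (chain minus its leaf column) of each outermost chain."""
    if isinstance(node, ast.Attribute):
        name = _dotted(node)
        if name is not None:
            return {name.rsplit(".", 1)[0]}
    roots: Set[str] = set()
    for child in ast.iter_child_nodes(node):
        roots |= _sequence_roots(child)
    return roots


def find_common_root(exprs: Sequence[ast.AST]) -> Optional[str]:
    """Return the one sequence every argument comes from, or None if there is none."""
    roots: Set[str] = set()
    for e in exprs:
        roots |= _sequence_roots(e)
    if len(roots) > 1:
        raise ValueError(f"arguments come from different sequences: {sorted(roots)}")
    return next(iter(roots), None)


def parse(text: str) -> ast.AST:
    """Parse a single expression and return its body node."""
    return ast.parse(text, mode="eval").body
```

For example, `find_common_root([parse("df.jets.pt"), parse("df.jets.eta * 2.0")])` returns `'df.jets'`, while pairing `df.jets.pt` with `df.tracks.pt` raises `ValueError`. Each case in the list above maps onto one call.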

Once that works, slot it into the code.

[x] The first place to put it is `visit_Call` in render.py, inside `_map_to_data`. Look at the `isinstance(a.func, ast.Name)` branch, which handles numpy functions; a two-argument numpy function would be a good test here.
[x] Once that works, expand it to other places, such as `+` and the other math and comparison operators.
[x] Make sure all check marks below are taken care of
[x] Get rid of all type errors
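One way to picture "slotting it in" is a single `ast.NodeVisitor` that runs the common-root check on the operands of plain-named calls (mirroring the `isinstance(a.func, ast.Name)` branch), binary operators, and comparisons. This is a hypothetical sketch: `SequenceArgChecker` and `check` are illustrative names, it only validates operands instead of rendering anything, and it reuses the assumption that the last attribute of a dotted chain is the column.

```python
import ast
from typing import List, Optional, Set


def _dotted(node: ast.AST) -> Optional[str]:
    """Turn a pure Name/Attribute chain such as df.jets.pt into 'df.jets.pt'."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        base = _dotted(node.value)
        return None if base is None else f"{base}.{node.attr}"
    return None


def _roots(node: ast.AST) -> Set[str]:
    """Sequence part (chain minus leaf) of every outermost attribute chain."""
    if isinstance(node, ast.Attribute):
        name = _dotted(node)
        if name is not None:
            return {name.rsplit(".", 1)[0]}
    found: Set[str] = set()
    for child in ast.iter_child_nodes(node):
        found |= _roots(child)
    return found


class SequenceArgChecker(ast.NodeVisitor):
    """Hypothetical pass: operands of one call or operator must share a sequence."""

    def __init__(self) -> None:
        self.checked: List[str] = []

    def _check(self, node: ast.AST, operands: List[ast.AST]) -> None:
        roots: Set[str] = set()
        for operand in operands:
            roots |= _roots(operand)
        if len(roots) > 1:
            raise ValueError(f"{type(node).__name__} mixes sequences {sorted(roots)}")
        self.checked.append(type(node).__name__)

    def visit_Call(self, node: ast.Call) -> None:
        # Only plain-named functions, like the ast.Name branch in visit_Call.
        if isinstance(node.func, ast.Name):
            self._check(node, list(node.args))
        self.generic_visit(node)

    def visit_BinOp(self, node: ast.BinOp) -> None:
        self._check(node, [node.left, node.right])
        self.generic_visit(node)

    def visit_Compare(self, node: ast.Compare) -> None:
        self._check(node, [node.left, *node.comparators])
        self.generic_visit(node)


def check(text: str) -> List[str]:
    """Run the checker on one expression and list the node types it accepted."""
    checker = SequenceArgChecker()
    checker.visit(ast.parse(text, mode="eval"))
    return checker.checked
```

Putting the check in one shared helper (`_check`) is what lets the numpy-call case and the operator cases grow together instead of duplicating the logic per node type.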

Finally, we need to remove the limitation in hl_tables:

[ ] hl_tables' hep_tables adaptor should not block an operation just because two of its operands are sequences; let it through.

gordonwatts commented 4 years ago

I was really hoping to avoid having to work on this, but `np.where` needs it, so we have no choice.

gordonwatts commented 4 years ago

The code that renders an `ast_Callable` already does something along these lines:

    expr, new_context = render_callable(callable, context, callable.dataframe)

    # In that expr there may be captured variables, or references to things that
    # are not in `value`. If that is the case, that means we need to add a monad to fetch
    # them from earlier in the process.
    root_expr = _find_root_expr(expr, tracker.sequence._ast)
    if root_expr is tracker.sequence._ast:
        # Just continuing on with the sequence already in place.
        assert _is_list(tracker.sequence.result_type), \
            f'Expecting sequence, but got {ast.dump(tracker.sequence._ast)} ' \
            f'which is not of type list, but of type {tracker.sequence.result_type}.'
        # or isinstance(tracker.sequence, statement_unwrap_list)
        # if _is_list(tracker.sequence.result_type):
        #     s, t = _render_expression(
        #         statement_unwrap_list(tracker.sequence._ast, tracker.sequence.result_type),
        #         expr, new_context, tracker)
        # else:
        #     s, t = _render_expression(tracker.sequence, expr, new_context, tracker)
        seq = tracker.sequence.unwrap_if_possible()
        s, t = _render_expression(seq, expr, new_context, tracker)
        assert t.term == 'main_sequence'
        if _is_list(tracker.sequence.result_type):
            s = [smt.wrap() for smt in s]
        if len(s) > 0:
            tracker.statements += s
            tracker.sequence = s[-1]
        return t

    elif root_expr is not None:
        monad_index = tracker.carry_monad_forward(root_expr)
        monad_ref = _monad_manager.new_monad_ref()

        # Create a pointer to the base monad - which is an object
        with tracker.substitute_ast(
                root_expr, _ast_VarRef(term_info(f'{monad_ref}[{monad_index}]', object,
                                       [monad_ref]))):

            # The var we are going to loop over is a pointer to the sequence.
            seq_as_object = tracker.sequence.unwrap_if_possible()
            select_var = tracker.qvt.new_term(seq_as_object.result_type)
            select_var_rep_ast = _ast_VarRef(select_var)

            with tracker.substitute_ast(tracker.sequence._ast, select_var_rep_ast):
                trm = _resolve_expr_inline(seq_as_object, expr, new_context, tracker)

        result_type = _type_replace(tracker.sequence.result_type, select_var.type, trm.type)
        st = statement_select(a, tracker.sequence.result_type, result_type,
                              select_var, trm, tracker.qvt)
        if trm.has_monads():
            st.prev_statement_is_monad()
            st.set_monad_ref(monad_ref)

        tracker.statements.append(st)
        tracker.sequence = st
        return term_info('main_sequence', st.result_type)

    else:
        # If root_expr is none, then whatever it is is a constant. So just select it.
        _render_expresion_as_transform(tracker, context, expr)
        return term_info('main_sequence', tracker.sequence.result_type)

That call to `_find_root_expr` is the key here. Note that there is a lot of work to pass a monad along; I wonder if we need to add that to the list of things that have to be supported.

[x] See if we can do an expression like that (that captures something else)
[x] Figure out how the monad needs to be passed in
[x] If it is just repeating this code, can we combine them into a common routine?
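The rendering code above branches three ways on the result of `_find_root_expr`: the expression stays on the current sequence, it captures something from elsewhere (so a monad has to be carried forward), or it has no root and is a constant. The sketch below reproduces just that decision on plain `ast` nodes. It is an assumption-laden stand-in, not the real `_find_root_expr`: sequences are dotted names such as `"df.jets"`, only attribute chains count as references, and if any reference falls outside the main sequence the sketch reports that outside root.

```python
import ast
from typing import List, Optional


def _dotted(node: ast.AST) -> Optional[str]:
    """Turn a pure Name/Attribute chain such as df.jets.pt into 'df.jets.pt'."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        base = _dotted(node.value)
        return None if base is None else f"{base}.{node.attr}"
    return None


def _outer_chains(node: ast.AST) -> List[str]:
    """Dotted names of the outermost attribute chains, e.g. ['df.jets.pt']."""
    if isinstance(node, ast.Attribute):
        name = _dotted(node)
        if name is not None:
            return [name]
    chains: List[str] = []
    for child in ast.iter_child_nodes(node):
        chains.extend(_outer_chains(child))
    return chains


def find_root_expr(expr: ast.AST, sequence: str) -> Optional[str]:
    """Hypothetical stand-in: `sequence`, some other root, or None for a constant."""
    refers_to_sequence = False
    for chain in _outer_chains(expr):
        if chain == sequence or chain.startswith(sequence + "."):
            refers_to_sequence = True
        else:
            # A captured reference outside the sequence: report where it lives.
            return chain.rsplit(".", 1)[0]
    return sequence if refers_to_sequence else None


def plan(text: str, sequence: str) -> str:
    """Pick one of the three branches of the ast_Callable rendering code."""
    root = find_root_expr(ast.parse(text, mode="eval").body, sequence)
    if root == sequence:
        return "main_sequence"  # continue on tracker.sequence
    if root is not None:
        return "monad"  # tracker.carry_monad_forward(root_expr)
    return "constant"  # select it as a transform
```

If the same three-way split is needed for function calls and operators, a helper shaped like `plan` is the natural candidate for the common routine asked about above.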

gordonwatts commented 4 years ago

This has devolved into a total rewrite:

[x] Get "evt.jets.pt" working all the way through
[x] Catalog everywhere we used "unwrap" in the previous code
[x] Get an n-argument function working
[ ] Get a filter working
[x] Get a map working
[ ] Get a double map with lambda capture working (jets and tracks)
[ ] Get constants (5, 5.0+1.2) working
[ ] Get everything else working
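To pin down what some of these items must compute per event, here is the intended semantics written in plain Python over a toy event layout. This is not the hep_tables API: the dict-of-lists event, the function names, and the "tracks near a jet" example are assumptions chosen only to illustrate a map, a filter, and a double map whose inner lambda captures the outer jet.

```python
from typing import Dict, List

# Assumed toy layout: each collection is a list of objects with named columns.
Event = Dict[str, List[Dict[str, float]]]


def jet_pts(evt: Event) -> List[float]:
    """Map: evt.jets.pt gives one pt per jet."""
    return [j["pt"] for j in evt["jets"]]


def good_jet_pts(evt: Event, cut: float) -> List[float]:
    """Filter, then map: pt of the jets passing a pt cut."""
    return [j["pt"] for j in evt["jets"] if j["pt"] > cut]


def n_tracks_near_jet(evt: Event, max_deta: float) -> List[int]:
    """Double map with capture: the inner loop over tracks uses the outer jet `j`."""
    return [
        sum(1 for t in evt["tracks"] if abs(t["eta"] - j["eta"]) < max_deta)
        for j in evt["jets"]
    ]
```

The double map is the hard case for the renderer because the inner sequence (tracks) is not derived from the outer one (jets), so the captured jet has to be available while looping over tracks.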

gordonwatts commented 4 years ago

Places where we see an unwrap:

[x] Rendering an ast_Callable
[x] Function calls (like a named function like DeltaR or sin, etc)
[ ] Rendering a Filter
[ ] Binary operators (including bool operators)
[ ] Unary operators (!!)

So we need to make sure all of these are handled by the new type system.
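At the type level, each of these places does the same thing the `ast_Callable` code does: unwrap `List[X]` to the element type `X`, compute a per-element result type, then put that result back where `X` was (the role of `_type_replace`). The sketch below expresses that with the standard `typing` helpers `get_origin` and `get_args`; `is_list`, `unwrap`, `wrap`, and `type_replace` are hypothetical stand-ins, not the hep_tables implementations.

```python
from typing import Any, List, get_args, get_origin


def is_list(t: Any) -> bool:
    """True for typing.List[X] (and for list[X] on Python 3.9+)."""
    return get_origin(t) is list


def unwrap(t: Any) -> Any:
    """List[X] -> X: the per-element type a select lambda sees."""
    if not is_list(t):
        raise TypeError(f"cannot unwrap non-sequence type {t!r}")
    return get_args(t)[0]


def wrap(t: Any) -> Any:
    """X -> List[X]: re-wrap a per-element result as a sequence."""
    return List[t]


def type_replace(source: Any, find: Any, replace: Any) -> Any:
    """Swap `find` for `replace` somewhere inside nested List[...] types."""
    if source == find:
        return replace
    if is_list(source):
        return wrap(type_replace(unwrap(source), find, replace))
    raise ValueError(f"{find!r} not found in {source!r}")
```

For example, mapping `.pt` over a `List[List[object]]` of jets per event turns the inner `object` into `float` and yields `List[List[float]]`; a filter, by contrast, would leave the type unchanged, which is one reason it needs separate handling.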

gordonwatts / hep_tables #33