gordonwatts / hep_tables

Prototyping Hierarchical data, with servicex as a backend
MIT License

Many functions take two arguments - make `hep_tables` work with these as sequences #33

Open gordonwatts opened 4 years ago

gordonwatts commented 4 years ago

func_adl has no trouble with "x.pt() + x.eta()", but hep_tables will never let something like that through. It has no way of recognizing that the two sequences are the same and can be used as common parameters.

This work will fix that.

First, create a routine that takes a tuple/list of ASTs from DataFrame and finds their common expressions. Then this should be tested on the following, at the very least:

Once that works, slot it into the code.
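As a rough illustration of the "find the common expressions" routine described above, here is a minimal sketch using the standard `ast` module. The names `source_chain` and `find_common_root` are hypothetical and not part of hep_tables, and it assumes two sub-expressions count as "the same" when their `ast.dump` output matches. The real DataFrame ASTs may need an identity comparison instead.

```python
import ast
from typing import List, Optional, Sequence


def source_chain(node: ast.AST) -> List[ast.AST]:
    """Return `node` followed by the expressions it is built on, outermost first.

    Hypothetical helper: only follows attribute access and method calls.
    """
    chain = [node]
    while True:
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
            # x.pt() is built on x
            node = node.func.value
        elif isinstance(node, ast.Attribute):
            # evt.jets is built on evt
            node = node.value
        else:
            return chain
        chain.append(node)


def find_common_root(exprs: Sequence[ast.AST]) -> Optional[ast.AST]:
    """Return the largest sub-expression that every expression is built on.

    Assumption: structural equality (via ast.dump) is good enough to decide
    that two sub-expressions refer to the same sequence.
    """
    if not exprs:
        return None
    first, *rest = exprs
    rest_dumps = [{ast.dump(n) for n in source_chain(e)} for e in rest]
    # Walk the first chain from the largest expression down to its root.
    for candidate in source_chain(first):
        key = ast.dump(candidate)
        if all(key in dumps for dumps in rest_dumps):
            return candidate
    return None


left = ast.parse("evt.jets.pt", mode="eval").body
right = ast.parse("evt.jets.eta", mode="eval").body
common = find_common_root([left, right])  # the `evt.jets` sub-expression
```

If no shared root exists (say, `a.pt` and `b.pt`) the sketch returns `None`, which is the case where the two arguments cannot be treated as common parameters.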

Finally, we need to remove the limitation in hl_tables:

gordonwatts commented 4 years ago

I was really hoping to avoid having to work on this - but np.where needs this... so - we have no choice.

gordonwatts commented 4 years ago

This code that renders an ast_Callable is already doing something along these lines:

    expr, new_context = render_callable(callable, context, callable.dataframe)

    # In that expr there may be captured variables, or references to things that
    # are not in `value`. If that is the case, that means we need to add a monad to fetch
    # them from earlier in the process.
    root_expr = _find_root_expr(expr, tracker.sequence._ast)
    if root_expr is tracker.sequence._ast:
        # Just continuing on with the sequence already in place.
        assert _is_list(tracker.sequence.result_type), \
            f'Expecting sequence, but got {ast.dump(tracker.sequence._ast)} ' \
            f'which is not of type list, but of type {tracker.sequence.result_type}.'
        # or isinstance(tracker.sequence, statement_unwrap_list)
        # if _is_list(tracker.sequence.result_type):
        #     s, t = _render_expression(
        #         statement_unwrap_list(tracker.sequence._ast, tracker.sequence.result_type),
        #         expr, new_context, tracker)
        # else:
        #     s, t = _render_expression(tracker.sequence, expr, new_context, tracker)
        seq = tracker.sequence.unwrap_if_possible()
        s, t = _render_expression(seq, expr, new_context, tracker)
        assert t.term == 'main_sequence'
        if _is_list(tracker.sequence.result_type):
            s = [smt.wrap() for smt in s]
        if len(s) > 0:
            tracker.statements += s
            tracker.sequence = s[-1]
        return t

    elif root_expr is not None:
        monad_index = tracker.carry_monad_forward(root_expr)
        monad_ref = _monad_manager.new_monad_ref()

        # Create a pointer to the base monad - which is an object
        with tracker.substitute_ast(
                root_expr, _ast_VarRef(term_info(f'{monad_ref}[{monad_index}]', object,
                                       [monad_ref]))):

            # The var we are going to loop over is a pointer to the sequence.
            seq_as_object = tracker.sequence.unwrap_if_possible()
            select_var = tracker.qvt.new_term(seq_as_object.result_type)
            select_var_rep_ast = _ast_VarRef(select_var)

            with tracker.substitute_ast(tracker.sequence._ast, select_var_rep_ast):
                trm = _resolve_expr_inline(seq_as_object, expr, new_context, tracker)

        result_type = _type_replace(tracker.sequence.result_type, select_var.type, trm.type)
        st = statement_select(a, tracker.sequence.result_type, result_type,
                              select_var, trm, tracker.qvt)
        if trm.has_monads():
            st.prev_statement_is_monad()
            st.set_monad_ref(monad_ref)

        tracker.statements.append(st)
        tracker.sequence = st
        return term_info('main_sequence', st.result_type)

    else:
        # If root_expr is none, then whatever it is is a constant. So just select it.
        _render_expresion_as_transform(tracker, context, expr)
        return term_info('main_sequence', tracker.sequence.result_type)

That call to _find_root_expr is the key here. Note there is a lot of work to pass along a monad - I wonder if we need to add that to the list of things that have to be supported?
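To make the three branches above easier to follow, here is a toy stand-in for the root lookup. This is not the actual `_find_root_expr` from hep_tables (its body is not shown here). The function `find_root_expr_sketch` is hypothetical and assumes plain `ast` nodes, with the current sequence recognized by object identity:

```python
import ast
from typing import Optional


def find_root_expr_sketch(expr: ast.AST, sequence_ast: ast.AST) -> Optional[ast.AST]:
    """Hypothetical illustration of the three cases the calling code handles."""
    # Case 1: the expression is built on the sequence we are already processing.
    for node in ast.walk(expr):
        if node is sequence_ast:
            return sequence_ast
    # Case 2: it is built on some other root, which would have to be carried
    # forward (the monad branch in the code above).
    for node in ast.walk(expr):
        if isinstance(node, ast.Name):
            return node
    # Case 3: nothing but constants.
    return None


# The sequence currently being processed, and an expression built on it.
jets = ast.Name(id="jets", ctx=ast.Load())
jet_pt = ast.Attribute(value=jets, attr="pt", ctx=ast.Load())

on_sequence = find_root_expr_sketch(jet_pt, jets)
other_root = find_root_expr_sketch(ast.parse("evt.met", mode="eval").body, jets)
constant = find_root_expr_sketch(ast.parse("5.0 + 1.2", mode="eval").body, jets)
```

Here `on_sequence` is `jets` itself, `other_root` is the `evt` name node, and `constant` is `None`, mirroring the `if` / `elif` / `else` structure of the rendering code.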

gordonwatts commented 4 years ago

This has devolved into a total rewrite:

  1. [x] Get "evt.jets.pt" working all the way through
  2. [x] Catalog everywhere we used "unwrap" in the previous code
  3. [x] Get an n-argument function working
  4. [ ] Get a filter working
  5. [x] Get a map working
  6. [ ] Get a double map with lambda capture working (jets and tracks)
  7. [ ] Get constants (5, 5.0+1.2) working
  8. [ ] Get everything else working

gordonwatts commented 4 years ago

Places where we see an unwrap:

So - we need to make sure all of these are handled by the new type system.