gordonwatts / BDTTrainingAnalysisLanguage

Pull from ATLAS EXOT 15 derivations, columnar data, and flat ROOT ntuples with RDF into scikit-learn in one nice, fast swoop

BDTTrainingAnalysisLanguage

This repo is an experiment in IRIS-HEP analysis languages.

The code here will start from an ATLAS derivation (EXOT_15) and feed it into the training of a BDT that distinguishes multijet (MJ) background from long-lived particle signal in jets. This is part of the CalRatio effort.

Goals:

Non-Goals:

Above all, this is an experiment. It builds on many lessons learned in the LINQToROOT project.

Usage

Platform requirements:

Tested on:

Implementation

This will need both C++ components and Python (numpy) components.

Backend Implementation

A new backend is required to run on a different kind of file, or to power the system under the hood from a different file format.
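Per the outline below, the backend is chosen by whichever node in the query AST provides a get_executor method, so supporting a new file format amounts to adding a new event-source node plus its executor. The following is a minimal sketch of that dispatch pattern, not code from the repository: XAODEventStream, FlatNtupleEventStream, XAODExecutor, FlatNtupleExecutor and find_executor are hypothetical names chosen for illustration.

```python
import ast


# Hypothetical executors, one per file format (stand-ins for atlas_xaod_executor.py).
class XAODExecutor:
    def evaluate(self, query: ast.AST) -> str:
        return "xAOD backend"


class FlatNtupleExecutor:
    def evaluate(self, query: ast.AST) -> str:
        return "flat ntuple backend"


# Hypothetical event-source nodes. They are real ast.AST subclasses, so they can
# sit inside an ordinary Python AST; each one knows which executor can run it.
class XAODEventStream(ast.AST):
    _fields = ()  # leaf node: no child nodes to traverse

    def get_executor(self):
        return XAODExecutor()


class FlatNtupleEventStream(ast.AST):
    _fields = ()

    def get_executor(self):
        return FlatNtupleExecutor()


def find_executor(query: ast.AST):
    """Walk the query AST and return the executor of the first node that offers one."""
    for node in ast.walk(query):
        if hasattr(node, "get_executor"):
            return node.get_executor()
    raise ValueError("no node in the AST provides get_executor()")


# Usage: a query of the shape <event source>.Select() over an xAOD file.
query = ast.Call(
    func=ast.Attribute(value=XAODEventStream(), attr="Select", ctx=ast.Load()),
    args=[],
    keywords=[],
)
print(find_executor(query).evaluate(query))  # prints "xAOD backend"
```

Because the event source is itself an AST node, the rest of the query is unchanged when the file format changes; only the source node, and therefore the executor it hands back, differs.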

Implementing a new executor isn't terribly hard. Here is an untested outline of what must be done.

  1. The system finds the proper backend code through the AST node that contains the get_executor method. See xAODLib/AtlasEventStream.py for that node.

  2. This returns the executor defined at the bottom of atlas_xaod_executor.py, which drives everything. Almost all of the work happens in its evaluate method (with a number of helpers), in two stages.

    1. The call qv.visit(ast) is the main line: it starts the traversal of the AST that must be turned into code. The visitor is based on Python's ast.NodeVisitor class and keeps track of what it is looking at as it goes. This is tricky; for example, the SelectMany node has to translate a collection into a sequence of its items.

    2. The output file is generated using the Jinja2 template engine, though any template engine could be used.

  3. Finally, the code is run in a docker container which maps a temp script directory and the directory containing the data file.

  4. The results then have to be handed back to the calling source code. This is currently faked, buried in the executor.
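The two stages of step 2 (visit the AST, then fill a template) can be sketched in miniature. This is an illustration under stated assumptions, not the repository's executor: QueryVisitor, generate_cpp, CPP_TEMPLATE and the EventDataset() source are hypothetical names; the visitor only understands a chain of Select calls whose lambdas access attributes; collections (the SelectMany case that needs a loop) are not handled; the standard library's string.Template stands in for Jinja2; and the docker run and result return (steps 3 and 4) are skipped by simply returning the generated source.

```python
import ast
from string import Template

# Hypothetical C++ skeleton; the real executor renders Jinja2 templates instead.
CPP_TEMPLATE = Template("void analyze() {\n$body}\n")


class QueryVisitor(ast.NodeVisitor):
    """Toy visitor: turns Select(lambda x: x.a.b) calls into C++ assignment lines."""

    def __init__(self):
        self.lines = []
        self.current = "event"  # C++ name of the value the next lambda receives

    def visit_Call(self, node: ast.Call):
        if not (isinstance(node.func, ast.Attribute) and node.func.attr == "Select"):
            self.generic_visit(node)  # e.g. the EventDataset() source call
            return
        # Translate the query this Select reads from first, so lines come out in order.
        self.visit(node.func.value)
        lam = node.args[0]
        expr = self.translate(lam.body, lam.args.args[0].arg)
        name = f"r{len(self.lines)}"
        self.lines.append(f"  auto {name} = {expr};")
        self.current = name  # the next Select's lambda argument refers to this result

    def translate(self, expr: ast.expr, arg: str) -> str:
        """Translate a lambda body made only of attribute accesses into C++."""
        if isinstance(expr, ast.Name) and expr.id == arg:
            return self.current
        if isinstance(expr, ast.Attribute):
            return f"{self.translate(expr.value, arg)}.{expr.attr}()"
        raise NotImplementedError(ast.dump(expr))


def generate_cpp(query_text: str) -> str:
    """Stage 1: visit the query AST. Stage 2: render the collected lines."""
    visitor = QueryVisitor()
    visitor.visit(ast.parse(query_text, mode="eval"))
    body = "".join(line + "\n" for line in visitor.lines)
    return CPP_TEMPLATE.substitute(body=body)


print(generate_cpp("EventDataset().Select(lambda e: e.EventInfo).Select(lambda i: i.eventNumber)"))
```

Running it prints a function whose body is `auto r0 = event.EventInfo();` followed by `auto r1 = r0.eventNumber();`. The `current` attribute is the small-scale version of "tracking what it is looking at": each Select's lambda argument is bound to the C++ variable produced by the step before it.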