vepadulano / PyRDF

Python Library for doing ROOT RDataFrame analysis
https://pyrdf.readthedocs.io/en/latest/

Restore original gDirectory after event loop #85

Closed vepadulano closed 5 years ago

vepadulano commented 5 years ago

In ROOT, gDirectory is a global variable that points to the current directory, typically the most recently opened file in the program. Whenever a method such as Write() is called, e.g. when writing a histogram object to a file, gDirectory implicitly determines the destination, so the file path does not need to be specified when saving/writing.

Within a distributed event loop, the input files of the RDataFrame are opened to retrieve the information about their clusters. Whenever a file is opened through TFile or a similar call, gDirectory is changed to point to the last file opened. This happens both in C++ and in Python.

The TDirectory::TContext class serves the purpose of storing the current gDirectory when instantiated, then restoring it when destroyed. This means that no matter how many times the gDirectory has been changed, it will be restored to its initial value if a TContext was set.
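The save-and-restore behaviour described above can be modelled in plain Python. This is only a sketch of the mechanism and does not use ROOT: Session, open_file and DirectoryGuard are hypothetical stand-ins for the process state holding gDirectory, a TFile open, and TDirectory::TContext respectively.

```python
# Pure-Python model of the mechanism; none of these names are ROOT APIs.


class Session:
    """Stand-in for the ROOT process state; `directory` plays gDirectory."""

    def __init__(self, directory):
        self.directory = directory

    def open_file(self, name):
        # Opening a file silently redirects the "current directory".
        self.directory = name


class DirectoryGuard:
    """Stand-in for TDirectory::TContext.

    Remembers the current directory on creation and puts it back when
    restore() is called (the C++ class does this in its destructor).
    """

    def __init__(self, session):
        self._session = session
        self._saved = session.directory

    def restore(self):
        self._session.directory = self._saved


session = Session("output.root")
guard = DirectoryGuard(session)

# Several files are opened, as when reading cluster information.
for name in ("input_1.root", "input_2.root"):
    session.open_file(name)
changed_to = session.directory   # the last file opened

guard.restore()
restored_to = session.directory  # the initial value again
```

However many files are opened between creating the guard and restoring it, only the value saved at creation matters.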

In the Dist class, gDirectory is changed within the scope of the get_clusters() method. If Write() or a similar call is issued after the reduce phase of the process, the ROOT object won't be saved to the initial file, because gDirectory was changed during the event loop and no longer points to it.

In C++ this could be avoided by explicitly creating a TContext at the beginning of the program and then calling its destructor at the end. In Python, a simple del of the TContext object wouldn't guarantee a call to the C++ destructor. Instead, a context manager would allow better control over the creation and the destruction of the TContext. Fortunately, PyROOT offers a pythonization that enables calling the C++ destructor of any ROOT object through the __destruct__() dunder method.
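A minimal sketch of such a context manager is shown below. The helper name restoring and the FakeContext class are hypothetical and exist only so the example runs without ROOT installed. The only PyROOT feature assumed is the __destruct__() method mentioned above. This is not necessarily how PyRDF implements the fix.

```python
from contextlib import contextmanager


@contextmanager
def restoring(make_guard):
    """Create a guard object on entry and destroy it explicitly on exit.

    `make_guard` is any zero-argument callable returning an object with a
    __destruct__() method. With PyROOT the intended argument would be
    ROOT.TDirectory.TContext (assumption: not exercised in this sketch).
    """
    guard = make_guard()
    try:
        yield guard
    finally:
        # Call the destructor explicitly instead of relying on `del`,
        # so the restore also happens if the body raises an exception.
        guard.__destruct__()


class FakeContext:
    """Hypothetical stand-in for TContext, used only for this demo."""

    def __init__(self, state):
        self._state = state
        self._saved = state["gDirectory"]

    def __destruct__(self):
        self._state["gDirectory"] = self._saved


state = {"gDirectory": "output.root"}

with restoring(lambda: FakeContext(state)):
    # Stand-in for get_clusters() opening an input file.
    state["gDirectory"] = "input.root"
    during = state["gDirectory"]

after = state["gDirectory"]
```

With ROOT available, the equivalent call would presumably be to wrap the event loop in with restoring(ROOT.TDirectory.TContext): so that a later Write() targets the file that was current before the loop started.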

To summarise: