Implement fast histogramming and lookup

atrettin commented 3 years ago

This PR implements fast histogramming with the fast_histogram package and fast lookup with dedicated numba functions for regular binnings.

When binnings are fully regular and linear, the fast_histogram library is automatically used when Container contents are translated from arrays to histograms, and the fast lookup functions are used when Container contents are translated from binnings to arrays.

Because the logarithm is a very costly operation (especially in double precision), regularizing binnings into log-space would be very slow and nearly negate the advantage from using fast histograms. To circumvent this problem, a mechanism is implemented in the Container class to cache the logarithm of array representations. There are now two array representations: "events" and "log_events". When the "log_events" representation is requested, the log is calculated event-wise. As with any other representation, the result is cached such that the log need not be calculated again unless the contents are changed via item assignment.

With these two changes, histogram and lookup operations for regular (linear or logarithmic) binnings are accelerated by a factor between 5-10, at least in artificial benchmarks with normally distributed data, without any need to change Pipeline configurations or stages.

Finally, the hist stage automatically detects irregular axes in the output binning and pre-computes indices for these axes (typically the PID axis) such that a regular binning in the index can be used. This is all done automatically, such that every analysis, even one using an irregular output binning, can take advantage of fast histogramming.

Note

Currently, not all features needed have been merged into the fast_histogram package for a release. For now, an installation from the source of my fork's "xmax_fix" branch is required. I'm pushing for an official release, but until then, I will keep this a draft.

atrettin commented 3 years ago

Now that all unit tests pass, this could actually be pulled! 🎉 We can still change the setup.py whenever the actual release of fast_histogram is made, but for now this seems to be ok as it is.

philippeller commented 3 years ago

Great! Does the fast histo package now compile automatically during installation?

Also, just out of interest, how much of a speed-up do you get with this for you analysis?

atrettin commented 3 years ago

Great! Does the fast histo package now compile automatically during installation?

Yes! Everything should just happen magically! 🌈✨🦄 It works in the GitHub workflow, so I hope it will work anywhere.

Also, just out of interest, how much of a speed-up do you get with this for you analysis?

Sadly, the speedup in the real world is more modest than what artificial tests with normally distributed data suggested. When I benchmark the Container class on its own, I get a speedup of 10, but the real speedup in the apply_function of the hist stage is a factor of 3.5 for an absolute time saving of 500 ms, the speedup of the apply_function in the nusquids stage is 5 for a saving of 800 ms.

An "idle" output from my PISA pipeline (i.e. without changing any parameter, just going through all the apply functions) is sped up by a factor of 2 from 900 ms to 450 ms (the timings varied unexpectedly wildly, some of these numbers may seem inconsistent). An output after changing one parameter in nusquids now takes 4.7 s where it used to take 6 s. A modest speedup, but not a game-changing one.

philippeller commented 3 years ago

Does indeed compile!

However, I got an error when executing a pipeline:


<timed exec> in <module>

/mnt/c/Users/peller/work/pisa/pisa/core/pipeline.py in get_outputs(self, output_binning, output_key)
    304 
    305 
--> 306         self.run()
    307 
    308         if output_binning is None:

/mnt/c/Users/peller/work/pisa/pisa/core/pipeline.py in run(self)
    329         for stage in self.stages:
    330             logging.debug(f"Working on stage {stage.stage_name}.{stage.service_name}")
--> 331             stage.run()
    332 
    333     def setup(self):

/mnt/c/Users/peller/work/pisa/pisa/core/stage.py in run(self)
    368     def run(self):
    369         self.compute()
--> 370         self.apply()
    371         return None
    372 

/mnt/c/Users/peller/work/pisa/pisa/core/stage.py in apply(self)
    355         if self.profile:
    356             start_t = time()
--> 357             self.apply_function()
    358             end_t = time()
    359             self.apply_times.append(end_t - start_t)

/mnt/c/Users/peller/work/pisa/pisa/utils/profiler.py in profiled_func(*args, **kwargs)
    123         try:
    124             start_t = time()
--> 125             return func(*args, **kwargs)
    126         finally:
    127             end_t = time()

/mnt/c/Users/peller/work/pisa/pisa/stages/utils/hist.py in apply_function(self)
    127                 container.representation = self.calc_mode
    128                 sample = []
--> 129                 dims_log = [d.is_log for d in self.apply_mode]
    130                 for dim, is_log in zip(self.regularized_apply_mode, dims_log):
    131                     if is_log:

/mnt/c/Users/peller/work/pisa/pisa/stages/utils/hist.py in <listcomp>(.0)
    127                 container.representation = self.calc_mode
    128                 sample = []
--> 129                 dims_log = [d.is_log for d in self.apply_mode]
    130                 for dim, is_log in zip(self.regularized_apply_mode, dims_log):
    131                     if is_log:

AttributeError: 'str' object has no attribute 'is_log'```

atrettin commented 3 years ago

Huh, how is that possible? The apply_mode in the hist stage must be a MultiDimBinning, as is asserted in L38

philippeller commented 3 years ago

My apologies! That was my bad, in the notebook I tested some apply modes were changed at runtime, and wrongfully the one of the hist stage 🙈

icecube / pisa

Implement fast histogramming and lookup #673

Note