Closed atrettin closed 3 years ago
Now that all unit tests pass, this could actually be pulled! π We can still change the setup.py
whenever the actual release of fast_histogram
is made, but for now this seems to be ok as it is.
Great! Does the fast histo package now compile automatically during installation?
Also, just out of interest, how much of a speed-up do you get with this for you analysis?
Great! Does the fast histo package now compile automatically during installation?
Yes! Everything should just happen magically! πβ¨π¦ It works in the GitHub workflow, so I hope it will work anywhere.
Also, just out of interest, how much of a speed-up do you get with this for you analysis?
Sadly, the speedup in the real world is more modest than what artificial tests with normally distributed data suggested. When I benchmark the Container
class on its own, I get a speedup of 10, but the real speedup in the apply_function
of the hist
stage is a factor of 3.5 for an absolute time saving of 500 ms, the speedup of the apply_function
in the nusquids stage is 5 for a saving of 800 ms.
An "idle" output from my PISA pipeline (i.e. without changing any parameter, just going through all the apply
functions) is sped up by a factor of 2 from 900 ms to 450 ms (the timings varied unexpectedly wildly, some of these numbers may seem inconsistent). An output after changing one parameter in nusquids now takes 4.7 s where it used to take 6 s. A modest speedup, but not a game-changing one.
Does indeed compile!
However, I got an error when executing a pipeline:
<timed exec> in <module>
/mnt/c/Users/peller/work/pisa/pisa/core/pipeline.py in get_outputs(self, output_binning, output_key)
304
305
--> 306 self.run()
307
308 if output_binning is None:
/mnt/c/Users/peller/work/pisa/pisa/core/pipeline.py in run(self)
329 for stage in self.stages:
330 logging.debug(f"Working on stage {stage.stage_name}.{stage.service_name}")
--> 331 stage.run()
332
333 def setup(self):
/mnt/c/Users/peller/work/pisa/pisa/core/stage.py in run(self)
368 def run(self):
369 self.compute()
--> 370 self.apply()
371 return None
372
/mnt/c/Users/peller/work/pisa/pisa/core/stage.py in apply(self)
355 if self.profile:
356 start_t = time()
--> 357 self.apply_function()
358 end_t = time()
359 self.apply_times.append(end_t - start_t)
/mnt/c/Users/peller/work/pisa/pisa/utils/profiler.py in profiled_func(*args, **kwargs)
123 try:
124 start_t = time()
--> 125 return func(*args, **kwargs)
126 finally:
127 end_t = time()
/mnt/c/Users/peller/work/pisa/pisa/stages/utils/hist.py in apply_function(self)
127 container.representation = self.calc_mode
128 sample = []
--> 129 dims_log = [d.is_log for d in self.apply_mode]
130 for dim, is_log in zip(self.regularized_apply_mode, dims_log):
131 if is_log:
/mnt/c/Users/peller/work/pisa/pisa/stages/utils/hist.py in <listcomp>(.0)
127 container.representation = self.calc_mode
128 sample = []
--> 129 dims_log = [d.is_log for d in self.apply_mode]
130 for dim, is_log in zip(self.regularized_apply_mode, dims_log):
131 if is_log:
AttributeError: 'str' object has no attribute 'is_log'```
Huh, how is that possible? The apply_mode
in the hist
stage must be a MultiDimBinning
, as is asserted in L38
My apologies! That was my bad, in the notebook I tested some apply modes were changed at runtime, and wrongfully the one of the hist stage π
This PR implements fast histogramming with the
fast_histogram
package and fast lookup with dedicated numba functions for regular binnings.When binnings are fully regular and linear, the
fast_histogram
library is automatically used when Container contents are translated from arrays to histograms, and the fast lookup functions are used when Container contents are translated from binnings to arrays.Because the logarithm is a very costly operation (especially in double precision), regularizing binnings into log-space would be very slow and nearly negate the advantage from using fast histograms. To circumvent this problem, a mechanism is implemented in the Container class to cache the logarithm of array representations. There are now two array representations: "events" and "log_events". When the "log_events" representation is requested, the log is calculated event-wise. As with any other representation, the result is cached such that the log need not be calculated again unless the contents are changed via item assignment.
With these two changes, histogram and lookup operations for regular (linear or logarithmic) binnings are accelerated by a factor between 5-10, at least in artificial benchmarks with normally distributed data, without any need to change Pipeline configurations or stages.
Finally, the
hist
stage automatically detects irregular axes in the output binning and pre-computes indices for these axes (typically the PID axis) such that a regular binning in the index can be used. This is all done automatically, such that every analysis, even one using an irregular output binning, can take advantage of fast histogramming.Note
Currently, not all features needed have been merged into the
fast_histogram
package for a release. For now, an installation from the source of my fork's "xmax_fix" branch is required. I'm pushing for an official release, but until then, I will keep this a draft.