openfisca / openfisca-core

OpenFisca core engine. See other repositories for country-specific code & data.
https://openfisca.org
GNU Affero General Public License v3.0

Accelerate simulations using numba #1214

Open IGF-PSD opened 4 months ago

IGF-PSD commented 4 months ago

We found that running certain simulations repeatedly on exhaustive data using openfisca can be very slow. Since openfisca relies heavily on numpy arrays, we thought that wrapping some of the package's classes with the numba package's jitclass decorator could greatly speed up calculations.

What do you think of this idea? Would you be interested in a merge request including these changes for a future release of the package?
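To make the idea concrete, here is a minimal, hypothetical sketch of what wrapping a small value-holding class with numba's jitclass looks like. The `Holder` class and its field are invented for illustration (they are not OpenFisca classes), and a no-op fallback keeps the snippet runnable when numba is not installed:

```python
import numpy as np

try:
    # jitclass compiles the whole class, attributes and methods, to machine code
    from numba import float64
    from numba.experimental import jitclass
    spec = [("values", float64[:])]  # attribute types must be declared up front
except ImportError:
    # hedge: fall back to a plain Python class when numba is unavailable
    spec = []
    def jitclass(spec):
        return lambda cls: cls

@jitclass(spec)
class Holder:
    def __init__(self, values):
        self.values = values

    def total(self):
        # runs in nopython mode when compiled, so it loops at C speed
        return self.values.sum()
```

The key constraint is visible in `spec`: every attribute must have a static numba type, which is exactly why dynamic classes such as OpenFisca's Period, Entity or Variable cannot be wrapped as-is.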

MattiSG commented 4 months ago

Hi @IGF-PSD and thank you for this idea!

This is a very exciting potential contribution, and we would of course be very happy for performance gains 😃 We are also very thankful that you took the time to open an issue and start a discussion to best plan such an implementation 👍

In order to define the best course of action (RFC, direct implementation, level of collaboration, rollout plan…), could you share with us:

  1. Concrete use cases where current performance is a bottleneck, in particular the size and type of "exhaustive data" you mention.
  2. Existing benchmarks that you might have made using the current codebase vs partial implementations with numba you might already have.
  3. Expected impact on the public Python API (and, if any, on the Web API, but as I understand it there should be none).
  4. Expected impact on the private API.
  5. Expected costs of maintenance over time (do the numba decorators need to be updated? how common is the knowledge required to do so?).
  6. Any additional requirements for deployment that would be brought by this change, such as operating system constraints or availability of a GPU on the host machine.

I understand this is a lot of additional information. If you prefer, we could also set up a synchronous video call to discuss these points and best understand your context 🙂

I would also love to hear from community technical experts such as @benjello @benoit-cty @bonjourmauko @Morendil @nikhilwoodruff @eraviart if they have experience, opinions or insights on using numba or alternatives on the OpenFisca codebase 🙂

benoit-cty commented 4 months ago

Hello, this would be great! At LexImpact, @clallemand managed to compute income tax on exhaustive data (POTE dataset: 39M lines and more than 1K columns) in less than an hour, using the caching capabilities of OpenFisca to avoid recomputing all variables each time.

IGF-PSD commented 3 months ago

Hello,

Thank you very much for your feedback!

  1. We noticed these delays while working on different scenarios for reforming employer social security contributions. Simulating a scenario for each month of the year could easily take an hour. Even using the caching capabilities of openfisca, these times were too slow to, for example, implement an under-constrained optimization program to find the optimal thresholds for each scenario within our micro-simulation model, or to use a sufficiently fine parameter grid.

  2. Without being able to "connect" openfisca and numba, as the Period, Entity, Parameters, Variable, etc. types are not natively compatible with numba, our first tests compared a formula vectorized using numpy (which de facto ignores any optimizations included in openfisca to speed up this type of operation, but uses the same approach to perform the calculations) against its numba equivalent. To calculate, for example, general tax relief (which is a reduction in employer contributions implemented in openfisca-france) on a simulated population of 20 million individuals, the vectorized formula takes 10.8 seconds, whereas its numba equivalent (sometimes using a functional form, sometimes @jitclass) runs on the same population in 200 ms. Performance gains are therefore of the order of 50× on this example.

3-4. Integrating numba with openfisca would have a certain number of consequences for the package and its syntax:

  5. We've been using numba for several years and the syntax has never really changed over time. Apart from the class attributes we mentioned earlier, we don't see any potential future maintenance costs.
  6. There are no operating system constraints associated with this modification.

However, we are very open to further discussing these potential changes!
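For reference, the kind of micro-benchmark described in point 2 can be sketched as follows. The formula below is an invented linear taper, not the actual openfisca-france general relief, and the SMIC value is illustrative; a no-op fallback keeps the snippet runnable without numba:

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # hedge: degrade to pure Python when numba is unavailable
    def njit(*args, **kwargs):
        if args and callable(args[0]):
            return args[0]
        return lambda f: f

SMIC = 1766.92  # illustrative monthly gross minimum wage, not an official parameter

def relief_numpy(salary):
    # vectorised numpy version: relief tapers linearly to zero at 1.6 * SMIC
    coef = np.maximum(0.0, 0.3234 / 0.6 * (1.6 * SMIC / np.maximum(salary, SMIC) - 1.0))
    return coef * salary

@njit
def relief_numba(salary):
    # same arithmetic as above, as an explicit loop that numba compiles to machine code
    out = np.empty_like(salary)
    for i in range(salary.size):
        s = max(salary[i], SMIC)
        out[i] = max(0.0, 0.3234 / 0.6 * (1.6 * SMIC / s - 1.0)) * salary[i]
    return out
```

Timing both versions with `timeit` on a 20-million-row array is how figures like 10.8 s vs 200 ms would be obtained; note that the first call to the numba version also pays the compilation cost.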

MattiSG commented 3 months ago

Thank you very much for this detailed answer! A 50× compute performance gain sounds exciting 😃 The main point left for me to fully understand before considering an upgrade path is whether this can be a drop-in replacement or entails rewrites.

> To calculate, for example, general tax relief

Just to clarify: I understand that you extracted the code for that formula from OpenFisca (which makes total sense to get rid of the underlying non-numba classes), meaning that there was no caching. I thus assume that the formula was not recursive, as this would significantly change the assessment (i.e. maybe compute costs are insignificant compared to cache miss costs). Please let me know if that is not the case 🙂

In order to better understand the cost of upgrading and the differences in syntax, would you be able to share the two formula implementations you used for this benchmark? This would be very useful for the community to assess the differences.

> we'll need to rewrite the variables for all socio-fiscal systems derived from openfisca

If every dependent needs to rewrite every variable, this is a major drawback as the upgrade cost will be immense. To be sure: do you mean that rewriting the variables is an opt-in to enable numba optimisation on a per-variable basis, i.e. that variables as they are will still work but will not be optimised, or that once the change is brought every variable will have to be rewritten to be compatible with the numba-optimised base classes?

> the current version of jitclass is not compatible with cuda and GPU parallelization […] we don't know to what extent implementing these jitclass may prove inconsistent with your other optimization projects in the package.

To my knowledge, we don't currently have any GPU optimisation in place in OpenFisca Core and have no resources secured for such an addition, so there would not be a competing implementation 🙂

bonjourmauko commented 3 months ago

Hi @IGF-PSD and @MattiSG! Indeed, I've been experimenting with improving OpenFisca with Cython for a while now. I did some tests with numba a while ago, and I know that @eraviart has also been trying to integrate cuda.

Just one concern, however: numba is a compiler, and we do not have the resources to ship compiled binaries for every platform (unless that is what conda already does, @benoit-cty @sandcha?). So, at best, what we can do is improve the actual implementation of openfisca "primitives" so they can profit from just-in-time optimisation.

As I've been pushing modularisation and immutability for some time, some work has already been shipped, and some more is in the WIP queue. Making code optimisable is not that hard: it is a matter of extracting, from the performance bottlenecks, the objects that can't be compiled natively to C/C++.

So it is, above all, a design effort: pure compiler-aware refactoring.
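A minimal sketch of what such compiler-aware refactoring could look like (the names and the dict-based "holder" are hypothetical, not OpenFisca's actual API): the outer shell keeps dealing with rich objects, while only a pure numeric kernel is handed to numba.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # hedge: no-op decorator when numba is unavailable
    def njit(*args, **kwargs):
        if args and callable(args[0]):
            return args[0]
        return lambda f: f

@njit
def kernel(values, weights):
    # pure numeric kernel: only primitive arrays, so numba can compile it
    total = 0.0
    for i in range(values.size):
        total += values[i] * weights[i]
    return total

def weighted_total(holder):
    # shell: extracts primitives from a rich object before entering compiled code
    return kernel(
        np.asarray(holder["values"], dtype=np.float64),
        np.asarray(holder["weights"], dtype=np.float64),
    )
```

The shell/kernel split is the "extraction" mentioned above: nothing that numba cannot type ever crosses into the compiled function.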

bonjourmauko commented 3 months ago

BTW, I know traction for typing is not huge within the community yet, but type-hinting is just about soft-enforcing good design that allows for compiler optimisations.

See, for example, a header file for Cython, which is, well, very similar to Python type hints:

```cython
from retrievals import Page, PageParser

ctypedef object Page_t
ctypedef object PageParser_t

cdef class DocSplitter:
    cpdef list[Page_t] split(self, PageParser_t parser, TextSplitter text_splitter)

cdef class TextSplitter:
    cdef public int chunk_size
    cdef public int chunk_overlap
    cdef char* separator
    cdef object separator_pattern

    cpdef list[char*] split(self, char* text)
    cpdef list[char*] merge(self, list[char*] splits)
    cdef list[char*] get_current_chunk(self, list current_chunk)
    cdef int get_current_len(self, list[char*] current_chunk)
```

If we just enforce some of these constraints and export pure ADTs, users will be able to easily apply JIT, AOT, CUDA, or whatever atomic optimisation they want.
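In Python itself, the same constraint can be soft-enforced today with a frozen, fully typed dataclass. The `Chunking` class below is a made-up example (mirroring the Cython header's fields) of the kind of pure ADT meant here:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Chunking:
    # every field is a primitive with a static type, so a compiler
    # (Cython, numba, mypyc...) can map it straight onto C types
    chunk_size: int
    chunk_overlap: int
    separator: str
```

Immutability (`frozen=True`) matters too: a value that never changes after construction is far easier for a compiler to reason about.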

bonjourmauko commented 3 months ago

Another option, probably harder up-front but cheaper in the long run, which I think we discussed with @cbenz, is to generalise dependency injection of the classes that can be implemented with primitive C types.

See, for example:

```python
def calculate_add(self, variable_name: str, period):
    variable: Optional[Variable]

    variable = self.tax_benefit_system.get_variable(
        variable_name, check_existence=True
    )

    if variable is None:
        raise errors.VariableNotFoundError(variable_name, self.tax_benefit_system)

    if period is not None and not isinstance(period, periods.Period):
        period = periods.period(period)

    # Check that the requested period matches definition_period
    if periods.unit_weight(variable.definition_period) > periods.unit_weight(
        period.unit
    ):
        raise ValueError(
            f"Unable to compute variable '{variable.name}' for period "
            f"{period}: '{variable.name}' can only be computed for "
            f"{variable.definition_period}-long periods. You can use the "
            f"DIVIDE option to get an estimate of {variable.name}."
        )

    if variable.definition_period not in (
        periods.DateUnit.isoformat + periods.DateUnit.isocalendar
    ):
        raise ValueError(
            f"Unable to ADD constant variable '{variable.name}' over "
            f"the period {period}: eternal variables can't be summed "
            "over time."
        )

    return sum(
        self.calculate(variable_name, sub_period)
        for sub_period in period.get_subperiods(variable.definition_period)
    )
```

There is no way to optimise that without extracting dependencies from the function:

```python
def guard(variable: Just[Variable], period: Just[Period]) -> TypeGuard[Match[Variable, Period]]:
    # Check that the requested period matches definition_period
    if Period.unit_weight(variable.definition_period) <= Period.unit_weight(period.unit):
        return True

    raise ValueError(
        f"Unable to compute variable '{variable.name}' for period "
        f"{period}: '{variable.name}' can only be computed for "
        f"{variable.definition_period}-long periods. You can use the "
        f"DIVIDE option to get an estimate of {variable.name}."
    )

# etc...

@numba.jit(forceobj=True)
def sub_periods(variable: Just[Variable], period: Just[Period]) -> Just[Sequence[Tuple[int]]]:
    return set(sub_period.value for sub_period in period.get_subperiods(variable.definition_period))

@numba.jit(nopython=True)
def calculate_add(calc: Callable[[str, tuple], float], match: Just[Match[float, set[tuple[int]]]]):
    return sum(map(calc, *match))
```

Something like that.

bonjourmauko commented 1 month ago

Here is a working experiment. As you can see, 75% of the exercise is in typing:

```python
@guvectorize(
    [(float32[:], float32[:], float32[:], float32[:])], "(n),(o),(p)->(n)"
)  # pyright: ignore[reportUnknownMemberType,reportUntypedFunctionDecorator]
def apply_thresholds(
    input: t.FloatArray,
    thresholds: t.FloatArray,
    choices: t.FloatArray,
    result: t.FloatArray,
) -> None:
    """Make a choice based on an input and thresholds.

    From a list of ``choices``, this function selects one of these values
    based on a list of inputs, depending on the value of each ``input`` within
    a list of ``thresholds``.

    Args:
        input: A list of inputs to make a choice from.
        thresholds: A list of thresholds to choose.
        choices: A list of the possible values to choose from.
        result: A list of the values chosen.

    Raises:
        ValueError: When the number of ``thresholds`` is not equal to the
            number of ``choices``, or one less.

    Examples:
        >>> input = numpy.array([4, 5, 6, 7, 8], dtype=numpy.float32)
        >>> thresholds = numpy.array([5, 7], dtype=numpy.float32)
        >>> choices = numpy.array([10, 15, 20], dtype=numpy.float32)
        >>> apply_thresholds(input, thresholds, choices)
        array([10., 10., 15., 15., 20.], dtype=float32)

    """
    condlist = [numpy.array([x], dtype=t.BoolDType) for x in range(0)]

    for threshold in thresholds:
        condlist.append(input <= threshold)  # noqa: PERF401

    choicelist = [numpy.array([x], dtype=t.FloatDType) for x in range(0)]

    for choice in choices:
        choicelist.append(numpy.array([choice], dtype=t.FloatDType))  # noqa: PERF401

    if len(condlist) == len(choicelist) - 1:
        # If a choice is provided for input > highest threshold, last condition
        # must be true to return it.
        condlist.append(numpy.array([True], dtype=t.BoolDType))

    if len(condlist) != len(choicelist):
        msg = (
            "'apply_thresholds' must be called with the same number of "
            "thresholds than choices, or one more choice."
        )
        raise ValueError(msg)

    array = numpy.select(condlist, choicelist)

    for i in range(array.size):
        result[i] = array[i]
```
It has one big feature that I wasn't expecting: being able to define native numpy ufuncs and generalized ufuncs (the example above is a gufunc).
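For comparison, a plain-numpy equivalent of what the gufunc computes, without the typing and fixed-size plumbing that numba requires, might look like this sketch (the function name is invented):

```python
import numpy as np

def apply_thresholds_np(input, thresholds, choices):
    # pick choices[i] for values below thresholds[i]; if one extra choice is
    # given, it is used for inputs above the highest threshold
    condlist = [input <= thr for thr in thresholds]
    if len(condlist) == len(choices) - 1:
        condlist.append(np.full_like(input, True, dtype=bool))
    if len(condlist) != len(choices):
        raise ValueError(
            "apply_thresholds_np needs as many thresholds as choices, or one fewer"
        )
    return np.select(condlist, choices)
```

Everything the gufunc version adds on top of this, the explicit dtypes, the preallocated `result` array, the signature string, is what lets numba compile it ahead of calling.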

bonjourmauko commented 1 month ago

It is not that bad: the logic looks like C, but it doesn't impose a DSL the way Cython does.

And the compilation is jit, so we don't have to worry about distributing compiled binaries.

I'd actually pick one bottleneck method, fully type it, make it pure, and try how it goes with numba.

I remember @guillett @sandcha and @Morendil studied the bottlenecks.

For general optimisations (those beyond extending numpy), I'd rather rewrite them in Rust with Python bindings.