openfisca / openfisca-core

OpenFisca core engine. See other repositories for countries-specific code & data.
https://openfisca.org
GNU Affero General Public License v3.0
170 stars 75 forks source link

Do type checking at compile time #690

Closed bonjourmauko closed 6 years ago

bonjourmauko commented 6 years ago

Relates to openfisca/openfisca-france#926 Supersedes openfisca/openfisca-france#1033

Objective

Deduce ~10-20% the time taken when launching several tests at the same time.

Definition of done

Details

From openfisca/openfisca-france#1014 we know type checking represents an important time of the test suite's run time:

typecheck

That's ~35% of total test suite's run time.

Useful links

Tips

There's no drop-in replacement of type-checking without refactoring, so it is advisable to use a divide and conquer stratégie :

fpagnoux commented 6 years ago

I do agree that avoiding type checks in production could save us a significant amount of time.

However, type checks are not done to control the quality of core, but to check users input. So I'm not sure that a make typecheck would really be suited here.

We have two big kind of type checks:

bonjourmauko commented 6 years ago

After two pairing sessions with @sandcha, we tried to see where we could save some "dev" time by extracting custom type checking.

Then we decided to focus only on isinstance, which is ~12%, filtering by at least 5.000.000 calls:

$ make ccallers | awk '$1>5000000'
python -c "import pstats; p = pstats.Stats('tests.cprof'); p.strip_dirs().sort_stats('tottime').print_callers(9)"
   Ordered by: internal time
   List reduced from 5999 to 9 due to restriction <9>
Function                                            was called by...
                                                        ncalls  tottime  cumtime
simulations.py:133(calculate)                       <- 4679039/16249   18.951  480.208  entities.py:202(__call__)
taxscales.py:190(calc)                              <-       9    0.000    0.001  activite.py:247(formula)
{method 'round' of 'numpy.ndarray' objects}         <-     726    0.008    0.008  fromnumeric.py:37(_wrapit)
{method 'get' of 'dict' objects}                    <-       3    0.000    0.000  __init__.py:71(search_function)
                                                       8079928    2.744    2.744  data_storage.py:23(get)
                                                       7727103    1.202    1.202  entities.py:237(get_holder)
                                                      10184705    1.005    1.005  simulations.py:133(calculate)
                                                      31011729   10.153   10.153  taxbenefitsystems.py:250(get_variable)
shape_base.py:844(tile)                             <- 2205619   16.455   31.128  taxscales.py:190(calc)
base.py:24(iter_cotisations)                        <- 1471714   13.737  127.701  {sum}
entities.py:171(check_variable_defined_for_entity)  <- 6088381    6.944   14.214  entities.py:202(__call__)
                                                       7727103    5.890   11.279  entities.py:237(get_holder)
taxbenefitsystems.py:250(get_variable)              <- 13815484    5.793   11.274  entities.py:171(check_variable_defined_for_entity)
                                                       7597923    2.500    4.312  simulations.py:133(calculate)
                                                       7696896    2.954    5.282  simulations.py:419(get_variable_entity)
{isinstance}                                        <-       1    0.000    0.000  ConfigParser.py:285(read)
                                                       7790272    1.577    1.577  commons.py:8(to_unicode)
                                                       5277169    1.966    1.966  data_storage.py:23(get)
                                                      13815484    1.386    1.386  entities.py:171(check_variable_defined_for_entity)
                                                       6854990    2.009    2.009  parameters.py:432(__getitem__)
                                                      11170574    1.176    1.176  periods.py:778(period)
                                                       7597923    0.722    0.722  simulations.py:133(calculate)

That pointed us for example to:

    def get(self, period, extra_params = None):
        if self.is_eternal:
            period = periods.period(ETERNITY)
        period = periods.period(period)

        values = self._files.get(period)
        if values is None:
            return None
        if extra_params:
            if values.get(tuple(extra_params)) is None:
                return None
            return self._decode_file(values.get(tuple(extra_params)))
        if isinstance(values, dict):
            return self._decode_file(next(iter(values.values())))
        return self._decode_file(values)

For which, in order to extract type checking, we have before to render it purer (in the functional sense), and even though it is some good work to be done it is beyond the scope of this issue.

Closing until then, but feel free to re-open if you see something we didn't.