Do type checking at compile time

bonjourmauko commented 6 years ago

Relates to openfisca/openfisca-france#926 Supersedes openfisca/openfisca-france#1033

Objective

Deduce ~10-20% the time taken when launching several tests at the same time.

Definition of done

[ ] Extract type checking from app's life-cycle (including tests)
[ ] Execute type checking on demand (i.e. make typecheck)
[ ] Execute type checking on CircleCI

Details

From openfisca/openfisca-france#1014 we know type checking represents an important time of the test suite's run time:

That's ~35% of total test suite's run time.

Useful links

Tips

There's no drop-in replacement of type-checking without refactoring, so it is advisable to use a divide and conquer stratégie :

Pick one function to experiment with
Extract one type-checking to another function (obvious candidates: isinstance)
Try to refactor that one to see how difficult it is to gradually typecheck
Share
Desist or repeat

fpagnoux commented 6 years ago

I do agree that avoiding type checks in production could save us a significant amount of time.

However, type checks are not done to control the quality of core, but to check users input. So I'm not sure that a make typecheck would really be suited here.

We have two big kind of type checks:

On the user input.
- We probably want to keep these in production, as it allows friendly error messages for users who send an invalid input. (Though for re-users like mes-aides who run their own server and control their input, skipping them could make sense).
On the tax and benefit system.
- These type checks provide friendly errors in case of mistakes while coding the legislation.
- However, they are not really useful in production, and we could bypass them (we talked about it recently).

bonjourmauko commented 6 years ago

After two pairing sessions with @sandcha, we tried to see where we could save some "dev" time by extracting custom type checking.

We avoided on the user input, as it would be a regression
We avoided on the tax and benefit system, as it would be a regression (in "dev" mode)

Then we decided to focus only on isinstance, which is ~12%, filtering by at least 5.000.000 calls:

$ make ccallers | awk '$1>5000000'
python -c "import pstats; p = pstats.Stats('tests.cprof'); p.strip_dirs().sort_stats('tottime').print_callers(9)"
   Ordered by: internal time
   List reduced from 5999 to 9 due to restriction <9>
Function                                            was called by...
                                                        ncalls  tottime  cumtime
simulations.py:133(calculate)                       <- 4679039/16249   18.951  480.208  entities.py:202(__call__)
taxscales.py:190(calc)                              <-       9    0.000    0.001  activite.py:247(formula)
{method 'round' of 'numpy.ndarray' objects}         <-     726    0.008    0.008  fromnumeric.py:37(_wrapit)
{method 'get' of 'dict' objects}                    <-       3    0.000    0.000  __init__.py:71(search_function)
                                                       8079928    2.744    2.744  data_storage.py:23(get)
                                                       7727103    1.202    1.202  entities.py:237(get_holder)
                                                      10184705    1.005    1.005  simulations.py:133(calculate)
                                                      31011729   10.153   10.153  taxbenefitsystems.py:250(get_variable)
shape_base.py:844(tile)                             <- 2205619   16.455   31.128  taxscales.py:190(calc)
base.py:24(iter_cotisations)                        <- 1471714   13.737  127.701  {sum}
entities.py:171(check_variable_defined_for_entity)  <- 6088381    6.944   14.214  entities.py:202(__call__)
                                                       7727103    5.890   11.279  entities.py:237(get_holder)
taxbenefitsystems.py:250(get_variable)              <- 13815484    5.793   11.274  entities.py:171(check_variable_defined_for_entity)
                                                       7597923    2.500    4.312  simulations.py:133(calculate)
                                                       7696896    2.954    5.282  simulations.py:419(get_variable_entity)
{isinstance}                                        <-       1    0.000    0.000  ConfigParser.py:285(read)
                                                       7790272    1.577    1.577  commons.py:8(to_unicode)
                                                       5277169    1.966    1.966  data_storage.py:23(get)
                                                      13815484    1.386    1.386  entities.py:171(check_variable_defined_for_entity)
                                                       6854990    2.009    2.009  parameters.py:432(__getitem__)
                                                      11170574    1.176    1.176  periods.py:778(period)
                                                       7597923    0.722    0.722  simulations.py:133(calculate)

That pointed us for example to:

    def get(self, period, extra_params = None):
        if self.is_eternal:
            period = periods.period(ETERNITY)
        period = periods.period(period)

        values = self._files.get(period)
        if values is None:
            return None
        if extra_params:
            if values.get(tuple(extra_params)) is None:
                return None
            return self._decode_file(values.get(tuple(extra_params)))
        if isinstance(values, dict):
            return self._decode_file(next(iter(values.values())))
        return self._decode_file(values)

For which, in order to extract type checking, we have before to render it purer (in the functional sense), and even though it is some good work to be done it is beyond the scope of this issue.

Closing until then, but feel free to re-open if you see something we didn't.

openfisca / openfisca-core

Do type checking at compile time #690