openfisca / openfisca-core

OpenFisca core engine. See other repositories for countries-specific code & data.
https://openfisca.org
GNU Affero General Public License v3.0
165 stars 74 forks source link

Refactor introspection data generation to get a much quicker startup time #1189

Closed guillett closed 12 months ago

guillett commented 1 year ago

Python version bump made OpenFisca Web API much much slower. It takes 5 minutes or so (cf discussion on Slack).

This slow startup is mainly due to the introspection data generation and more precisely to the python files parsing.

before

Screenshot 2023-07-17 at 17-26-04 full_7minutes_profile3

after

Screenshot 2023-07-17 at 17-27-49 fix_1946

similar to #1188

Breaking changes

Technical notes

I think it is worth sharing my development strategy:

I worked with a country package (France) and started gunicorn run --config config.py with

run.py


from openfisca_core.scripts import build_tax_benefit_system
from openfisca_web_api.app import create_app

country_package = 'openfisca_france'

tax_benefit_system = build_tax_benefit_system( country_package_name = country_package, extensions = [], reforms = [] )

application = create_app(tax_benefit_system)

and
> `config.py`
```python
import cProfile
import pstats
from io import StringIO
import logging
import os
import tempfile
import time

timeout=1200

profiles = {}
def post_fork(server, worker):
    worker.start_time = time.time()
    profile = cProfile.Profile()
    profile.enable()
    profiles[worker.pid] = profile
    worker.log.info("PROFILING %d" % (worker.pid))

def worker_abort(worker):
    tf = tempfile.NamedTemporaryFile(delete = False)
    worker.log.info("PROFILING RESULT %d: http://127.0.0.1:8081/snakeviz/%s" % (worker.pid, tf.name))
    profiles[worker.pid].dump_stats(tf.name)

def post_worker_init(worker):
    tf = tempfile.NamedTemporaryFile(delete = False)
    worker.log.info("PROFILING RESULT %d: http://127.0.0.1:8081/snakeviz/%s" % (worker.pid, tf.name))
    profiles[worker.pid].dump_stats(tf.name)

It allowed me to get visual clues with snakeviz on areas of potential improvements (as well as slow interesting portions I could comment out).

I also cloned https://github.com/python/cpython to have a better understanding of underlying functions. However, I didn't dig much in code in C I mainly stayed in Python and I recycled code snippets in our contexts.

coveralls commented 1 year ago

Coverage Status

coverage: 74.032% (+0.04%) from 73.99% when pulling 4f9990a181b9d905dc5e2fb17b76111c97f7d833 on fast-api into 94f68f4c876b0499de63af1a1e9bee6edda90983 on master.

guillett commented 1 year ago

Oh sorry, I forgot to give more context. I updated the initial comment.

bonjourmauko commented 9 months ago

Thank you for this changeset! I don't understand enough of the goals of the introspection nor of the suggested alternative code to approve or request changes. I like how less parameters are needed with that rewrite. Could you maybe share some data on that “much quicker startup time”? 🙂

@MattiSG Every variable has formulas, which are not defined in core but directly in each country package. For core to be able to use these formulas, the Variable class parses ~the AST, their internal sub-Python representation~ the source code verbatim from the files where these formulas are defined, and evaluates them, so they get defined in memory. (This is one of the reasons static analysis of OpenFisca at run-time is not possible: the formulas refered to internally have no tie with the actual code read from the packages source files).

bonjourmauko commented 9 months ago

@guillett What does it imply that no longer have parameters as well as calling functions? What I could get from the changeset is that comments are no longer parsed (why were there being parsed is yet as mistery for me).

If you could add more specific information about the actual changes it would be great :)

If I'm not mistaken, @benjello @eraviart this mechanism bypasses all compile-time and interpreter optimisations.

Another possible concern is that performance scales badly (time increases exponentially the bigger a country package).

Maybe @Morendil looked into this or @fpagnoux?