lebrice / SimpleParsing

Simple, Elegant, Typed Argument Parsing with argparse
MIT License
384 stars 46 forks source link

Make `simple-parsing` faster #278

Closed anivegesana closed 10 months ago

anivegesana commented 11 months ago

Is your feature request related to a problem? Please describe. Thank you for your work on #276! In the parser that our team wrote, we noticed that most of the time needed to generate the --help menu was spent inside of the inspect.getsource function. I think the issue is that the issue is that caching is done on _get_attribute_docstring, but the majority of the time is spent in inspect.getsource, which has to be done multiple times for the same dataclass since the key of the cache is (dataclass, field_name).

Describe the solution you'd like Constructing caches for the inspect.getsource, inspect.getdoc, and dp.parse functions dramatically speeds up the construction of the parser. I ran a simple experiment where I didn't change any code of the simple-parsing library and simply prepended the following lines of code to my script:

import functools
import inspect

import docstring_parser as dp

inspect.getsource = functools.lru_cache(2048)(inspect.getsource)
inspect.getdoc = functools.lru_cache(2048)(inspect.getdoc)
dp.parse = functools.lru_cache(2048)(dp.parse)
Construction without monkey patch: 0.902 s
Parsing without monkey patch: 0.323 s
Construction with monkey patch: 0.020 s
Parsing with monkey patch: 0.021 s

Also, you import numpy even if you don't use it. It is possible to import numpy lazily so that --help menus will be more snappy. (I haven't checked to see if this is a major source of unnecessary waiting, but, on my machine, it seems to take up ~0.105s/0.211s to construct the parser and parse the arguments.This may not may not also apply to yaml.)

To lazily load numpy, you can create a module called lazy_numpy.py and use the following code:

def __getattr__(attr):
    import numpy as np
    globals().update(vars(np))
    return getattr(np, attr)

def __dir__():
    import numpy as np
    globals().update(vars(np))
    return dir(np)

Importing lazy_numpy does not import numpy but accessing any attribute within lazy_numpy does.

Describe alternatives you've considered I haven't tried other locations that caching could be improved. I just think that, if three lines of code can speed things up this dramatically, might as well add them.

Additional context I can open a PR, but I do not have a good idea on how to implement test cases for this.

lebrice commented 11 months ago

Thanks a lot @anivegesana , You're more than welcome to submit a PR for the caching, I think the current tests are sufficient in convering the docstring creation, I dont think additional tests are needed.

lebrice commented 11 months ago

Hey @anivegesana , just FYI, I did make the numpy import lazy, and added the lru_cache to all the inspect functions in #279

anivegesana commented 11 months ago

Thank you! I noticed that my --help menu was much faster with yesterday's release. I think that there is a place that you forgot to remove an import for numpy: https://github.com/anivegesana/SimpleParsing/commit/153eb03334f0e121dd2c1f55642458b5e45417c1

Thank you for your awesome work!

lebrice commented 11 months ago

Oops. Thanks for pointing it out