jaraco / inflect

Correctly generate plurals, ordinals, indefinite articles; convert numbers to words
https://pypi.org/project/inflect
MIT License
954 stars 104 forks source link

import time is unacceptably slow #212

Open alonme opened 2 months ago

alonme commented 2 months ago

When switching from version 7.0.0 to 7.2.0, the import time grew from around 42ms to around 1.3 Seconds (31X slower)

By profiling the code running on import it seems that the issue stems from the switch from pydantic to typeguard

reproduction

pip install inflect==7.2.0
time python -c "import inflect" # results in around 1.3 seconds

pip install inflect==7.0.0
time python -c "import inflect" # results in around 0.13 seconds

python -X importtime -c "import inflect" was also useful to get the exact timing found above

related PR - https://github.com/jaraco/inflect/pull/199

ni-todo-spot commented 2 months ago

+1 it slows down my cicd builds

jaraco commented 2 months ago

I appreciate the report.

I see similar slowness on my machine:

 @ pip-run inflect -- -X importtime -c 'import inflect' 2>1 | tail -n 1
import time:    875507 |     924648 | inflect

924 ms using Python 3.13 with freethreading on an M3 Pro chip.

That's interesting. I also tested how much time import typeguard causes and it's very small:

 @ pip-run typeguard -- -X importtime -c 'import typeguard' 2>1 | tail -n 1
import time:      1085 |      47824 | typeguard

47 ms to import typeguard itself. So it's the actual application to inflect that's adding most of the delay.

I'm not sure what to do about it.

The reason we switched from the slightly more elegant pydantic to typeguard was because pydantic has a rust build-time dependency and so limits the viable platforms on any project that requires it (for example, cygwin has no rust compiler) and constrains the ways the library can be used (vendoring, on which Setuptools currently relies, becomes non-viable).

I do have plans to try to implement parameter transformation and validation in jaraco/transformers, but I haven't had time to invest in it.

I'm reluctant to simply implement the behavior inline for the reasons outlined in https://github.com/jaraco/inflect/issues/195#issuecomment-1654861328 (I'd like to find a re-usable mechanism that this and every other project can use).

It may be the case that practicality beats purity and we'll need to accept the hit that typeguard is non-viable for any project that's import-time sensitive.

Would someone be willing to raise this issue with typeguard and see if that project has any hope of improving the import time? Such an issue should probably distill the issue to characterize which usage is causing the slowness.

Thanks for any help you can provide.

alonme commented 2 months ago

import_inflect Adding a flamegraph generated by py-spy. (its interactive if you download it)

A solution from inflect's side can be something like this

def lazy_typechecked(func: Callable) -> Callable:
    def wrapper(*args, **kwargs) -> Any:
        typechecked_func = typechecked(func)
        return typechecked_func(*args, **kwargs)

    return wrapper

This way - we delay the actual application of typechecked to when the function is actually called, This will make the usages slower however i think that adding a cache should be fine?

@functools.cache
def cached_typechecked(func: Callable) -> Callable:
    return typechecked(func)

def lazy_typechecked(func: Callable) -> Callable:
    def wrapper(*args, **kwargs) -> Any:
        typechecked_func = cached_typechecked(func)
        return typechecked_func(*args, **kwargs)

    return wrapper
alonme commented 2 months ago

@jaraco what do you think about the solution? would you accept that PR?

jaraco commented 2 months ago

Thanks for your ping and the patience.

A solution from inflect's side can be something like this

I'm reluctant to accept a workaround in inflect for several reasons.

First, it means that inflect's use of typeguard isn't demonstrable of something usable elsewhere. That is, this pattern isn't a re-usable pattern. My goal in adopting first pydantic and now typeguard is to elevate the ecosystem - to identify and utilize language constructs that provide more sophisticated behavior with less boilerplate and instructions. I have ambitions to do much more than just validating inputs.

Second, if typeguard is prohibitively expensive at import time, addressing it at the inspect module isn't the right place. If the best solution is to defer the behavior until called, that should probably happen in typeguard. I could see accepting this sort of change as a temporary workaround for a known, accepted limitation, as long as there was a long term strategy for eventually replacing that workaround with a more durable solution.

Third, this workaround is addressing an even more general problem, that of import-time costs for consumers that never invoke the behavior. That is, these costs need to be borne at some point if inflect is to be used. The savings are only really useful if the dependent library/application never makes use of the functionality. Unfortunately, the attempts to solve such problems systematically have failed. Perhaps the dependencies of inflect should consider something like slothy to defer the imports of inflect until they're needed.

Overall, I feel like the proposed patch is the worst compromise and that the issue should be dealt with elsewhere in the ecosystem.

Can you say a bit more about what your use-case is and how the import time impacts your project?

alonme commented 2 months ago

My issue in typeguard got no response for 2 weeks now, and i am not sure what is the maintenance level of typeguard ( https://github.com/agronholm/typeguard/issues/198).

I agree that this isn't a very nice solution, however i have no idea if typeguard would accept this kind of solution or if there is another solution to make their code faster.

I encountered this slowness while profiling the tests of one of my projects in which each tests needs to run in a fresh python environment (and hence imports are ran in every test)

regarding this:

Perhaps the dependencies of inflect should consider something like slothy to defer the imports of inflect until they're needed.

I agree that ideally this shouldn't be solved at inflects level, however i am currently not optimistic about this being solved in typeguard soon which either leaves inflect or the end users responsible for handling this. Unless inflect will highlight this issue in the readme, most users wouldn't know about this issue / solution, so i don't think this is the way to go.

bswck commented 1 month ago

I suggested myself as a maintainer of typeguard and reached out to Alex directly. Once I'm approved, I'll manage to solve the concerns presented in this issue.