python-validators / validators

Python Data Validation for Humans™.
MIT License

fix: reduce memory footprint when loading TLDs #362

Closed yozachar closed 5 months ago

salty-horse commented 2 months ago

This significantly slows down TLD lookup. Opening a file and scanning it for every email validation is very inefficient. Is the memory footprint that much of a concern?

Would you be open to changing it back to something like this, which is 10 times faster?

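from pathlib import Path

# Module-level cache; populated on the first call to _iana_tld().
_iana_tld_set = None

def _iana_tld():
    """Return the set of IANA TLDs, reading _tld.txt only once."""
    global _iana_tld_set
    if _iana_tld_set is not None:
        return _iana_tld_set

    with Path(__file__).parent.joinpath("_tld.txt").open() as tld_f:
        _ = next(tld_f)  # skip the first line of the file
        _iana_tld_set = {line.strip() for line in tld_f}
    return _iana_tld_set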

yozachar commented 2 months ago

> Opening a file and scanning it for every email validation is very inefficient.

That's true for repeated validations.

> Is the memory footprint that much of a concern?

Yes, if the file is too large and/or system memory is insufficient.


What about a load_iana_tld() method?

It will load and store the TLDs once. If that method isn't called, the file is read on every lookup. Associate that method with a dataclass instead of using global variables.

A PR is welcome.
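
One way the load_iana_tld() idea could look is sketched below. This is only an illustration, not the library's actual implementation: the class name TldStore, the contains() method, and the path field are hypothetical, and the sketch assumes the TLD file has one entry per line after a first line that is skipped, as in the snippet earlier in the thread.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Iterator, Optional, Set


@dataclass
class TldStore:
    """Hypothetical holder for the TLD list; names are illustrative only."""

    path: Path
    # Stays None until load_iana_tld() is called explicitly.
    _cache: Optional[Set[str]] = field(default=None, init=False, repr=False)

    def _entries(self) -> Iterator[str]:
        """Stream entries from the file without keeping them in memory."""
        with self.path.open() as tld_f:
            next(tld_f, None)  # skip the first line (assumed to be a header)
            for line in tld_f:
                yield line.strip()

    def load_iana_tld(self) -> None:
        """Opt in to caching: read the file once and keep the set in memory."""
        self._cache = set(self._entries())

    def contains(self, tld: str) -> bool:
        """Use the cached set if loaded, otherwise scan the file."""
        if self._cache is not None:
            return tld in self._cache
        return any(tld == entry for entry in self._entries())
```

Callers that validate many values would call load_iana_tld() once up front and pay the memory cost; callers that never call it keep the low-memory behaviour, with a file scan per lookup. The comparison is exact, so callers would need to normalise case to match the file.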