Closed elliotwutingfeng closed 1 year ago
suffix_index() now uses a trie, which eliminates some string concatenation overhead in the old implementation for about 10-15% reduction in execution time.
Helps #175.
Python 3.10, Linux x64, Ryzen 7 5800X
import tldextract %timeit tldextract.extract("") %timeit tldextract.extract("com") %timeit tldextract.extract("example\u3002com") %timeit tldextract.extract("subdomain\uff0eexample\uff61com") %timeit tldextract.extract("a\u3002very\uff0elong\uff61subdomain\u3002example\uff0ecom") %timeit tldextract.extract("an\uff61even\u3002longer\uff0eand\uff61complex\u3002subdomain\uff0eexample\uff61com") %timeit tldextract.extract("https://a\u3002b\uff0ec\uff61d\u3002e\uff0ef\uff61g\u3002h\uff0ei\uff61j\u3002k\uff0el\uff61m\u3002n\uff0eoo\uff61pp\u3002qqq\uff0errrr\uff61ssssss\u3002tttttttt\uff0euuuuuuuuuuu\uff61vvvvvvvvvvvvvvv\u3002wwwwwwwwwwwwwwwwwwwwww\uff0exxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx\uff61yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy\u3002zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz.tw") %timeit tldextract.extract("\u3002\u3002\u3002\u3002\u3002\u3002\u3002\u3002\u3002\u3002\u3002\u3002\u3002\u3002\u3002\u3002\u3002\u3002\u3002\u3002\uff0e\uff0e\uff0e\uff0e\uff0e\uff0e\uff0e\uff0e\uff0e\uff0e\uff0e\uff0e\uff0e\uff0e\uff0e\uff0e\uff0e\uff0e\uff0e\uff0e\uff0e\uff0e\uff0e\uff0e\uff0e\uff0e\uff0e\uff0e\uff0e\uff0e\uff0e\uff0e\uff0e\uff61\uff61\uff61\uff61\uff61\uff61\uff61\uff61\uff61\uff61\uff61\uff61\uff61\uff61\uff61\uff61\uff61\uff61\uff61\uff61\uff61\uff61\uff61\uff61\uff61\uff61\uff61\uff61\uff61\uff61\uff61\uff61\uff61\uff61\uff61\uff61\uff61\uff61\uff61\uff61\uff61\uff61\uff61\uff61\uff61\uff61\uff61\uff61\uff61\uff61\uff61\uff61\uff61\uff61\uff61\uff61\uff61\uff61\uff61\uff61\uff61\uff61\uff61\uff61\uff61\uff61\uff61\uff61\uff61\uff61\uff61\uff61\uff61")
1.44 µs ± 2.17 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each) 1.63 µs ± 12 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each) 1.95 µs ± 25.3 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each) 2.14 µs ± 75.9 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each) 2.31 µs ± 15.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each) 2.38 µs ± 10.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each) 3.38 µs ± 22.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each) 2.28 µs ± 19.6 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
Beautiful. 🙏
suffix_index() now uses a trie, which eliminates some string concatenation overhead in the old implementation for about 10-15% reduction in execution time.
Helps #175.
Python 3.10, Linux x64, Ryzen 7 5800X
Changes