Closed benbalter closed 9 years ago
Hi Ben,
When I originally conceived the list, I honestly didn't care too much about performance. There are several possible optimizations that could be introduced.
A simple one would be to parse the list into a balanced BST, which can be queried in O(log n). Right now, if I recall correctly, the lookup is O(n).
This is the quickest idea that comes to mind. I should spend some time investigating the code and measuring performance. However, I can't promise I'll be able to dig into this issue in the near future.
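To illustrate the O(n) versus O(log n) difference mentioned above, here is a minimal sketch. It is not PublicSuffix's actual implementation: it swaps the balanced BST for a sorted array searched with Ruby's Array#bsearch (same O(log n) lookup, simpler to show), it handles exact rules only (no wildcard or exception rules), and the method names and the tiny rule list are hypothetical.

```ruby
# Hypothetical sketch, not PublicSuffix internals. Exact rules only.

# Linear scan: O(n) comparisons per lookup.
def linear_include?(rules, name)
  rules.any? { |rule| rule == name }
end

# Binary search over a pre-sorted array: O(log n) comparisons per lookup.
# Array#bsearch in find-any mode expects the block to return a positive
# number when the target sorts after the element, 0 on a match, and a
# negative number otherwise, which is what `name <=> rule` yields.
def sorted_include?(sorted_rules, name)
  !sorted_rules.bsearch { |rule| name <=> rule }.nil?
end

# Try progressively shorter suffixes of the domain, longest first, and
# return the first one that appears in the rule list (or nil).
def longest_suffix(sorted_rules, domain)
  labels = domain.split(".")
  labels.each_index do |i|
    candidate = labels[i..-1].join(".")
    return candidate if sorted_include?(sorted_rules, candidate)
  end
  nil
end

# Illustrative rule names only; sort once, then reuse for every lookup.
sorted_rules = %w[com co.uk org net io gov.uk].sort
puts longest_suffix(sorted_rules, "www.example.co.uk")
```

The sort happens once at load time, so each domain costs a handful of O(log n) lookups (one per label) rather than a scan of the whole list.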
Ahh awesome. I wasn't sure if it was my implementation or within PublicSuffix itself. Actually was able to speed things up quite a bit by passing the PublicSuffix::Domain object around internally, rather than the domain string, and memoizing domain.to_s, which brought things down from about 2:30 to parse 10,000 domains to about 20 seconds. Thanks for the awesome Gem. :smile_cat:
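The actual change isn't shown in this thread, but the memoization idea can be sketched roughly as follows. ParsedDomain is a hypothetical stand-in, not PublicSuffix::Domain, and the to_s_calls counter exists only to make the caching visible.

```ruby
# Hypothetical stand-in for a parsed-domain object; not PublicSuffix::Domain.
class ParsedDomain
  attr_reader :to_s_calls

  def initialize(tld, sld, trd = nil)
    @tld = tld
    @sld = sld
    @trd = trd
    @to_s_calls = 0 # counts how often the string is actually rebuilt
  end

  # Build the string on first use, then return the cached copy.
  def to_s
    @to_s ||= begin
      @to_s_calls += 1
      [@trd, @sld, @tld].compact.join(".")
    end
  end
end

# Parse once, then hand the object (not the string) to later steps so
# nothing has to re-parse or re-join the domain.
domain = ParsedDomain.new("com", "example", "www")
3.times { domain.to_s }
puts domain.to_s_calls
```

Passing the parsed object between methods avoids parsing the same string again at each step, and the cached to_s avoids rebuilding the string on every comparison.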
Wow, that's a great result!
Ben, FYI I made quite significant performance improvements this week. You can check them out at #92. I'll be happy to resurrect this issue if you need some help.
I use PublicSuffix as part of Gman.
In addition to using PublicSuffix's native valid? check, Gman also contains its own public-suffix-formatted list of government domains, which it then checks using PublicSuffix's rule logic. The relevant code is:

Profiling against 10,000 random (valid) domains, which took about 250 seconds, here's the breakdown:
It appears the bottleneck is in PublicSuffix's matching. I see https://github.com/weppos/publicsuffix-ruby/pull/2, but do you have any suggestions on how to speed up PublicSuffix's parsing, both for its own native list and for Gman's vendored list?