weppos / publicsuffix-ruby

Domain name parser for Ruby based on the Public Suffix List.
https://simonecarletti.com/code/publicsuffix
MIT License

Switch List implementation to use Trie-based lookup #134

Open weppos opened 7 years ago

pzb commented 7 years ago

As a historical note, I was hacking on something similar a while ago, but never got around to integrating it with the gem. https://gist.github.com/pzb/5aba13a67bd9fa64b3769397c842889b is what I had. It is way faster than the existing gem but is missing support for dynamically enabling/disabling the private section.

weppos commented 7 years ago

Thanks for the feedback @pzb

This PR, along with #133, is the result of research I did as part of my degree thesis. I must say that the results achieved with #133 are already stunning compared with the existing gem, and I am planning to release it as soon as I can.

I merged it a while ago but, sadly, there is still a lot of extra work (mostly docs and deprecation info) I have to complete before releasing it as a major version.

You can already test it using master instead of the released gem. The library now works in constant time, whereas before it still ran in linear time (although optimized).
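To illustrate the difference between the two approaches, here is a minimal sketch. It is not the gem's actual code: the module name `ToyList`, its methods, and the four-rule list are assumptions made for illustration, and wildcard and exception rules are ignored. The linear version tests every rule against the hostname, while the hash-based version only probes the suffixes of the hostname, so its cost depends on the number of labels rather than on the size of the list.

```ruby
require "set"

# Hypothetical toy list, for illustration only (not the gem's API).
module ToyList
  RULES = ["com", "uk", "co.uk", "github.io"].freeze

  # Rules indexed once in a hash-backed Set for direct lookup.
  INDEX = RULES.to_set.freeze

  # Linear approach: every rule is compared against the hostname.
  def self.linear_match(hostname)
    RULES.select { |rule| hostname == rule || hostname.end_with?(".#{rule}") }
         .max_by(&:length)
  end

  # Hash-based approach: probe each suffix of the hostname, longest first.
  def self.indexed_match(hostname)
    labels = hostname.split(".")
    labels.each_index do |i|
      candidate = labels[i..-1].join(".")
      return candidate if INDEX.include?(candidate)
    end
    nil
  end
end

puts ToyList.linear_match("www.example.co.uk")
puts ToyList.indexed_match("www.example.co.uk")
```

Both calls should pick the same rule; only the amount of work needed to find it differs.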

The tree-based version in this PR is a few milliseconds slower than the hash-based one, but it saves some extra bytes of allocation. That's why I was considering merging it as well.
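For readers unfamiliar with the idea, a tree-based list can be sketched as a trie keyed by domain labels, read from right to left. This is only a rough illustration under my own assumptions: `SuffixTrie`, `add` and `longest_match` are hypothetical names, not the code in this PR, and wildcard and exception rules are left out. Rules that share trailing labels (such as "uk" and "co.uk") share nodes, and new rules can still be added at any time.

```ruby
# Hypothetical label trie, for illustration only (not the PR's implementation).
class SuffixTrie
  Node = Struct.new(:children, :terminal)

  def initialize
    @root = Node.new({}, false)
  end

  # Insert a rule such as "co.uk", walking its labels right to left.
  def add(rule)
    node = @root
    rule.split(".").reverse_each do |label|
      node = (node.children[label] ||= Node.new({}, false))
    end
    node.terminal = true
    self
  end

  # Return the longest stored rule that is a suffix of hostname, or nil.
  def longest_match(hostname)
    node = @root
    matched = []
    best = nil
    hostname.split(".").reverse_each do |label|
      node = node.children[label]
      break unless node
      matched.unshift(label)
      best = matched.join(".") if node.terminal
    end
    best
  end
end

trie = SuffixTrie.new
trie.add("com").add("uk").add("co.uk")
puts trie.longest_match("www.example.co.uk")

# Rules can be added after the initial load.
trie.add("github.io")
puts trie.longest_match("foo.github.io")
```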

The good news is that both this PR and #133 allow dynamic modification of the list. I also worked on a DAWG/DAFSA version that was even more lightweight, but it did not allow dynamic modification of the list, so I discarded it for now.

If you have the chance, take a look at #134 and try the version in master that already includes that PR. I believe you'll be very happy with the improvements. :)