rushmorem / publicsuffix

An implementation of Mozilla's Public Suffix List in Rust
MIT License

Please support accessing an offline version of the list #2

Closed joshtriplett closed 7 years ago

joshtriplett commented 7 years ago

Some other implementations of the public suffix list use (and share) an offline version of the list. For instance, libpsl, Python's publicsuffix, Perl's Domain::PublicSuffix, and Haskell's publicsuffixlist all use the same shared copy, packaged in Debian as publicsuffix.

Please consider supporting the use of a shared copy on the system. You might also support falling back to a compiled-in version (perhaps provided in a separate crate to avoid having to update this one too frequently). And then, for anyone who wants to download the list themselves and keep it updated, you can provide a URL as a constant, and let callers download that using whatever HTTP library they already use and provide its contents. (That would also avoid having to deal with caching policies within this library.)
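The lookup order proposed here (shared system copy first, compiled-in snapshot as a fallback) could be sketched roughly as follows. This is only an illustration using the standard library, not this crate's API: `load_list`, `Source`, and `BUNDLED_LIST` are hypothetical names, and the path is the one Debian's publicsuffix package installs, which other systems may not share.

```rust
use std::fs;
use std::path::Path;

// Path used by Debian's publicsuffix package; other systems may differ.
const SYSTEM_LIST: &str = "/usr/share/publicsuffix/public_suffix_list.dat";

// Tiny placeholder for a compiled-in snapshot (hypothetical). A real build
// would embed the full list, for example with include_str! on a data file.
const BUNDLED_LIST: &str = "// placeholder snapshot\ncom\nco.uk\n";

/// Where the returned list text came from.
#[derive(Debug, PartialEq)]
enum Source {
    System,
    Bundled,
}

/// Returns the shared copy at `path` if it is readable, else the bundled one.
fn load_list(path: &Path) -> (String, Source) {
    match fs::read_to_string(path) {
        Ok(text) => (text, Source::System),
        Err(_) => (BUNDLED_LIST.to_string(), Source::Bundled),
    }
}

fn main() {
    let (text, source) = load_list(Path::new(SYSTEM_LIST));
    println!("loaded {} bytes from {:?}", text.len(), source);
}
```

Returning the source alongside the text lets a caller decide whether a bundled (and possibly outdated) snapshot is acceptable for its use case.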

This would also address issue #1.

rushmorem commented 7 years ago

This library has always supported accessing an offline version of the list as well as pulling the list from arbitrary URLs. See the List::from_path and List::from_url methods. In fact, List::fetch simply calls from_url to download the list from the official URL, falling back to the GitHub repo if the site is down for some reason.

What we don't do is ship with any offline version. This is a deliberate decision. The public suffix list is a living document that receives frequent updates. Some people actually use it to check whether a domain is valid. If they base those checks on an outdated version of the list, they are bound to get incorrect results. Users should be able to make informed decisions about caching strategies for their particular use case.

> Please consider supporting the use of a shared copy on the system.

The aforementioned from_path method already covers this: call it with the path to the shared copy of the list on your system.

Please note that https://github.com/rushmorem/publicsuffix/issues/1 has since been fixed. Thank you for your interest in this library.

joshtriplett commented 7 years ago

Would you consider adding a feature flag, then, that crates depending on this one could turn off to drop the corresponding dependencies along with support for from_url and fetch?

rushmorem commented 7 years ago

@joshtriplett I like this idea. I'm working on it right now. Will ship shortly. Thanks!

rushmorem commented 7 years ago

@joshtriplett This is now possible in v1.0.3. Thanks!

rushmorem commented 7 years ago

@joshtriplett In case you are interested, the latest version, v1.1.0, has a List::from_reader method inspired by this issue. It means that if you use your own library to download the list, you don't need to save it to disk first. You can just pass the response object to that method.
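To illustrate why a reader-based constructor removes the need for a temporary file, here is a minimal sketch using only the standard library. It is not the crate's implementation: `RuleSet` is a hypothetical stand-in for the list type, it only collects raw rule strings, and a `Cursor` over an in-memory string plays the role of an HTTP response body.

```rust
use std::io::{self, BufRead, BufReader, Cursor, Read};

/// Hypothetical stand-in for the crate's list type; it only stores raw rules.
struct RuleSet {
    rules: Vec<String>,
}

impl RuleSet {
    /// Parses list text from any reader: a file, a socket, an HTTP body.
    fn from_reader<R: Read>(reader: R) -> io::Result<RuleSet> {
        let mut rules = Vec::new();
        for line in BufReader::new(reader).lines() {
            let line = line?;
            // A rule runs up to the first whitespace; `//` lines are comments.
            let rule = line.split_whitespace().next().unwrap_or("");
            if rule.is_empty() || rule.starts_with("//") {
                continue;
            }
            rules.push(rule.to_string());
        }
        Ok(RuleSet { rules })
    }
}

fn main() -> io::Result<()> {
    // Cursor stands in for an HTTP response body held in memory.
    let body = Cursor::new("// comment\ncom\n\n*.ck\n!www.ck\n");
    let set = RuleSet::from_reader(body)?;
    println!("{} rules: {:?}", set.rules.len(), set.rules);
    Ok(())
}
```

Because the function is generic over `Read`, the same code path would serve a `std::fs::File`, which is presumably how a path-based constructor can be layered on top of it.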

joshtriplett commented 7 years ago

@rushmorem Thanks! I'd wondered about a from_data method, but from_reader seems even better.

rushmorem commented 7 years ago

@joshtriplett My pleasure :) I'm glad you like it.

rushmorem commented 7 years ago

@joshtriplett You may also be interested in the new psl crate that caches the list offline but also automatically downloads an updated one if the one you currently have is too old. By default, it downloads a new copy every week but you can change the duration to anything you want.
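The refresh policy described here (reuse the cached copy unless it is older than a configurable duration) might look something like the sketch below. It is a guess at the general technique, not the psl crate's code: `needs_refresh`, `MAX_AGE`, and the cache file name are all hypothetical, and the file's modification time is assumed to be a good proxy for when the list was last downloaded.

```rust
use std::fs;
use std::path::Path;
use std::time::{Duration, SystemTime};

// One week, matching the default described above.
const MAX_AGE: Duration = Duration::from_secs(7 * 24 * 60 * 60);

/// True when the cached file is missing, unreadable, or older than `max_age`.
fn needs_refresh(cache: &Path, max_age: Duration, now: SystemTime) -> bool {
    let modified = match fs::metadata(cache).and_then(|m| m.modified()) {
        Ok(time) => time,
        Err(_) => return true, // no usable cache yet
    };
    match now.duration_since(modified) {
        Ok(age) => age > max_age,
        Err(_) => false, // modified in the future (clock skew): treat as fresh
    }
}

fn main() {
    let cache = Path::new("public_suffix_list.dat"); // hypothetical cache location
    if needs_refresh(cache, MAX_AGE, SystemTime::now()) {
        println!("cache is stale: download a fresh copy");
    } else {
        println!("cache is fresh: reuse it");
    }
}
```

Passing `now` in as a parameter keeps the check deterministic, which makes the age threshold easy to test.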