Open anjackson opened 10 years ago
I don't see what #12 has to do with this.
My mistake
I had a look at Guava. I assume it is com.google.common.net.InternetDomainName that @anjackson has in mind. As far as I can see, this class is not a real replacement for org.archive.net.PublicSuffixes. The latter looks up real-world suffixes from https://publicsuffix.org/, while the former just evaluates patterns and has no knowledge of whether the domain names are real.
Ah, so I was going on the basis of this documentation, which indicated that the Guava classes also just use the publicsuffix.org list. Maybe that documentation is out of date?
My fault. You are right; I didn't read the code well enough.
Regarding which implementation is better maintained, both implementations use a local copy of the list from publicsuffix.org (PSL). Webarchive-commons' list was last updated sometime in late 2013, while the Guava master branch was updated on August 20, 2014. In both cases, freshness depends on how often the library is released and on us always depending on its latest version.
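To make the "local copy of the PSL" point concrete, here is a minimal, hypothetical Java sketch of the kind of lookup both libraries perform, following the matching rules described at publicsuffix.org: the longest matching rule wins, wildcard rules such as *.ck match any single label, and exception rules such as !www.ck override them. This is not the code of either webarchive-commons or Guava; the class name PublicSuffixSketch, its methods, and the five-rule excerpt are assumptions for illustration only.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical sketch of a Public Suffix List lookup against an embedded
 * copy of the list. Not the implementation used by webarchive-commons or Guava.
 */
final class PublicSuffixSketch {

    // ASSUMPTION: a tiny illustrative excerpt, not the real PSL.
    // Plain rules, one wildcard rule ("*.ck") and one exception rule ("!www.ck").
    private static final Set<String> RULES =
            Set.of("com", "uk", "co.uk", "*.ck", "!www.ck");

    /** Returns the public suffix of the given host name. */
    static String publicSuffix(String host) {
        String[] labels = host.toLowerCase(Locale.ROOT).split("\\.");
        int n = labels.length;
        int suffixLen = 1; // implicit default rule "*": the last label is a suffix

        for (int len = 1; len <= n; len++) {
            String candidate = join(labels, n - len);
            // An exception rule always prevails: the suffix is the rule minus its leftmost label.
            if (RULES.contains("!" + candidate)) {
                return join(labels, n - len + 1);
            }
            // A wildcard rule "*.x" matches any single label in front of "x".
            String wildcard = len > 1 ? "*." + join(labels, n - len + 1) : null;
            if (RULES.contains(candidate) || (wildcard != null && RULES.contains(wildcard))) {
                suffixLen = len; // remember the longest matching rule so far
            }
        }
        return join(labels, n - suffixLen);
    }

    /** Public suffix plus one more label, or null if the host is itself a public suffix. */
    static String registrableDomain(String host) {
        String lower = host.toLowerCase(Locale.ROOT);
        String suffix = publicSuffix(lower);
        if (lower.equals(suffix)) {
            return null;
        }
        // Strip ".<suffix>" and keep only the label directly in front of it.
        String rest = lower.substring(0, lower.length() - suffix.length() - 1);
        return rest.substring(rest.lastIndexOf('.') + 1) + "." + suffix;
    }

    /** Joins labels[from..end) with dots. */
    private static String join(String[] labels, int from) {
        return String.join(".", Arrays.copyOfRange(labels, from, labels.length));
    }
}
```

With this excerpt, publicSuffix("www.example.co.uk") gives "co.uk" and registrableDomain("www.example.co.uk") gives "example.co.uk". Because the rules are compiled into the class, picking up a newly registered suffix means shipping a new build, which is exactly the release-frequency dependency discussed above.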
For freshness, I think moving to Guava is good. But if we are using other parts of Guava and those parts get API changes, then we must update our code just to get the updated PSL. This is probably not a big problem, assuming Guava is concerned about backward compatibility.
I've found that Heritrix is already using com.google.common.net.InternetDomainName. I think we should make the move as well.
While porting for #1, this happened:
This is rather clumsy, and given this is provided by Google Guava, there seems little point maintaining our own code (assuming theirs is kept up to date). The task is then to check that the Google one is well maintained and switch over to that instead of copying in code from elsewhere.