spekulatius / PHPScraper

A universal web-util for PHP.
https://phpscraper.de
GNU General Public License v3.0
520 stars 73 forks source link

Bump jeremykendall/php-domain-parser to version 6 #79

Closed tacman closed 1 year ago

tacman commented 2 years ago

I'm trying to integrate this into an app that is already locked to psr/log:^3.

jeremykendall/php-domain-parser version 5 is locked to psr/log:^1

There are 2 tests that fail when bumping to the new version. I can fix and submit a PR.

tacman commented 2 years ago

Sigh. Version 6 no longer including a manager, so using this requires a cache and mechanism for fetching and storing the rules.

It's not too difficult, but it's not a trivial syntax change either (though there are some namespace changes).

Thoughts on how to proceed?

spekulatius commented 2 years ago

Hey @tacman

yeah, I remember there were some structural changes in the package. Do you think you can get it done? Namespaces can usually be replaced easily most of the time.

Cheers, Peter

tacman commented 2 years ago

OK. There are 2 approaches. The easiest is to download the rules file, add it to the repo, and then load it. Of course, the rules will become stale.

The better approach require a dependency on a cache. Then we can fetch the rules like this, which will update the rules every 24 hours:

    public function getTldCollection(): Rules
    {
        $cache = new FilesystemAdapter(); // or some other cache.

        $rules = $cache->get('pdp_rules', function (ItemInterface $item) {
            // The callable will only be executed on a cache miss.
            $item->expiresAfter(3600 * 24);
            $response = $this->client->request(
                'GET',
                PsrStorageFactory::PUBLIC_SUFFIX_LIST_URI
            );
            return $response->getContent();
        });

        $publicSuffixList = Rules::fromString($rules);
        return $publicSuffixList;

I'll go ahead and implement this to make it functional, but I'm not sure how to code it so that the user can inject whatever cache they already have in their application.

tacman commented 2 years ago

Since you've tagged this as a new version, can we also bump to PHP8?

tacman commented 2 years ago

Also, I'd like to migrate this to psr-4, and separate the classes into their own files. Or perhaps you should do that, it's likely a BC.

tacman commented 2 years ago

I started down the rabbit hole...

If phpscraper needs a cache for the domain parse, a CacheInterface cache should probably be injected. But that means the phpscaper should itself be a service that's injected, rather than called with new phpscaper().

Alternatively, we can add a CacheAwareInterface, and add the cache via a method call.

Alas, I'm not as expert in this as I'd like to be!

spekulatius commented 2 years ago

The question of the cache was stopped me too. I was actually thinking of storing a file/set of files somewhere to avoid handling the questions of integration, especially with simple VanillaPHP projects (where PHPscraper comes in handy for me most).

spekulatius commented 2 years ago

Hey @tacman,

Have you made progress implementing a cache? I've seen this commit and was wondering if we can get a framework agnostic-solution working. I'd still try to avoid injecting a CacheInterface as it is framework dependent. Happy to hear your thoughts!

Cheers, Peter

spekulatius commented 2 years ago

Alternatively either spatie/url or thephpleague/uri could replace jeremykendall/php-domain-parser. I still need to confirm if the libs are suitable for the job tho.

spekulatius commented 1 year ago

For now we are using league/uri for URL processing, with this the subdomain-specific filtering has been dropped.