jeremykendall / php-domain-parser

Public Suffix List based domain parsing implemented in PHP
MIT License
1.16k stars 128 forks source link

Cache throws when cache directory doesn't exists #280

Closed nicolasmilville closed 4 years ago

nicolasmilville commented 4 years ago

Issue summary

Since 5.7 version we encounter new issue with Cache class. In 5.6 version, data directory was filled with two data files. In new version of package, this directory only contains empty .gitignore file.

This change can impact users who are hosted on AWS and use CodeDeploy. CodeDeploy has an --ignore-hidden-files flag which is very usefull to reduce deployment artifacts size... In my case, deployment artifact size is multiplied by 8 without this flag. See https://docs.aws.amazon.com/cli/latest/reference/deploy/push.html for more informations.

With this flag, data file won't be created and leads to a TypeError when using Cache. See exception:

Uncaught exception 'TypeError' with message 'file_exists() expects parameter 1 to be a valid path, boolean given' in [...]/vendor/jeremykendall/php-domain-parser/src/Cache.php:89

Proposal

We can maybe test if data directory exists and if it's not the case, create it to avoid this exception ?

System informations

Information Description
Pdp version 5.7
PHP version 7.2.22
OS Platform Debian 9

Expected result

No exception

Actual result

Uncaught exception 'TypeError' with message 'file_exists() expects parameter 1 to be a valid path, boolean given' in [...]/vendor/jeremykendall/php-domain-parser/src/Cache.php:89
nyamsprod commented 4 years ago

@nicolasmilville thanks for using the library your issue raises a couple of questions that may lead or not in more complex resolution.

First and foremost even if it is not stressed enough in the documentation the cache system shipped with this package is optional . And I'll be honest I'm thinking of removing it from the package and create an optional package just for caching both external databases.

Second, the reason the directory was made empty in 5.7 was to force users to either create their own cache strategy but also to avoid the unnecessary security fix to only update the cache when a composer call is made. As a matter of fact I now think that tying the local cache update to composer lifecycle was a bad idea.

Last but not least currently I believe the current Cache class does a check on the directory presence and tries to create it if it does not exists see https://github.com/jeremykendall/php-domain-parser/blob/develop/src/Cache.php#L82-L98

So we all that being said I think your real issue "no pun intended" is that in your settings the PHP function realpath returns false on line https://github.com/jeremykendall/php-domain-parser/blob/develop/src/Cache.php#L86 because the Cache file expect to work on a real filesystem not on a virtual one like the one provided by AWS.

I would suggest you implement an alternative caching system like explain in the documentation https://github.com/jeremykendall/php-domain-parser#alternatives-to-the-update-psl-script or build one of your own.

nicolasmilville commented 4 years ago

First and foremost even if it is not stressed enough in the documentation the cache system shipped with this package is optional . And I'll be honest I'm thinking of removing it from the package and create an optional package just for caching both external databases.

Yes I saw this and have already fixed this issue on my side by replacing your optional cache system by Symfony FileSystem simple cache implementation ;) I just reported this for potential peoples affected by same issue.

So we all that being said I think your real issue "no pun intended" is that in your settings the PHP function realpath returns false on line https://github.com/jeremykendall/php-domain-parser/blob/develop/src/Cache.php#L86 because the Cache file expect to work on a real filesystem not on a virtual one like the one provided by AWS.

Hum, are you sure about this ? When reading manual of realpath function, we can see "realpath() returns FALSE on failure, e.g. if the file does not exist.". I think it's simple, in this case the file/directory doesn't exists so... it returns false as expected.

Maybe you can transform :

if ('' === $cache_path) {
    /** @var string $cache_path */
    $cache_path = realpath(dirname(__DIR__).DIRECTORY_SEPARATOR.'data');
}

if (!file_exists($cache_path) && file_exists(dirname($cache_path))) {
    $this->mkdir($cache_path); // ensure that the parent path exists
}

to :

if ('' === $cache_path) {
    /** @var string $root_path */
    $root_path = realpath(dirname(__DIR__));
    /** @var string $cache_path */
    $cache_path = $root_path.DIRECTORY_SEPARATOR.'data';
}

if (!file_exists($cache_path) && file_exists(dirname($cache_path))) {
    $this->mkdir($cache_path); // ensure that the parent path exists
}

In this case, realpath will never return false because root directory of your library always exists.

Thanks for answer !

nyamsprod commented 4 years ago

@nicolasmilville indeed that could be a quick/easy fix. I'll try to update that line and release a 5.7.1

nyamsprod commented 4 years ago

I'll close the issue as the patch has landed on the develop branch. Expect a new release later today