Closed keyz182 closed 5 years ago
Hi Kieran, thank you for the pull request.
I would like to know your opinion about logging. When it has the ability to change the cache_dir should we warn user when the cache_dir is not accessible (around line 86: if not os.access(dir_path, os.W_OK):
).
I think you might be interested to see if the path that you provided is not writable. How do you see from users point of view?
While logging may have helped us find the issue a little sooner, the eventual error thrown (/home/wsgi/.urlextract_tlds not writeable
or similar) guided us directly to the issue pretty quickly. Though, from a good practice point-of-view, logging that the code has reached a condition where it needs to fall back may be the best approach.
For us, on Elastic Beanstalk, both the current directory, and the home directory were not writeable (~
resolved to /home/wsgi
which I don't think even existed. Would falling back to the temp directory be an option if the current dir is unreadable, rather than the home directory? Or make a temp directory the 3rd fallback?
The way we're using my branch currently is with the temp dir shown below, and is a cross platform compatible way of finding a temp directory.
import tempfile
extractor = URLExtract(cache_dir=tempfile.gettempdir())
I suppose you loose the cache then across reboots potentially, so may not be the most desirable, but as a fallback, may be acceptable.
Thank you for your time, opinions and ideas! I really appreciate it. It is really good idea with the temp dir so I've crated issue (#25) for it. If you have time you and are interested in helping with it just write comment to the issue otherwise so I know you (or somebody else) will do it or I will work on it when I have free time.
Thank you and have a nice day.
I'm using this library in an application deployed on AWS Elastic Beanstalk. In it's environment,
os.path.dirname(__file__)
would resolve to an unwriteable directory. I've added the optionalcache_dir
parameter to the constructor, defaulting to the value it was previously, to allow code to explicitly set a base cache directory.