Closed hayesall closed 4 years ago
Is the idea here to make the creation of rnlp_log.log
optional, or to have the option to spit errors to the console?
I'd be in favor of either!
The primary time I've noticed errors is during first-time setup. But I've found logging convenient if I'm running this over multiple files, or embedding this as a library within another script.
A mix of both might also be an option: catching an ImportError
and throwing it to the console (since it's a severe error), then logging all other information.
@hayesall If you think running this with a --no-logs
option is helpful, that shouldn't be hard to do.
For me, when I run this without nltk
installed, I immediately crash with a ModuleNotFoundError: No module named 'nltk'
error, with no logs produced. What is the desired state?
I like the --no-logs
idea!
The ModuleNotFoundError
is not bad in this case: the error precisely describes the problem and can be solved by the user with pip install nltk
.
The severe problem comes next: when specific nltk
modules are missing. The rnlp Installation Guide currently solves this in documentation. If punkt
, stopwords
, and averaged_perceptron_tagger
are missing, the error is not relayed properly to the user.
@hayesall As for when nltk
modules are missing, there two options:
What's your preference?
I'm intrigued by the StackOverflow suggestion you linked.
A previous version of rnlp
used a custom install method during installation (see this version of setup.py
), but it was fragile in practice (broke with older versions of setuptools
and I recall a Windows-specific bug that came up).
parse.py
and textprocessing.py
rely on an nltk
package. A check could run before both of them to ensure the program state is valid.
Perhaps a check_state
module could be imported by parse.py
and textprocessing.py
. This module would perform something similar to the StackOverflow solution, and raise an exception if the check fails.
Sounds like there are two changes:
check_state()
which performs something similar to the SO solution, but if the download fails throws a custom Exception
specifying that RNLP cannot be used without package 'nltk', which was unable to be downloaded because of {e}
.parse.py
and text_processing.py
, first import and run check_state
, and then import nltk
which at that point will be in the environment.If that works, I can implement. Let me know what you want that Exception
to say on a failed install.
Sounds good to me! Thanks for your interest in this.
Let me know what you want that
Exception
to say on a failed install.
Explaining that the error builds on a missing nltk
package should be fine, perhaps:
Missing one or more nltk packages.
>>> import nltk
>>> nltk.download('punkt')
>>> nltk.download('stopwords')
>>> nltk.download('averaged_perceptron_tagger')
For more information, see the nltk data documentation: https://www.nltk.org/data.html
After looking at the comments in the SO post, seems as though it won't solve the problem to try to dynamically download the packages. I think our best solution might just be a custom exception if those packages aren't found, printing something like the above to the console. What do you think?
I think our best solution might just be a custom exception if those packages aren't found, printing something like the above to the console.
Agreed. Catching the LookupError
to raise something similar should be best. The nltk
errors are a little verbose for this setting so focusing on the three required packages should save some headaches.
@hayesall Updated the PR with desired functionality.
The current implementation sets logging from
__main__.py
and appends both errors and logs to a file namedrnlp_log.log
.If the user experiences errors when running the code (for example:
nltk
is not installed), these errors are placed in the log file and it may not be obvious what occurred.