Logging should be optional

hayesall commented 6 years ago

The current implementation sets logging from __main__.py and appends both errors and logs to a file named rnlp_log.log.

If the user experiences errors when running the code (for example: nltk is not installed), these errors are placed in the log file and it may not be obvious what occurred.

adamjsawicki commented 4 years ago

Is the idea here to make the creation of rnlp_log.log optional, or to have the option to spit errors to the console?

hayesall commented 4 years ago

I'd be in favor of either!

The primary time I've noticed errors is during first-time setup. But I've found logging convenient if I'm running this over multiple files, or embedding this as a library within another script.

A mix of both might also be an option: catching an ImportError and throwing it to the console (since it's a severe error), then logging all other information.

adamjsawicki commented 4 years ago

@hayesall If you think running this with a --no-logs option is helpful, that shouldn't be hard to do.

For me, when I run this without nltk installed, I immediately crash with a ModuleNotFoundError: No module named 'nltk' error, with no logs produced. What is the desired state?

hayesall commented 4 years ago

I like the --no-logs idea!

The ModuleNotFoundError is not bad in this case: the error precisely describes the problem and can be solved by the user with pip install nltk.

The severe problem comes next: when specific nltk modules are missing. The rnlp Installation Guide currently solves this in documentation. If punkt, stopwords, and averaged_perceptron_tagger are missing, the error is not relayed properly to the user.

adamjsawicki commented 4 years ago

@hayesall As for when nltk modules are missing, there two options:

Throw an error that the user sees
Try to download them, like the solution posted here

What's your preference?

hayesall commented 4 years ago

I'm intrigued by the StackOverflow suggestion you linked.

A previous version of rnlp used a custom install method during installation (see this version of setup.py), but it was fragile in practice (broke with older versions of setuptools and I recall a Windows-specific bug that came up).

parse.py and textprocessing.py rely on an nltk package. A check could run before both of them to ensure the program state is valid.

Perhaps a check_state module could be imported by parse.py and textprocessing.py. This module would perform something similar to the StackOverflow solution, and raise an exception if the check fails.

adamjsawicki commented 4 years ago

Sounds like there are two changes:

Create check_state() which performs something similar to the SO solution, but if the download fails throws a custom Exception specifying that RNLP cannot be used without package 'nltk', which was unable to be downloaded because of {e}.
In parse.py and text_processing.py, first import and run check_state, and then import nltk which at that point will be in the environment.

If that works, I can implement. Let me know what you want that Exception to say on a failed install.

hayesall commented 4 years ago

Sounds good to me! Thanks for your interest in this.

Let me know what you want that Exception to say on a failed install.

Explaining that the error builds on a missing nltk package should be fine, perhaps:

Missing one or more nltk packages.

>>> import nltk
>>> nltk.download('punkt')
>>> nltk.download('stopwords')
>>> nltk.download('averaged_perceptron_tagger')

For more information, see the nltk data documentation: https://www.nltk.org/data.html

adamjsawicki commented 4 years ago

After looking at the comments in the SO post, seems as though it won't solve the problem to try to dynamically download the packages. I think our best solution might just be a custom exception if those packages aren't found, printing something like the above to the console. What do you think?

hayesall commented 4 years ago

I think our best solution might just be a custom exception if those packages aren't found, printing something like the above to the console.

Agreed. Catching the LookupError to raise something similar should be best. The nltk errors are a little verbose for this setting so focusing on the three required packages should save some headaches.

nltk stopwords LookupError console log

```python >>> import nltk >>> from nltk.corpus import stopwords >>> stopwords.words("english") Traceback (most recent call last): File "/Users/hayesall/miniconda3/envs/rnlp/lib/python3.7/site-packages/nltk/corpus/util.py", line 86, in __load root = nltk.data.find('{}/{}'.format(self.subdir, zip_name)) File "/Users/hayesall/miniconda3/envs/rnlp/lib/python3.7/site-packages/nltk/data.py", line 701, in find raise LookupError(resource_not_found) LookupError: ********************************************************************** Resource stopwords not found. Please use the NLTK Downloader to obtain the resource: >>> import nltk >>> nltk.download('stopwords') For more information see: https://www.nltk.org/data.html Attempted to load corpora/stopwords.zip/stopwords/ Searched in: - '/Users/hayesall/nltk_data' - '/Users/hayesall/miniconda3/envs/rnlp/nltk_data' - '/Users/hayesall/miniconda3/envs/rnlp/share/nltk_data' - '/Users/hayesall/miniconda3/envs/rnlp/lib/nltk_data' - '/usr/share/nltk_data' - '/usr/local/share/nltk_data' - '/usr/lib/nltk_data' - '/usr/local/lib/nltk_data' ********************************************************************** During handling of the above exception, another exception occurred: Traceback (most recent call last): File "", line 1, in File "/Users/hayesall/miniconda3/envs/rnlp/lib/python3.7/site-packages/nltk/corpus/util.py", line 123, in __getattr__ self.__load() File "/Users/hayesall/miniconda3/envs/rnlp/lib/python3.7/site-packages/nltk/corpus/util.py", line 88, in __load raise e File "/Users/hayesall/miniconda3/envs/rnlp/lib/python3.7/site-packages/nltk/corpus/util.py", line 83, in __load root = nltk.data.find('{}/{}'.format(self.subdir, self.__name)) File "/Users/hayesall/miniconda3/envs/rnlp/lib/python3.7/site-packages/nltk/data.py", line 701, in find raise LookupError(resource_not_found) LookupError: ********************************************************************** Resource stopwords not found. Please use the NLTK Downloader to obtain the resource: >>> import nltk >>> nltk.download('stopwords') For more information see: https://www.nltk.org/data.html Attempted to load corpora/stopwords Searched in: - '/Users/hayesall/nltk_data' - '/Users/hayesall/miniconda3/envs/rnlp/nltk_data' - '/Users/hayesall/miniconda3/envs/rnlp/share/nltk_data' - '/Users/hayesall/miniconda3/envs/rnlp/lib/nltk_data' - '/usr/share/nltk_data' - '/usr/local/share/nltk_data' - '/usr/lib/nltk_data' - '/usr/local/lib/nltk_data' ********************************************************************** ```

adamjsawicki commented 4 years ago

@hayesall Updated the PR with desired functionality.

srlearn / rnlp

Logging should be optional #8