neomatrix369 / nlp_profiler

A simple NLP library that allows profiling datasets with one or more text columns. When given a dataset and a column name containing text data, NLP Profiler will return either high-level insights or low-level/granular statistical information about the text in that column.

High-level feature: Indicate ease of reading of text #59

Closed neomatrix369 closed 3 years ago

neomatrix369 commented 3 years ago

Before a pull request can be merged, a few checks must pass:

Checklist

Please check off the options you have completed and strike out the options that do not apply to this pull request:

Goal or purpose of the PR

Like the existing spelling and grammar checks, this PR adds a high-level feature that indicates whether a block of text is easy to read, based on the textstat library's flesch_reading_ease().

It typically returns values between 0 and 100, although I have seen scores fall below 0 or go past 100, depending on how hard or easy the text is.
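For context, the underlying metric is the classic Flesch Reading Ease formula, which textstat's flesch_reading_ease() implements. The sketch below is a simplified re-implementation of that formula using a crude vowel-group syllable heuristic (textstat's own syllable counting is more sophisticated), just to show why scores are unbounded:

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups, with a minimum of 1 per word."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease:
    206.835 - 1.015 * (words/sentences) - 84.6 * (syllables/words).
    Higher scores mean easier text; typical values fall in 0-100,
    but the formula itself is unbounded in both directions.
    """
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / sentences) - 84.6 * (syllables / len(words))

score = flesch_reading_ease("The cat sat on the mat. It was a nice day.")
print(round(score, 2))
```

Short, mostly monosyllabic sentences like the one above push the score past 100, which matches the out-of-range values mentioned earlier.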

What the Ease of Reading graphs of a typical text dataset could look like:

[image: two example Ease of Reading distribution charts]

Changes implemented in the PR
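The diff itself is not shown in this thread, but conceptually the new high-level check maps each text's raw Flesch score onto a readability label. Below is a hypothetical sketch of that bucketing using the standard Flesch interpretation bands; the actual labels and thresholds in ease_of_reading_check.py may differ:

```python
def ease_of_reading_label(score: float) -> str:
    """Map a Flesch Reading Ease score to a coarse readability label.

    Bands follow the standard Flesch interpretation; the PR's
    actual categories may differ.
    """
    if score >= 90:
        return "Very Easy"
    if score >= 60:
        return "Easy"
    if score >= 30:
        return "Difficult"
    return "Very Confusing"

print(ease_of_reading_label(75.0))  # 60-90 band -> "Easy"
```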

codecov[bot] commented 3 years ago

Codecov Report

Merging #59 (3d0ff42) into master (2091a58) will not change coverage. The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff            @@
##            master       #59   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           22        23    +1     
  Lines          379       406   +27     
  Branches        54        57    +3     
=========================================
+ Hits           379       406   +27     
| Impacted Files | Coverage Δ |
|---|---|
| nlp_profiler/constants.py | 100.00% <100.00%> (ø) |
| nlp_profiler/core.py | 100.00% <100.00%> (ø) |
| ...filer/high_level_features/ease_of_reading_check.py | 100.00% <100.00%> (ø) |

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Powered by Codecov. Last update b3f9734...3d0ff42.

sourcery-ai[bot] commented 3 years ago

Sourcery Code Quality Report

❌  Merging this PR will decrease code quality in the affected files by 1.22%.

| Quality metrics | Before | After | Change |
|---|---|---|---|
| Complexity | 0.52 ⭐ | 0.48 ⭐ | -0.04 👍 |
| Method Length | 48.81 ⭐ | 50.06 ⭐ | 1.25 👎 |
| Working memory | 8.55 🙂 | 9.06 🙂 | 0.51 👎 |
| Quality | 78.67% | 77.45% | -1.22% 👎 |

| Other metrics | Before | After | Change |
|---|---|---|---|
| Lines | 303 | 338 | 35 |

| Changed files | Quality Before | Quality After | Quality Change |
|---|---|---|---|
| setup.py | 53.63% 🙂 | 53.63% 🙂 | 0.00% |
| nlp_profiler/constants.py | 85.26% ⭐ | 83.69% ⭐ | -1.57% 👎 |
| nlp_profiler/core.py | 51.83% 🙂 | 48.53% 😞 | -3.30% 👎 |
| slow-tests/acceptance_tests/test_apply_text_profiling.py | 81.17% ⭐ | 78.72% ⭐ | -2.45% 👎 |
| tests/acceptance_tests/test_apply_text_profiling.py | 84.73% ⭐ | 83.94% ⭐ | -0.79% 👎 |
| tests/high_level/test_grammar_check.py | 86.49% ⭐ | 86.49% ⭐ | 0.00% |

Here are some functions in these files that still need a tune-up:

| File | Function | Complexity | Length | Working Memory | Quality | Recommendation |
|---|---|---|---|---|---|---|
| nlp_profiler/core.py | apply_text_profiling | 4 ⭐ | 139 😞 | 19 ⛔ | 48.53% 😞 | Try splitting into smaller methods. Extract out complex expressions |
| slow-tests/acceptance_tests/test_apply_text_profiling.py | test_given_a_text_column_when_profiler_is_applied_grammar_check_analysis_then_profiled_dataset_is_returned | 0 ⭐ | 41 ⭐ | 12 😞 | 74.31% 🙂 | Extract out complex expressions |
| slow-tests/acceptance_tests/test_apply_text_profiling.py | test_given_a_text_column_when_profiler_is_applied_ease_of_reading_check_analysis_then_profiled_dataset_is_returned | 0 ⭐ | 41 ⭐ | 12 😞 | 74.31% 🙂 | Extract out complex expressions |
| tests/acceptance_tests/test_apply_text_profiling.py | test_given_a_text_column_when_profiler_is_applied_with_then_all_options_disabled_then_no_profiled_dataset_is_returned | 0 ⭐ | 34 ⭐ | 10 😞 | 79.09% ⭐ | Extract out complex expressions |

Legend and Explanation

The emojis denote the absolute quality of the code:

The 👍 and 👎 indicate whether the quality has improved or gotten worse with this pull request.


Please see our documentation here for details on how these metrics are calculated.

We are actively working on this report - lots more documentation and extra metrics to come!

Let us know what you think of it by mentioning @sourcery-ai in a comment.