Checklist

Please check the options that you have completed and strike out the options that do not apply to this pull request:

[X] a clear title and description for the Pull Request have been provided
[X] you have read
- [X] the Contributing doc
- [X] the Developer Guide
[x] the pull request passes the tests (./test-coverage "tests slow-tests") - this will also be visible via the Code coverage report and CI/CD task on the Pull Request
[X] you have performed some kind of smoke test by running your changes in an isolated environment, e.g. Docker container, Google Colab, Kaggle, etc.
~~- [ ] the notebooks are updated (see notebooks folder, read the Notebooks docs)~~
[x] CHANGELOG.md has been updated (please follow the existing format)

Goal or purpose of the PR

Minor fixes and code formatting

Changes implemented in the PR

Format all Python code and fix minor typos in the docs. Run Black across the whole code base to make the code structure consistent, and apply the refactorings suggested by Sourcery.ai across all the source files.

sourcery-ai[bot] commented 1 year ago

Sourcery Code Quality Report

❌ Merging this PR will decrease code quality in the affected files by 3.04%.

Quality metrics	Before	After	Change
Complexity	0.89 ⭐	0.75 ⭐	-0.14 👍
Method Length	37.16 ⭐	39.41 ⭐	2.25 👎
Working memory	5.00 ⭐	5.96 ⭐	0.96 👎
Quality	87.20% ⭐	84.16% ⭐	-3.04% 👎
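
The Change column above appears to be After minus Before (for Quality, a difference in percentage points). A quick sketch that recomputes it from the table values; the `metrics` dictionary and the `change` helper are illustrative names, not part of Sourcery's tooling:

```python
# Before/After values copied from the "Quality metrics" table above.
metrics = {
    "Complexity": (0.89, 0.75),
    "Method Length": (37.16, 39.41),
    "Working memory": (5.00, 5.96),
    "Quality": (87.20, 84.16),  # percentages; the change is in percentage points
}


def change(before: float, after: float) -> float:
    """Return after - before, rounded to two decimals as in the report."""
    return round(after - before, 2)


changes = {name: change(before, after) for name, (before, after) in metrics.items()}

for name, delta in changes.items():
    print(f"{name}: {delta:+.2f}")
```

Note that a lower Complexity counts as an improvement, while the higher Method Length and Working memory values are flagged as regressions.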

Other metrics	Before	After	Change
Lines	1997	2483	486

Changed files	Quality Before	Quality After	Quality Change
setup.py	67.46% 🙂	67.46% 🙂	0.00%
nlp_profiler/__init__.py	100.00% ⭐	100.00% ⭐	0.00%
nlp_profiler/constants.py	80.17% ⭐	80.17% ⭐	0.00%
nlp_profiler/core.py	63.10% 🙂	64.79% 🙂	1.69% 👍
nlp_profiler/generate_features/__init__.py	71.82% 🙂	72.44% 🙂	0.62% 👍
nlp_profiler/generate_features/parallelisation_methods/__init__.py	90.29% ⭐	91.42% ⭐	1.13% 👍
nlp_profiler/granular_features/__init__.py	75.47% ⭐	75.47% ⭐	0.00%
nlp_profiler/granular_features/alphanumeric.py	97.09% ⭐	94.76% ⭐	-2.33% 👎
nlp_profiler/granular_features/chars_spaces_and_whitespaces.py	94.66% ⭐	91.52% ⭐	-3.14% 👎
nlp_profiler/granular_features/dates.py	90.18% ⭐	88.11% ⭐	-2.07% 👎
nlp_profiler/granular_features/emojis.py	93.69% ⭐	93.93% ⭐	0.24% 👍
nlp_profiler/granular_features/english_non_english_chars.py	94.86% ⭐	90.69% ⭐	-4.17% 👎
nlp_profiler/granular_features/letters.py	97.09% ⭐	94.76% ⭐	-2.33% 👎
nlp_profiler/granular_features/non_alphanumeric.py	97.09% ⭐	94.76% ⭐	-2.33% 👎
nlp_profiler/granular_features/noun_phrase_count.py	87.32% ⭐	85.95% ⭐	-1.37% 👎
nlp_profiler/granular_features/numbers.py	97.09% ⭐	94.76% ⭐	-2.33% 👎
nlp_profiler/granular_features/punctuations.py	90.93% ⭐	88.44% ⭐	-2.49% 👎
nlp_profiler/granular_features/stop_words.py	93.13% ⭐	93.52% ⭐	0.39% 👍
nlp_profiler/granular_features/words.py	97.09% ⭐	94.76% ⭐	-2.33% 👎
nlp_profiler/high_level_features/__init__.py	85.89% ⭐	85.89% ⭐	0.00%
nlp_profiler/high_level_features/ease_of_reading_check.py	85.73% ⭐	86.56% ⭐	0.83% 👍
nlp_profiler/high_level_features/sentiment_polarity.py	86.78% ⭐	87.66% ⭐	0.88% 👍
nlp_profiler/high_level_features/sentiment_subjectivity.py	86.78% ⭐	87.92% ⭐	1.14% 👍
slow-tests/acceptance_tests/test_apply_text_profiling.py	89.13% ⭐	89.13% ⭐	0.00%
slow-tests/performance_tests/test_perf_ease_of_reading_check.py	99.17% ⭐	99.17% ⭐	0.00%
slow-tests/performance_tests/test_perf_grammar_check.py	99.17% ⭐	99.17% ⭐	0.00%
slow-tests/performance_tests/test_perf_granular_features.py	98.83% ⭐	98.83% ⭐	0.00%
slow-tests/performance_tests/test_perf_noun_phrase.py	99.17% ⭐	99.17% ⭐	0.00%
slow-tests/performance_tests/test_perf_spelling_check.py	99.17% ⭐	99.17% ⭐	0.00%
tests/common_functions.py	72.84% 🙂	72.84% 🙂	0.00%
tests/acceptance_tests/test_apply_text_profiling.py	88.45% ⭐	88.45% ⭐	0.00%
tests/granular/test_alphanumeric.py	94.51% ⭐	94.11% ⭐	-0.40% 👎
tests/granular/test_chars_and_spaces.py	80.81% ⭐	80.81% ⭐	0.00%
tests/granular/test_dates.py	94.84% ⭐	94.26% ⭐	-0.58% 👎
tests/granular/test_duplicates.py	95.65% ⭐	94.85% ⭐	-0.80% 👎
tests/granular/test_emojis.py	95.02% ⭐	94.50% ⭐	-0.52% 👎
tests/granular/test_english_non_english_characters.py	90.40% ⭐	70.81% 🙂	-19.59% 👎
tests/granular/test_non_alphanumeric.py	94.10% ⭐	93.24% ⭐	-0.86% 👎
tests/granular/test_nounphrase.py	%	90.76% ⭐	%
tests/granular/test_numbers.py	85.25% ⭐	85.20% ⭐	-0.05% 👎
tests/granular/test_punctuations.py	90.73% ⭐	89.94% ⭐	-0.79% 👎
tests/granular/test_repeated_digits.py	90.40% ⭐	77.69% ⭐	-12.71% 👎
tests/granular/test_repeated_letters.py	90.40% ⭐	79.93% ⭐	-10.47% 👎
tests/granular/test_repeated_punctuations.py	90.40% ⭐	70.55% 🙂	-19.85% 👎
tests/granular/test_sentences.py	89.16% ⭐	89.02% ⭐	-0.14% 👎
tests/granular/test_stop_words.py	95.02% ⭐	94.50% ⭐	-0.52% 👎
tests/granular/test_syllables.py	90.40% ⭐	74.74% 🙂	-15.66% 👎
tests/granular/test_white_spaces.py	80.81% ⭐	80.81% ⭐	0.00%
tests/granular/test_words.py	94.86% ⭐	94.37% ⭐	-0.49% 👎
tests/high_level/test_ease_of_reading_check.py	87.93% ⭐	70.29% 🙂	-17.64% 👎
tests/high_level/test_grammar_check.py	87.91% ⭐	87.04% ⭐	-0.87% 👎
tests/high_level/test_sentiment_polarity.py	79.18% ⭐	79.18% ⭐	0.00%
tests/high_level/test_sentiment_subjectivity.py	79.18% ⭐	79.18% ⭐	0.00%
tests/high_level/test_spelling_check.py	74.59% 🙂	74.59% 🙂	0.00%

Here are some functions in these files that still need a tune-up:

File	Function	Complexity	Length	Working Memory	Quality	Recommendation
tests/common_functions.py	internal_assert_benchmark	1 ⭐	136 😞	13 😞	58.26% 🙂	Try splitting into smaller methods. Extract out complex expressions
tests/common_functions.py	generate_data	0 ⭐	80 🙂	16 ⛔	62.86% 🙂	Extract out complex expressions
nlp_profiler/core.py	apply_text_profiling	5 ⭐	148 😞	7 🙂	64.79% 🙂	Try splitting into smaller methods
nlp_profiler/generate_features/__init__.py	generate_features	2 ⭐	63 🙂	10 😞	72.44% 🙂	Extract out complex expressions
nlp_profiler/granular_features/__init__.py	apply_granular_features	0 ⭐	120 😞	6 ⭐	75.47% ⭐	Try splitting into smaller methods
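
As a rough illustration of the "Extract out complex expressions" and "Try splitting into smaller methods" recommendations, here is a minimal sketch. All function names are hypothetical and none of this is code from nlp_profiler:

```python
import re


# Before: one dense expression does tokenising, cleaning and filtering at once.
def count_long_words_before(text: str) -> int:
    return len([w for w in re.findall(r"[A-Za-z']+", text) if len(w.strip("'")) > 6])


# After: each step has a name, so less has to be held in mind at once.
def extract_words(text: str) -> list:
    """Return runs of letters and apostrophes found in the text."""
    return re.findall(r"[A-Za-z']+", text)


def is_long(word: str, min_length: int = 7) -> bool:
    """Treat a word as long if it has at least min_length characters, ignoring outer apostrophes."""
    return len(word.strip("'")) >= min_length


def count_long_words_after(text: str) -> int:
    return sum(1 for word in extract_words(text) if is_long(word))


sample = "Formatting consistent code is nice"
print(count_long_words_before(sample), count_long_words_after(sample))
```

Both versions return the same count; the second only trades one long expression for two small, separately testable helpers.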

Legend and Explanation

The emojis denote the absolute quality of the code:

⭐ excellent
🙂 good
😞 poor
⛔ very poor

The 👍 and 👎 indicate whether the quality has improved or gotten worse with this pull request.

Please see our documentation here for details on how these metrics are calculated.

We are actively working on this report - lots more documentation and extra metrics to come!

Help us improve this quality report!

codecov[bot] commented 1 year ago

Codecov Report

Patch coverage: 100.00% and no project coverage change.

Comparison is base (a3538c6) 100.00% compared to head (7caeb47) 100.00%.

:exclamation: Current head 7caeb47 differs from pull request most recent head def1ee8. Consider uploading reports for the commit def1ee8 to get more accurate results

Additional details and impacted files

```diff
@@            Coverage Diff            @@
##            master       #73   +/-   ##
=========================================
  Coverage   100.00%   100.00%
=========================================
  Files           26        26
  Lines          498       439   -59
  Branches        74        45   -29
=========================================
- Hits           498       439   -59
```

| [Impacted Files](https://codecov.io/gh/neomatrix369/nlp_profiler/pull/73?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None) | Coverage Δ | |
|---|---|---|
| [nlp\_profiler/constants.py](https://codecov.io/gh/neomatrix369/nlp_profiler/pull/73?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None#diff-bmxwX3Byb2ZpbGVyL2NvbnN0YW50cy5weQ==) | `100.00% <100.00%> (ø)` | |
| [nlp\_profiler/core.py](https://codecov.io/gh/neomatrix369/nlp_profiler/pull/73?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None#diff-bmxwX3Byb2ZpbGVyL2NvcmUucHk=) | `100.00% <100.00%> (ø)` | |
| [nlp\_profiler/generate\_features/\_\_init\_\_.py](https://codecov.io/gh/neomatrix369/nlp_profiler/pull/73?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None#diff-bmxwX3Byb2ZpbGVyL2dlbmVyYXRlX2ZlYXR1cmVzL19faW5pdF9fLnB5) | `100.00% <100.00%> (ø)` | |
| [...erate\_features/parallelisation\_methods/\_\_init\_\_.py](https://codecov.io/gh/neomatrix369/nlp_profiler/pull/73?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None#diff-bmxwX3Byb2ZpbGVyL2dlbmVyYXRlX2ZlYXR1cmVzL3BhcmFsbGVsaXNhdGlvbl9tZXRob2RzL19faW5pdF9fLnB5) | `100.00% <100.00%> (ø)` | |
| [nlp\_profiler/granular\_features/\_\_init\_\_.py](https://codecov.io/gh/neomatrix369/nlp_profiler/pull/73?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None#diff-bmxwX3Byb2ZpbGVyL2dyYW51bGFyX2ZlYXR1cmVzL19faW5pdF9fLnB5) | `100.00% <100.00%> (ø)` | |
| [nlp\_profiler/granular\_features/alphanumeric.py](https://codecov.io/gh/neomatrix369/nlp_profiler/pull/73?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None#diff-bmxwX3Byb2ZpbGVyL2dyYW51bGFyX2ZlYXR1cmVzL2FscGhhbnVtZXJpYy5weQ==) | `100.00% <100.00%> (ø)` | |
| [.../granular\_features/chars\_spaces\_and\_whitespaces.py](https://codecov.io/gh/neomatrix369/nlp_profiler/pull/73?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None#diff-bmxwX3Byb2ZpbGVyL2dyYW51bGFyX2ZlYXR1cmVzL2NoYXJzX3NwYWNlc19hbmRfd2hpdGVzcGFjZXMucHk=) | `100.00% <100.00%> (ø)` | |
| [nlp\_profiler/granular\_features/dates.py](https://codecov.io/gh/neomatrix369/nlp_profiler/pull/73?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None#diff-bmxwX3Byb2ZpbGVyL2dyYW51bGFyX2ZlYXR1cmVzL2RhdGVzLnB5) | `100.00% <100.00%> (ø)` | |
| [nlp\_profiler/granular\_features/emojis.py](https://codecov.io/gh/neomatrix369/nlp_profiler/pull/73?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None#diff-bmxwX3Byb2ZpbGVyL2dyYW51bGFyX2ZlYXR1cmVzL2Vtb2ppcy5weQ==) | `100.00% <100.00%> (ø)` | |
| [...ler/granular\_features/english\_non\_english\_chars.py](https://codecov.io/gh/neomatrix369/nlp_profiler/pull/73?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None#diff-bmxwX3Byb2ZpbGVyL2dyYW51bGFyX2ZlYXR1cmVzL2VuZ2xpc2hfbm9uX2VuZ2xpc2hfY2hhcnMucHk=) | `100.00% <100.00%> (ø)` | |
| ... and [11 more](https://codecov.io/gh/neomatrix369/nlp_profiler/pull/73?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None) | | |

Help us with your feedback. Take ten seconds to tell us [how you rate us](https://about.codecov.io/nps?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None). Have a feature suggestion? [Share it here.](https://app.codecov.io/gh/feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None)

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

neomatrix369 commented 1 year ago

Currently blocked by a Windows Unicode error that is failing the Windows runners, as per https://github.com/neomatrix369/nlp_profiler/actions/runs/4401367033/jobs/7707511237
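
The exact cause is not shown in this thread. One common source of Unicode errors on Windows runners is Python's `open()` falling back to the platform's default code page instead of UTF-8. A minimal sketch of passing the encoding explicitly, assuming that is the failure mode here (the `round_trip` helper is hypothetical, not code from this repository):

```python
import tempfile
from pathlib import Path


def round_trip(text: str) -> str:
    """Write text to a temporary file and read it back, always as UTF-8."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "sample.txt"
        # An explicit encoding avoids depending on the OS default code page.
        path.write_text(text, encoding="utf-8")
        return path.read_text(encoding="utf-8")


sample = "naïve café 😀"
assert round_trip(sample) == sample
```

With the encoding stated on both the write and the read, the same bytes are produced on Linux, macOS and Windows.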

neomatrix369 commented 1 year ago

https://github.com/neomatrix369/nlp_profiler/pull/73#issuecomment-1465494680 is fixed by 378458f, 44bcc4, 1771150

neomatrix369 commented 1 year ago

Pending before this PR can be merged: Sourcery refactoring fixes and the other checks mentioned in the body/description of the PR

neomatrix369 commented 1 year ago

https://github.com/neomatrix369/nlp_profiler/pull/73#issuecomment-1465516959 is now resolved. Next: manual checks of the notebooks and smoke tests

neomatrix369 commented 1 year ago

Logged regression issue #78 after reviewing the notebooks

neomatrix369 / nlp_profiler

Refactor: reformatting python code across all the source files #73
