neomatrix369 / nlp_profiler

A simple NLP library that allows profiling datasets with one or more text columns. When given a dataset and a column name containing text data, NLP Profiler will return either high-level insights or low-level/granular statistical information about the text in that column.

Refactor: reformatting python code across all the source files #73

Closed neomatrix369 closed 1 year ago

neomatrix369 commented 1 year ago

To be able to merge a pull request, there are a few checks:

Checklist

Please check the options that you have completed and strike-out the options that do not apply via this pull request:

Goal or purpose of the PR

Minor fixes and code formatting

Changes implemented in the PR

Format all Python code and fix minor typos in the docs. Run black across the code base to make the code structure consistent. Apply the refactorings suggested by Sourcery.ai across all the source files.
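As a rough sketch of the formatting step described above, the snippet below builds a `black` invocation that either checks or rewrites files. The helper name `black_command` is an illustrative assumption and not part of this PR; `--check` is black's flag for reporting files that would be reformatted without modifying them.

```python
import sys


def black_command(paths, check_only=True):
    """Build an argument list that runs black via the current interpreter."""
    cmd = [sys.executable, "-m", "black"]
    if check_only:
        # --check reports files that would change, without rewriting them
        cmd.append("--check")
    cmd.extend(paths)
    return cmd


if __name__ == "__main__":
    # Directory names taken from the file paths in this PR; adjust as needed.
    print(" ".join(black_command(["nlp_profiler", "tests"])))
```

The resulting list can be passed to `subprocess.run(...)`. Running the check variant in CI and the rewriting variant locally is one common arrangement, though this thread does not say how the project wires it up.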

sourcery-ai[bot] commented 1 year ago

Sourcery Code Quality Report

❌  Merging this PR will decrease code quality in the affected files by 3.04%.

| Quality metrics | Before | After | Change |
|---|---|---|---|
| Complexity | 0.89 ⭐ | 0.75 ⭐ | -0.14 👍 |
| Method Length | 37.16 ⭐ | 39.41 ⭐ | 2.25 👎 |
| Working memory | 5.00 ⭐ | 5.96 ⭐ | 0.96 👎 |
| Quality | 87.20% ⭐ | 84.16% ⭐ | -3.04% 👎 |

| Other metrics | Before | After | Change |
|---|---|---|---|
| Lines | 1997 | 2483 | 486 |
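The Change column in the metrics above is the After value minus the Before value. A minimal sketch that recomputes it from the reported figures (the names `METRICS` and `change` are illustrative, not Sourcery's):

```python
# Before/after values copied from the Sourcery report above.
METRICS = {
    "Complexity": (0.89, 0.75),
    "Method Length": (37.16, 39.41),
    "Working memory": (5.00, 5.96),
    "Quality (%)": (87.20, 84.16),
    "Lines": (1997, 2483),
}


def change(before, after):
    """Return after - before, rounded to two decimals as in the report."""
    return round(after - before, 2)


for name, (before, after) in METRICS.items():
    # The + format spec prints an explicit sign for increases and decreases.
    print(f"{name}: {change(before, after):+}")
```

Note that a lower Complexity counts as an improvement, which is why its negative change is marked 👍 while the negative Quality change is marked 👎.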
| Changed files | Quality Before | Quality After | Quality Change |
|---|---|---|---|
| setup.py | 67.46% 🙂 | 67.46% 🙂 | 0.00% |
| nlp_profiler/__init__.py | 100.00% ⭐ | 100.00% ⭐ | 0.00% |
| nlp_profiler/constants.py | 80.17% ⭐ | 80.17% ⭐ | 0.00% |
| nlp_profiler/core.py | 63.10% 🙂 | 64.79% 🙂 | 1.69% 👍 |
| nlp_profiler/generate_features/__init__.py | 71.82% 🙂 | 72.44% 🙂 | 0.62% 👍 |
| nlp_profiler/generate_features/parallelisation_methods/__init__.py | 90.29% ⭐ | 91.42% ⭐ | 1.13% 👍 |
| nlp_profiler/granular_features/__init__.py | 75.47% ⭐ | 75.47% ⭐ | 0.00% |
| nlp_profiler/granular_features/alphanumeric.py | 97.09% ⭐ | 94.76% ⭐ | -2.33% 👎 |
| nlp_profiler/granular_features/chars_spaces_and_whitespaces.py | 94.66% ⭐ | 91.52% ⭐ | -3.14% 👎 |
| nlp_profiler/granular_features/dates.py | 90.18% ⭐ | 88.11% ⭐ | -2.07% 👎 |
| nlp_profiler/granular_features/emojis.py | 93.69% ⭐ | 93.93% ⭐ | 0.24% 👍 |
| nlp_profiler/granular_features/english_non_english_chars.py | 94.86% ⭐ | 90.69% ⭐ | -4.17% 👎 |
| nlp_profiler/granular_features/letters.py | 97.09% ⭐ | 94.76% ⭐ | -2.33% 👎 |
| nlp_profiler/granular_features/non_alphanumeric.py | 97.09% ⭐ | 94.76% ⭐ | -2.33% 👎 |
| nlp_profiler/granular_features/noun_phrase_count.py | 87.32% ⭐ | 85.95% ⭐ | -1.37% 👎 |
| nlp_profiler/granular_features/numbers.py | 97.09% ⭐ | 94.76% ⭐ | -2.33% 👎 |
| nlp_profiler/granular_features/punctuations.py | 90.93% ⭐ | 88.44% ⭐ | -2.49% 👎 |
| nlp_profiler/granular_features/stop_words.py | 93.13% ⭐ | 93.52% ⭐ | 0.39% 👍 |
| nlp_profiler/granular_features/words.py | 97.09% ⭐ | 94.76% ⭐ | -2.33% 👎 |
| nlp_profiler/high_level_features/__init__.py | 85.89% ⭐ | 85.89% ⭐ | 0.00% |
| nlp_profiler/high_level_features/ease_of_reading_check.py | 85.73% ⭐ | 86.56% ⭐ | 0.83% 👍 |
| nlp_profiler/high_level_features/sentiment_polarity.py | 86.78% ⭐ | 87.66% ⭐ | 0.88% 👍 |
| nlp_profiler/high_level_features/sentiment_subjectivity.py | 86.78% ⭐ | 87.92% ⭐ | 1.14% 👍 |
| slow-tests/acceptance_tests/test_apply_text_profiling.py | 89.13% ⭐ | 89.13% ⭐ | 0.00% |
| slow-tests/performance_tests/test_perf_ease_of_reading_check.py | 99.17% ⭐ | 99.17% ⭐ | 0.00% |
| slow-tests/performance_tests/test_perf_grammar_check.py | 99.17% ⭐ | 99.17% ⭐ | 0.00% |
| slow-tests/performance_tests/test_perf_granular_features.py | 98.83% ⭐ | 98.83% ⭐ | 0.00% |
| slow-tests/performance_tests/test_perf_noun_phrase.py | 99.17% ⭐ | 99.17% ⭐ | 0.00% |
| slow-tests/performance_tests/test_perf_spelling_check.py | 99.17% ⭐ | 99.17% ⭐ | 0.00% |
| tests/common_functions.py | 72.84% 🙂 | 72.84% 🙂 | 0.00% |
| tests/acceptance_tests/test_apply_text_profiling.py | 88.45% ⭐ | 88.45% ⭐ | 0.00% |
| tests/granular/test_alphanumeric.py | 94.51% ⭐ | 94.11% ⭐ | -0.40% 👎 |
| tests/granular/test_chars_and_spaces.py | 80.81% ⭐ | 80.81% ⭐ | 0.00% |
| tests/granular/test_dates.py | 94.84% ⭐ | 94.26% ⭐ | -0.58% 👎 |
| tests/granular/test_duplicates.py | 95.65% ⭐ | 94.85% ⭐ | -0.80% 👎 |
| tests/granular/test_emojis.py | 95.02% ⭐ | 94.50% ⭐ | -0.52% 👎 |
| tests/granular/test_english_non_english_characters.py | 90.40% ⭐ | 70.81% 🙂 | -19.59% 👎 |
| tests/granular/test_non_alphanumeric.py | 94.10% ⭐ | 93.24% ⭐ | -0.86% 👎 |
| tests/granular/test_nounphrase.py | % | 90.76% ⭐ | % |
| tests/granular/test_numbers.py | 85.25% ⭐ | 85.20% ⭐ | -0.05% 👎 |
| tests/granular/test_punctuations.py | 90.73% ⭐ | 89.94% ⭐ | -0.79% 👎 |
| tests/granular/test_repeated_digits.py | 90.40% ⭐ | 77.69% ⭐ | -12.71% 👎 |
| tests/granular/test_repeated_letters.py | 90.40% ⭐ | 79.93% ⭐ | -10.47% 👎 |
| tests/granular/test_repeated_punctuations.py | 90.40% ⭐ | 70.55% 🙂 | -19.85% 👎 |
| tests/granular/test_sentences.py | 89.16% ⭐ | 89.02% ⭐ | -0.14% 👎 |
| tests/granular/test_stop_words.py | 95.02% ⭐ | 94.50% ⭐ | -0.52% 👎 |
| tests/granular/test_syllables.py | 90.40% ⭐ | 74.74% 🙂 | -15.66% 👎 |
| tests/granular/test_white_spaces.py | 80.81% ⭐ | 80.81% ⭐ | 0.00% |
| tests/granular/test_words.py | 94.86% ⭐ | 94.37% ⭐ | -0.49% 👎 |
| tests/high_level/test_ease_of_reading_check.py | 87.93% ⭐ | 70.29% 🙂 | -17.64% 👎 |
| tests/high_level/test_grammar_check.py | 87.91% ⭐ | 87.04% ⭐ | -0.87% 👎 |
| tests/high_level/test_sentiment_polarity.py | 79.18% ⭐ | 79.18% ⭐ | 0.00% |
| tests/high_level/test_sentiment_subjectivity.py | 79.18% ⭐ | 79.18% ⭐ | 0.00% |
| tests/high_level/test_spelling_check.py | 74.59% 🙂 | 74.59% 🙂 | 0.00% |

Here are some functions in these files that still need a tune-up:

| File | Function | Complexity | Length | Working Memory | Quality | Recommendation |
|---|---|---|---|---|---|---|
| tests/common_functions.py | internal_assert_benchmark | 1 ⭐ | 136 😞 | 13 😞 | 58.26% 🙂 | Try splitting into smaller methods. Extract out complex expressions |
| tests/common_functions.py | generate_data | 0 ⭐ | 80 🙂 | 16 ⛔ | 62.86% 🙂 | Extract out complex expressions |
| nlp_profiler/core.py | apply_text_profiling | 5 ⭐ | 148 😞 | 7 🙂 | 64.79% 🙂 | Try splitting into smaller methods |
| nlp_profiler/generate_features/__init__.py | generate_features | 2 ⭐ | 63 🙂 | 10 😞 | 72.44% 🙂 | Extract out complex expressions |
| nlp_profiler/granular_features/__init__.py | apply_granular_features | 0 ⭐ | 120 😞 | 6 ⭐ | 75.47% ⭐ | Try splitting into smaller methods |
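To illustrate the two recommendations above ("split into smaller methods" and "extract out complex expressions"), here is a generic before/after sketch. The functions `summarise_before`, `summarise_after` and `_average_length` are hypothetical and do not come from nlp_profiler; they only show the shape of the refactoring.

```python
def summarise_before(text):
    # One function doing everything, with a repeated inline expression.
    return {
        "words": len(text.split()),
        "avg_word_length": (
            sum(len(w) for w in text.split()) / len(text.split())
            if text.split()
            else 0.0
        ),
    }


def _average_length(words):
    """Extracted helper: mean word length, 0.0 for empty input."""
    return sum(len(w) for w in words) / len(words) if words else 0.0


def summarise_after(text):
    # A named intermediate value replaces the repeated text.split() calls.
    words = text.split()
    return {"words": len(words), "avg_word_length": _average_length(words)}
```

Both versions return the same dictionary; the second is shorter per function and keeps the averaging logic testable on its own.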

Legend and Explanation

The emojis denote the absolute quality of the code:

The 👍 and 👎 indicate whether the quality has improved or gotten worse with this pull request.


Please see our documentation here for details on how these metrics are calculated.

We are actively working on this report - lots more documentation and extra metrics to come!

Help us improve this quality report!

codecov[bot] commented 1 year ago

Codecov Report

Patch coverage: 100.00% and no project coverage change.

Comparison is base (a3538c6) 100.00% compared to head (7caeb47) 100.00%.

:exclamation: Current head 7caeb47 differs from pull request most recent head def1ee8. Consider uploading reports for the commit def1ee8 to get more accurate results

Additional details and impacted files

```diff
@@            Coverage Diff            @@
##            master       #73   +/-   ##
=========================================
  Coverage   100.00%   100.00%
=========================================
  Files           26        26
  Lines          498       439   -59
  Branches        74        45   -29
=========================================
- Hits           498       439   -59
```

| [Impacted Files](https://codecov.io/gh/neomatrix369/nlp_profiler/pull/73?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None) | Coverage Δ | |
|---|---|---|
| [nlp\_profiler/constants.py](https://codecov.io/gh/neomatrix369/nlp_profiler/pull/73?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None#diff-bmxwX3Byb2ZpbGVyL2NvbnN0YW50cy5weQ==) | `100.00% <100.00%> (ø)` | |
| [nlp\_profiler/core.py](https://codecov.io/gh/neomatrix369/nlp_profiler/pull/73?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None#diff-bmxwX3Byb2ZpbGVyL2NvcmUucHk=) | `100.00% <100.00%> (ø)` | |
| [nlp\_profiler/generate\_features/\_\_init\_\_.py](https://codecov.io/gh/neomatrix369/nlp_profiler/pull/73?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None#diff-bmxwX3Byb2ZpbGVyL2dlbmVyYXRlX2ZlYXR1cmVzL19faW5pdF9fLnB5) | `100.00% <100.00%> (ø)` | |
| [...erate\_features/parallelisation\_methods/\_\_init\_\_.py](https://codecov.io/gh/neomatrix369/nlp_profiler/pull/73?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None#diff-bmxwX3Byb2ZpbGVyL2dlbmVyYXRlX2ZlYXR1cmVzL3BhcmFsbGVsaXNhdGlvbl9tZXRob2RzL19faW5pdF9fLnB5) | `100.00% <100.00%> (ø)` | |
| [nlp\_profiler/granular\_features/\_\_init\_\_.py](https://codecov.io/gh/neomatrix369/nlp_profiler/pull/73?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None#diff-bmxwX3Byb2ZpbGVyL2dyYW51bGFyX2ZlYXR1cmVzL19faW5pdF9fLnB5) | `100.00% <100.00%> (ø)` | |
| [nlp\_profiler/granular\_features/alphanumeric.py](https://codecov.io/gh/neomatrix369/nlp_profiler/pull/73?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None#diff-bmxwX3Byb2ZpbGVyL2dyYW51bGFyX2ZlYXR1cmVzL2FscGhhbnVtZXJpYy5weQ==) | `100.00% <100.00%> (ø)` | |
| [.../granular\_features/chars\_spaces\_and\_whitespaces.py](https://codecov.io/gh/neomatrix369/nlp_profiler/pull/73?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None#diff-bmxwX3Byb2ZpbGVyL2dyYW51bGFyX2ZlYXR1cmVzL2NoYXJzX3NwYWNlc19hbmRfd2hpdGVzcGFjZXMucHk=) | `100.00% <100.00%> (ø)` | |
| [nlp\_profiler/granular\_features/dates.py](https://codecov.io/gh/neomatrix369/nlp_profiler/pull/73?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None#diff-bmxwX3Byb2ZpbGVyL2dyYW51bGFyX2ZlYXR1cmVzL2RhdGVzLnB5) | `100.00% <100.00%> (ø)` | |
| [nlp\_profiler/granular\_features/emojis.py](https://codecov.io/gh/neomatrix369/nlp_profiler/pull/73?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None#diff-bmxwX3Byb2ZpbGVyL2dyYW51bGFyX2ZlYXR1cmVzL2Vtb2ppcy5weQ==) | `100.00% <100.00%> (ø)` | |
| [...ler/granular\_features/english\_non\_english\_chars.py](https://codecov.io/gh/neomatrix369/nlp_profiler/pull/73?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None#diff-bmxwX3Byb2ZpbGVyL2dyYW51bGFyX2ZlYXR1cmVzL2VuZ2xpc2hfbm9uX2VuZ2xpc2hfY2hhcnMucHk=) | `100.00% <100.00%> (ø)` | |
| ... and [11 more](https://codecov.io/gh/neomatrix369/nlp_profiler/pull/73?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None) | | |

Help us with your feedback. Take ten seconds to tell us [how you rate us](https://about.codecov.io/nps?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None). Have a feature suggestion? [Share it here.](https://app.codecov.io/gh/feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None)
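The figures in the Codecov coverage diff above can be sanity-checked with a simplified ratio of hits to tracked lines. This is only a sketch: the function name `coverage_percent` is hypothetical, and it assumes there are no partially covered lines, which Codecov's own calculation would also take into account.

```python
def coverage_percent(hits, lines):
    """Line coverage as a percentage; no tracked lines counts as fully covered."""
    return 100.0 if lines == 0 else 100.0 * hits / lines


# Hits and Lines values copied from the coverage diff above.
base = coverage_percent(hits=498, lines=498)
head = coverage_percent(hits=439, lines=439)
print(f"base={base:.2f}% head={head:.2f}% line delta={439 - 498}")
```

Coverage stays at 100% on both sides because hits and tracked lines dropped by the same amount (59), which is consistent with the report's "no project coverage change".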

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

neomatrix369 commented 1 year ago

Currently blocked by a Windows Unicode error that is failing the Windows runners, as per https://github.com/neomatrix369/nlp_profiler/actions/runs/4401367033/jobs/7707511237
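The failing run's log is not reproduced in this thread, so the exact cause is not known from here. One common source of Unicode errors on Windows is that Python's text I/O falls back to the locale encoding (often cp1252) rather than UTF-8, which cannot represent emojis. A minimal sketch of the usual remedy, assuming that is the cause (the helper `roundtrip` and the sample text are illustrative; the actual fix in this PR may differ):

```python
import os
import tempfile

SAMPLE = "I love ⚽ very much 😁"  # illustrative text containing emojis


def roundtrip(text, path):
    """Write and read text with an explicit encoding instead of the platform default."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)
    with open(path, encoding="utf-8") as handle:
        return handle.read()


with tempfile.TemporaryDirectory() as tmp:
    # The text survives unchanged regardless of the platform's locale encoding.
    assert roundtrip(SAMPLE, os.path.join(tmp, "sample.txt")) == SAMPLE
```

Passing `encoding="utf-8"` explicitly makes the behaviour identical across Linux, macOS and Windows runners.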

neomatrix369 commented 1 year ago

https://github.com/neomatrix369/nlp_profiler/pull/73#issuecomment-1465494680 is fixed by 378458f, 44bcc4, 1771150

neomatrix369 commented 1 year ago

Pending: Sourcery refactoring fixes needed to merge this PR, plus the other checks mentioned in the body/description of the PR

neomatrix369 commented 1 year ago

https://github.com/neomatrix369/nlp_profiler/pull/73#issuecomment-1465516959 is now resolved; next up are the manual checks of the notebooks/smoke tests

neomatrix369 commented 1 year ago

Logged a regression issue #78 on the back of reviewing the notebooks