neomatrix369 / nlp_profiler

A simple NLP library that allows profiling datasets with one or more text columns. When given a dataset and the name of a column containing text data, NLP Profiler returns either high-level insights or low-level/granular statistical information about the text in that column.
Other
241 stars 37 forks

Add noun phrase count to the granular features functionality #47

Closed: neomatrix369 closed this 3 years ago

neomatrix369 commented 3 years ago

To be able to merge a pull request, there are a few checks:

Checklist

Please check the options that you have completed and strike out the options that do not apply to this pull request:

Goal or purpose of the PR

Added noun phrase counting, which solves the counting issue #14. The logic has also been updated to decode emojis, making noun phrase detection more robust.
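The thread does not show the code inside `noun_phase_count.py`, so the following is only a rough, stdlib-only Python sketch of the general idea (count noun phrases after turning emojis into words), not the PR's implementation. Everything here is an assumption: `decode_emojis`, `count_noun_phrases` and `TOY_LEXICON` are hypothetical names, the tiny hand-written lexicon stands in for a real part-of-speech tagger, and the chunk rule is the classic "optional determiner, adjectives, nouns" pattern rather than whatever NLP Profiler actually uses.

```python
import re
import unicodedata

# Hypothetical toy lexicon standing in for a real part-of-speech tagger.
# Tags: DT = determiner, JJ = adjective, NN = noun, VB = verb, IN = preposition.
# Words missing from the lexicon are treated as nouns (a crude default).
TOY_LEXICON = {
    "the": "DT", "a": "DT", "an": "DT",
    "quick": "JJ", "brown": "JJ", "lazy": "JJ", "happy": "JJ",
    "jumps": "VB", "is": "VB", "over": "IN", "with": "IN",
}

# Noun-phrase chunk rule over the space-separated tag sequence:
# optional determiner, any number of adjectives, then one or more nouns.
NP_PATTERN = re.compile(r"\b(?:DT )?(?:JJ )*NN(?: NN)*\b")


def decode_emojis(text: str) -> str:
    """Replace emoji-like symbols with their lower-cased Unicode names."""
    pieces = []
    for ch in text:
        # Most emoji fall in the Unicode category "So" (Symbol, other).
        if unicodedata.category(ch) == "So":
            pieces.append(f" {unicodedata.name(ch, '').lower()} ")
        else:
            pieces.append(ch)
    return "".join(pieces)


def count_noun_phrases(text: str) -> int:
    """Count noun phrases in text after decoding emojis into words."""
    tokens = re.findall(r"[a-z]+", decode_emojis(text).lower())
    tags = " ".join(TOY_LEXICON.get(token, "NN") for token in tokens)
    return len(NP_PATTERN.findall(tags))
```

With this toy lexicon, `count_noun_phrases("The quick brown fox jumps over the lazy dog")` finds two phrases ("the quick brown fox" and "the lazy dog"), and `decode_emojis` turns 😀 into the words "grinning face", so the emoji is tagged like ordinary text instead of being silently dropped by the tokenizer.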

Changes implemented in the PR

Changes implemented by @ritikjain51 via PR #13 (see new branch https://github.com/neomatrix369/nlp_profiler/tree/addNounPhraseCount)

Thanks, @ritikjain51, for your contribution and effort.

codecov[bot] commented 3 years ago

Codecov Report

Merging #47 into master will not change coverage. The diff coverage is 100.00%.

@@            Coverage Diff            @@
##            master       #47   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           21        22    +1     
  Lines          358       380   +22     
  Branches        51        54    +3     
=========================================
+ Hits           358       380   +22     
Impacted Files                                        Coverage Δ
nlp_profiler/constants.py                             100.00% <100.00%> (ø)
nlp_profiler/granular_features/__init__.py            100.00% <100.00%> (ø)
nlp_profiler/granular_features/noun_phase_count.py    100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Powered by Codecov. Last update 115ce83...fcd706b.
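The coverage figures in the Codecov report can be checked with plain arithmetic: project coverage is covered (hit) lines divided by tracked lines. The short Python sketch below reproduces this from the coverage diff. `coverage_pct` is a hypothetical helper, and dividing the 22 added hits by the 22 added lines only approximates the reported diff coverage, because, as we understand it, Codecov computes that figure from the lines the patch actually touches rather than from net deltas.

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Share of tracked lines executed by the tests, as a percentage."""
    return 100.0 * hits / lines


# Figures copied from the coverage diff (master vs. #47).
before = coverage_pct(hits=358, lines=358)
after = coverage_pct(hits=380, lines=380)

# Net change from the PR: 22 new tracked lines, 22 new hits.
added_lines = 380 - 358
added_hits = 380 - 358
approx_patch = coverage_pct(added_hits, added_lines)

print(f"project: {before:.2f}% -> {after:.2f}%")
print(f"approx. patch coverage: {approx_patch:.2f}%")
```

This prints `project: 100.00% -> 100.00%` and `approx. patch coverage: 100.00%`: every tracked line is hit both before and after, so the project figure stays unchanged even though the PR adds a file and 22 lines.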

ritikjain51 commented 3 years ago

To be able to merge a pull request, there are a few checks:

Checklist

Please check the options that you have completed and strike out the options that do not apply to this pull request:

  • [x] a clear title and description have been provided for the Pull Request
  • [x] you have read
  • [x] the pull request passes the tests (`./test-coverage "tests slow-tests"`); this will also be visible via the code coverage report and the CI/CD task on the Pull Request
  • [x] you have performed some kind of smoke test by running your changes in an isolated environment, e.g. a Docker container, Google Colab, Kaggle, etc.
  • [x] the notebooks are updated (see the notebooks folder and read the Notebooks docs)
  • [ ] CHANGELOG.md has been updated (please follow the existing format)

Thanks, @neomatrix369, for considering my contribution.

neomatrix369 commented 3 years ago

You are welcome. Please also follow the feedback left on PR #13.

Also, if there are any code changes to this PR, feel free to add your comments to it before I go ahead and merge it.

sourcery-ai[bot] commented 3 years ago

Sourcery Code Quality Report

❌  Merging this PR will decrease code quality in the affected files by 0.51%.

Quality metrics    Before      After       Change
Complexity         0.24 ⭐     0.24 ⭐     0.00
Method Length      36.44 ⭐    37.12 ⭐    0.68 👎
Working memory     7.08 🙂     7.32 🙂     0.24 👎
Quality            86.20%      85.69%      -0.51% 👎

Other metrics      Before      After       Change
Lines              400         407         7
Changed files                                              Quality Before   Quality After   Quality Change
nlp_profiler/constants.py                                  85.66% ⭐        85.26% ⭐       -0.40% 👎
nlp_profiler/granular_features/__init__.py                 74.39% 🙂        68.19% 🙂       -6.20% 👎
slow-tests/performance_tests/test_perf_grammar_check.py    98.42% ⭐        98.42% ⭐       0.00%
slow-tests/performance_tests/test_perf_spelling_check.py   97.11% ⭐        97.11% ⭐       0.00%
tests/acceptance_tests/test_apply_text_profiling.py        85.26% ⭐        85.26% ⭐       0.00%
tests/granular/test_duplicates.py                          92.53% ⭐        92.53% ⭐       0.00%
tests/high_level/test_sentiment_polarity.py                79.63% ⭐        79.63% ⭐       0.00%
tests/high_level/test_sentiment_subjectivity.py            79.63% ⭐        79.63% ⭐       0.00%
tests/high_level/test_spelling_check.py                    75.75% ⭐        75.75% ⭐       0.00%

Here are some functions in these files that still need a tune-up:

  • nlp_profiler/granular_features/__init__.py, apply_granular_features: complexity 0, length 74 🙂, working memory 33 ⛔, quality 56.89% 🙂; recommendation: Extract out complex expressions
  • tests/high_level/test_spelling_check.py, test_given_a_text_when_spell_check_is_applied_then_spell_check_analysis_info_is_returned: complexity 2 ⭐, length 68 🙂, working memory 11 😞, quality 69.77% 🙂; recommendation: Extract out complex expressions
  • tests/high_level/test_sentiment_polarity.py, test_given_a_text_when_sentiment_analysis_is_applied_then_sentiment_analysis_info_is_returned: complexity 2 ⭐, length 59 ⭐, working memory 10 😞, quality 73.06% 🙂; recommendation: Extract out complex expressions
  • tests/high_level/test_sentiment_subjectivity.py, test_given_a_text_when_sentiment_subjectivity_analysis_is_applied_then_subjective_analysis_info_is_returned: complexity 2 ⭐, length 59 ⭐, working memory 10 😞, quality 73.06% 🙂; recommendation: Extract out complex expressions

Legend and Explanation

The emojis denote the absolute quality of the code:

The 👍 and 👎 indicate whether the quality has improved or gotten worse with this pull request.


Please see our documentation for details on how these metrics are calculated.

We are actively working on this report - lots more documentation and extra metrics to come!

Let us know what you think of it by mentioning @sourcery-ai in a comment.

neomatrix369 commented 3 years ago

@sourcery-ai on this PR, the Checks show a green ✅ for Sourcery, which suggests the PR can be merged, but this comment suggests the opposite: ❌ Merging this PR will decrease code quality in the affected files by 0.51%.

It would help if the explanations for both were clearer: which one to follow and why. Maybe both are right, but the context isn't clear.