bitanb1999 commented 1 year ago

Please check the options that you have completed and strike-out the options that do not apply via this pull request:

[X] a clear title and description to the Pull Request has been provided
[X] you have read the Contributing doc
[X] you have read the Developer Guide
[X] the pull request passes the tests (./test-coverage "tests slow-tests") - this will also be visible via the Code coverage report and CI/CD task on the Pull Request
[X] you have performed some kind of smoke test by running your changes in an isolated environment i.e. Docker container, Google Colab, Kaggle, etc...
~~[ ] the notebooks are updated (see notebooks folder, read the Notebooks docs)~~
[X] CHANGELOG.md has been updated (please follow the existing format)

Goal or purpose of the PR

The grammar check function previously used the Python language tool, which took a significant amount of time to process each text in the text dataframe and return its output.

Changes implemented in the PR

I analyzed the alternatives available in NLP and came across two options:

1. Happy Transformer with the hyperparameter tuning of Gramformer (see https://github.com/PrithivirajDamodaran/Gramformer)
2. the Gingerit package (see https://github.com/Azd325/gingerit)

Gingerit had a processing time of 34.8 seconds per text, whereas the Python language tool took 41 seconds per text, so Gingerit was roughly 6 seconds (about 15%) faster. This seemed to be a huge upgrade.
Transformers are also a great alternative and performed equivalently well, but having to access Hugging Face every time a text needs to be checked seems like unnecessary overhead. I have made the changes to the requirements file and to the grammar check Python file.
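
For anyone who wants to reproduce a per-text timing comparison like the one above, here is a minimal sketch of a timing harness. It is not the code used for the numbers in this PR. `time_per_text` and `stub_checker` are hypothetical names introduced only for illustration, and the stub merely stands in for a real grammar checker (Gingerit or the Python language tool), which would be passed in as the `check` callable instead.

```python
import time
from statistics import mean
from typing import Callable, Iterable, List


def time_per_text(check: Callable[[str], object], texts: Iterable[str]) -> List[float]:
    """Return the wall-clock seconds that `check` takes on each text."""
    durations = []
    for text in texts:
        start = time.perf_counter()
        check(text)  # the result is ignored; only the elapsed time matters here
        durations.append(time.perf_counter() - start)
    return durations


def stub_checker(text: str) -> int:
    """Hypothetical stand-in for a real grammar checker: counts double spaces."""
    return text.count("  ")


if __name__ == "__main__":
    samples = ["This are a sentence.", "Another  text with  spacing issues."]
    durations = time_per_text(stub_checker, samples)
    print(f"mean seconds per text: {mean(durations):.6f}")
```

Running the same harness with each candidate checker on the same sample texts would give per-text timings that can be compared directly.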

sourcery-ai[bot] commented 1 year ago

Sourcery Code Quality Report

✅ Merging this PR will increase code quality in the affected files by 1.06%.

Quality metrics	Before	After	Change
Complexity	1.94 ⭐	1.94 ⭐	0.00
Method Length	35.25 ⭐	34.50 ⭐	-0.75 👍
Working memory	4.88 ⭐	4.56 ⭐	-0.32 👍
Quality	89.10% ⭐	90.16% ⭐	1.06% 👍

Other metrics	Before	After	Change
Lines	37	39	2

Changed files	Quality Before	Quality After	Quality Change
nlp_profiler/high_level_features/grammar_quality_check.py	89.10% ⭐	90.16% ⭐	1.06% 👍

Here are some functions in these files that still need a tune-up:

File	Function	Complexity	Length	Working Memory	Quality	Recommendation

Legend and Explanation

The emojis denote the absolute quality of the code:

⭐ excellent
🙂 good
😞 poor
⛔ very poor

The 👍 and 👎 indicate whether the quality has improved or gotten worse with this pull request.

Please see our documentation here for details on how these metrics are calculated.
