hygia-org / hygia

Library to automate the execution and analysis of ML models
https://hygia-org.github.io/hygia/
MIT License
4 stars, 1 fork

65/normalize key smash features #72

Closed (apmt closed this pull request 1 year ago)

apmt commented 1 year ago

Motivation

  1. Key smash features normalization
  2. Key smash sequence feature threshold raised from 2+ to 3+
  3. Handling of abbreviations followed by a dot
  4. Context-invalid words
  5. Removed the key_smash alphanumeric feature, which was causing conflicts
  6. Added a model option to validate the inclusion of word embedding features
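The PR description does not show the code behind items 1 to 3, so what follows is only a minimal Python sketch of how such changes could look, under stated assumptions. The helpers `key_smash_features`, `consonant_runs`, and `repeated_char_runs`, the two features, the ASCII consonant class, and the abbreviation pattern are all hypothetical and are not hygia's actual `key_smash.py` API. The sketch counts runs of at least `min_run` characters (3 by default, mirroring the new 3+ threshold), skips short tokens ending in a dot as abbreviations, and divides each count by the number of characters kept so that feature values are comparable across texts of different lengths.

```python
import re

# Hypothetical sketch: names, features, and rules below are assumptions,
# not hygia's real key_smash.py implementation.

# Assumed abbreviation rule: 1-4 letters followed by a single dot, e.g. "av.", "r."
ABBREVIATION = re.compile(r"^[a-z]{1,4}\.$")


def consonant_runs(token: str, min_run: int = 3) -> int:
    """Count runs of at least `min_run` consecutive ASCII consonants."""
    return len(re.findall(rf"[b-df-hj-np-tv-z]{{{min_run},}}", token))


def repeated_char_runs(token: str, min_run: int = 3) -> int:
    """Count runs of one character repeated at least `min_run` times."""
    return len(re.findall(rf"(.)\1{{{min_run - 1},}}", token))


def key_smash_features(text: str, min_run: int = 3) -> dict[str, float]:
    """Return key-smash counts normalized by the number of characters kept."""
    # Drop tokens that look like abbreviations so "av." is not scored.
    tokens = [t for t in text.lower().split() if not ABBREVIATION.match(t)]
    n_chars = sum(len(t) for t in tokens)
    if n_chars == 0:
        return {"consonant_runs": 0.0, "repeated_chars": 0.0}
    return {
        "consonant_runs": sum(consonant_runs(t, min_run) for t in tokens) / n_chars,
        "repeated_chars": sum(repeated_char_runs(t, min_run) for t in tokens) / n_chars,
    }


print(key_smash_features("av. carro"))             # {'consonant_runs': 0.0, 'repeated_chars': 0.0}
print(key_smash_features("av. carro", min_run=2))  # {'consonant_runs': 0.2, 'repeated_chars': 0.2}
print(key_smash_features("asdfgh"))                # consonant_runs is 1/6, repeated_chars is 0.0
```

With `min_run=2` (the old 2+ threshold), the ordinary double "rr" in "carro" counts as both a consonant run and a repeated-character run; with 3 it does not. The abbreviation rule in this sketch is deliberately crude: a short smash such as "asdf." would also be skipped.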

Changes

Status Checklist

codecov[bot] commented 1 year ago

Codecov Report

Base: 79.70% // Head: 79.89% // Increases project coverage by +0.18%

Coverage data is based on head (975cc18) compared to base (333e4af). Patch coverage: 63.15% of modified lines in the pull request are covered.

Note: the current head 975cc18 differs from the pull request's most recent head 99faaaa. Consider uploading reports for commit 99faaaa to get more accurate results.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main      #72      +/-   ##
==========================================
+ Coverage   79.70%   79.89%   +0.18%
==========================================
  Files          28       31       +3
  Lines         892      965      +73
==========================================
+ Hits          711      771      +60
- Misses        181      194      +13
```

| Impacted Files | Coverage Δ | |
|---|---|---|
| `hygia/data_pipeline/model/random_forest.py` | `34.17% <26.31%> (+7.64%)` | ↑ |
| `hygia/data_pipeline/feature_engineering/regex.py` | `83.60% <44.44%> (-16.40%)` | ↓ |
| `...data_pipeline/pre_process_data/pre_process_data.py` | `59.09% <75.00%> (+1.51%)` | ↑ |
| `hygia/__init__.py` | `100.00% <100.00%> (ø)` | |
| `hygia/data_pipeline/augment_data/augment_data.py` | `96.15% <100.00%> (ø)` | |
| `...ipeline/feature_engineering/feature_engineering.py` | `100.00% <100.00%> (ø)` | |
| `...gia/data_pipeline/feature_engineering/key_smash.py` | `94.33% <100.00%> (+4.86%)` | ↑ |
| `hygia/paths/paths.py` | `100.00% <100.00%> (ø)` | |
| `.../data_pipeline/annotate_data/test_annotate_data.py` | `100.00% <100.00%> (ø)` | |
| `...ts/data_pipeline/augment_data/test_augment_data.py` | `100.00% <100.00%> (ø)` | |
| ... and 5 more | | |
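As a consistency check on the coverage diff, project coverage is hits divided by lines: 711/892 on base and 771/965 on head. The minimal Python sketch below computes these ratios exactly with `fractions.Fraction`; the helpers `truncated_pct` and `rounded_pct` are hypothetical. The reported 79.70%, 79.89%, and +0.18% are consistent with truncating to two decimals, while ordinary rounding would give 79.71%, 79.90%, and +0.19%. This explains why 79.89 − 79.70 = 0.19 differs from the reported +0.18%.

```python
import math
from fractions import Fraction


def truncated_pct(ratio: Fraction) -> str:
    """Format a ratio as a percentage truncated (not rounded) to 2 decimals."""
    hundredths = math.trunc(ratio * 10000)  # percent * 100, truncated toward zero
    sign = "-" if hundredths < 0 else ""
    hundredths = abs(hundredths)
    return f"{sign}{hundredths // 100}.{hundredths % 100:02d}%"


def rounded_pct(ratio: Fraction) -> str:
    """Format the same ratio with ordinary rounding, for comparison."""
    return f"{float(ratio) * 100:.2f}%"


base = Fraction(711, 892)  # hits / lines on main (333e4af)
head = Fraction(771, 965)  # hits / lines on the PR head (975cc18)

for label, ratio in [("base", base), ("head", head), ("delta", head - base)]:
    print(label, truncated_pct(ratio), rounded_pct(ratio))
# base 79.70% 79.71%
# head 79.89% 79.90%
# delta 0.18% 0.19%
```

Using exact fractions avoids floating-point noise when deciding which side of a two-decimal boundary a value falls on.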

View the full report at Codecov: https://codecov.io/gh/hygia-org/hygia/pull/72