salesforce / TransmogrifAI

TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning
https://transmogrif.ai
BSD 3-Clause "New" or "Revised" License
2.24k stars, 392 forks

Incorporate name detection into SmartTextVectorizer #508

Status: Closed (Jauntbox closed 3 years ago)

Jauntbox commented 4 years ago

Related issues: Re-opening of https://github.com/salesforce/TransmogrifAI/pull/456 from a branch directly on TransmogrifAI. Please see that PR for historical discussion (I'll close it once this one gets merged in).

Describe the proposed solution: Adds parameters to SmartTextVectorizer for ignoring fields identified as personal names.

Describe alternatives you've considered: N/A

Additional context: N/A
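For readers unfamiliar with the idea, here is a minimal sketch of what "ignoring fields identified as personal names" could look like. It is not the code in this PR and not TransmogrifAI's API: the dictionary `KNOWN_FIRST_NAMES`, the functions `looks_like_name_column` and `drop_name_columns`, the `ignore_names` flag, and the 0.5 threshold are all hypothetical, and Python is used only to keep the illustration short.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical mini-dictionary; a real detector would need a far larger name list.
KNOWN_FIRST_NAMES = {"alice", "bob", "carol", "david", "emma"}


def looks_like_name_column(values: Iterable[Optional[str]], threshold: float = 0.5) -> bool:
    """Return True when at least `threshold` of the non-empty values
    contain a token found in KNOWN_FIRST_NAMES (assumed heuristic)."""
    non_empty = [v for v in values if v and v.strip()]
    if not non_empty:
        return False
    hits = sum(
        1
        for v in non_empty
        if any(token in KNOWN_FIRST_NAMES for token in v.lower().split())
    )
    return hits / len(non_empty) >= threshold


def drop_name_columns(
    columns: Dict[str, List[Optional[str]]], ignore_names: bool = True
) -> Dict[str, List[Optional[str]]]:
    """Drop text columns flagged as personal names when `ignore_names` is set."""
    if not ignore_names:
        return dict(columns)
    return {
        name: values
        for name, values in columns.items()
        if not looks_like_name_column(values)
    }


columns = {
    "contact": ["Alice Smith", "Bob Jones", None, "Emma Stone"],
    "notes": ["called twice", "left voicemail", "no answer", ""],
}
kept = drop_name_columns(columns)
print(list(kept))
```

In this toy input every non-empty `contact` value contains a dictionary name, so that column is dropped before any vectorization step, while `notes` is kept.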

Jauntbox commented 4 years ago

So far, this is just compiling and passing tests. I still have to give it a cleanup pass before it's ready for a real review.

codecov[bot] commented 4 years ago

Codecov Report

Merging #508 into master will increase coverage by 7.57%. The diff coverage is 94.11%.


@@            Coverage Diff             @@
##           master     #508      +/-   ##
==========================================
+ Coverage   79.15%   86.73%   +7.57%     
==========================================
  Files         347      347              
  Lines       11851    11897      +46     
  Branches      384      602     +218     
==========================================
+ Hits         9381    10319     +938     
+ Misses       2470     1578     -892     

| Impacted Files | Coverage Δ |
| --- | --- |
| .../scala/com/salesforce/op/dsl/RichTextFeature.scala | 82.19% <ø> (+12.32%) :arrow_up: |
| ...main/scala/com/salesforce/op/test/TestCommon.scala | 34.61% <0.00%> (-6.30%) :arrow_down: |
| ...s/impl/feature/OPCollectionHashingVectorizer.scala | 96.55% <100.00%> (+15.43%) :arrow_up: |
| ...p/stages/impl/feature/SmartTextMapVectorizer.scala | 100.00% <100.00%> (+2.12%) :arrow_up: |
| ...e/op/stages/impl/feature/SmartTextVectorizer.scala | 95.33% <100.00%> (+0.12%) :arrow_up: |
| ...m/salesforce/op/utils/stages/NameDetectUtils.scala | 89.44% <100.00%> (+89.44%) :arrow_up: |
| .../src/main/scala/com/salesforce/op/OpWorkflow.scala | 88.19% <0.00%> (+0.69%) :arrow_up: |
| ...rce/op/stages/impl/preparators/SanityChecker.scala | 90.57% <0.00%> (+1.22%) :arrow_up: |
| ...ala/com/salesforce/op/features/types/package.scala | 58.21% <0.00%> (+1.36%) :arrow_up: |
| ...orce/op/aggregators/MonoidAggregatorDefaults.scala | 100.00% <0.00%> (+1.78%) :arrow_up: |
| ... and 59 more | |


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 87eac31...d1ec7f1.
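As a sanity check on the coverage diff in this report, the percentages can be reproduced from the Hits and Lines rows. The sketch below uses only numbers quoted in the report; the helper names `coverage` and `truncate2` are made up for illustration. The reported figures match truncation to two decimals rather than rounding, which is an observation from these particular numbers and not a statement about how Codecov formats values in general.

```python
import math
from fractions import Fraction


def coverage(hits: int, lines: int) -> Fraction:
    """Exact coverage percentage: hits / lines * 100."""
    return Fraction(hits, lines) * 100


def truncate2(value: Fraction) -> float:
    """Truncate (not round) a non-negative value to two decimal places."""
    return math.floor(value * 100) / 100


# Figures taken from the coverage diff: (hits, misses, lines).
master_hits, master_misses, master_lines = 9381, 2470, 11851
pr_hits, pr_misses, pr_lines = 10319, 1578, 11897

# Hits and misses account for every tracked line in both columns.
assert master_hits + master_misses == master_lines
assert pr_hits + pr_misses == pr_lines

master_cov = coverage(master_hits, master_lines)
pr_cov = coverage(pr_hits, pr_lines)

print(truncate2(master_cov))           # 79.15
print(truncate2(pr_cov))               # 86.73
print(truncate2(pr_cov - master_cov))  # 7.57
```

Note that the delta is taken between the exact percentages before truncating; subtracting the two already-truncated values would give 7.58 instead of the reported 7.57.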