salesforce / TransmogrifAI

TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning
https://transmogrif.ai
BSD 3-Clause "New" or "Revised" License
2.24k stars, 392 forks

Incorporate name detection into SmartTextVectorizer #456

Closed: MWYang closed 3 years ago

MWYang commented 4 years ago

Describe the proposed solution: Incorporates the changes in #445 and #457 into SmartTextVectorizer and SmartTextMapVectorizer.

Additional context: Merge #457 before merging this PR. Compare the diff between this PR and that one on my forked repo.

Changes from #455 need to be merged before this PR is ready.

codecov[bot] commented 4 years ago

Codecov Report

Merging #456 into master will decrease coverage by 12.27%. The diff coverage is 84.05%.


@@             Coverage Diff             @@
##           master     #456       +/-   ##
===========================================
- Coverage      87%   74.72%   -12.28%     
===========================================
  Files         341      341               
  Lines       11485    11532       +47     
  Branches      378      597      +219     
===========================================
- Hits         9992     8617     -1375     
- Misses       1493     2915     +1422
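The percentages in the coverage diff above can be reproduced from the Hits and Lines rows. The sketch below assumes that coverage is simply hits divided by total lines, which is consistent with the numbers shown but is not taken from Codecov's documentation; the function name `coverage_pct` is hypothetical.

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Line coverage as a percentage (assumption: hits / total lines)."""
    return 100.0 * hits / lines


# Figures copied from the Hits and Lines rows of the coverage diff.
master = coverage_pct(hits=9992, lines=11485)
pr = coverage_pct(hits=8617, lines=11532)
delta = pr - master

# Sanity check: hits + misses should equal the line count on each side.
assert 9992 + 1493 == 11485
assert 8617 + 2915 == 11532

print(f"master {master:.2f}%  PR {pr:.2f}%  delta {delta:+.2f}%")
# prints "master 87.00%  PR 74.72%  delta -12.28%"
```

Under that assumption the computed change rounds to -12.28 percentage points, which matches the Coverage row of the diff.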
| Impacted Files | Coverage Δ |
|---|---|
| .../scala/com/salesforce/op/dsl/RichTextFeature.scala | 72.22% <ø> (-9.73%) |
| ...main/scala/com/salesforce/op/test/TestCommon.scala | 40.9% <0%> (-9.1%) |
| ...e/op/stages/impl/feature/SmartTextVectorizer.scala | 95.79% <100%> (+0.18%) |
| ...m/salesforce/op/utils/stages/NameDetectUtils.scala | 86.11% <100%> (-1.94%) |
| ...s/impl/feature/OPCollectionHashingVectorizer.scala | 93.87% <66.66%> (-2.68%) |
| ...p/stages/impl/feature/SmartTextMapVectorizer.scala | 93.33% <73.91%> (-6.67%) |
| ...sforce/op/stages/base/binary/BinaryEstimator.scala | 0% <0%> (-100%) |
| ...la/com/salesforce/op/aggregators/Geolocation.scala | 0% <0%> (-100%) |
| .../salesforce/op/aggregators/FeatureAggregator.scala | 0% <0%> (-100%) |
| ...stages/base/sequence/BinarySequenceEstimator.scala | 0% <0%> (-100%) |

... and 98 more


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 8c0f67b...9344e5f.