Closed: michaelweilsalesforce closed this PR 4 years ago.
This PR doesn't introduce any options yet.
Merging #478 into master will increase coverage by 0.00%. The diff coverage is 100.00%.
@@            Coverage Diff            @@
##            master      #478    +/-   ##
========================================
  Coverage    87.00%    87.01%
========================================
  Files          345       345
  Lines        11673     11680     +7
  Branches       388       613   +225
========================================
+ Hits         10156     10163     +7
  Misses        1517      1517
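As a sanity check on the totals above, the coverage figures can be recomputed from hits and misses. This is a minimal sketch, assuming Codecov's line coverage is hits / (hits + misses + partials) and that there are no partials here (the diff lists none). CoverageCheck, Totals and fmt are hypothetical names used only for this illustration.

```scala
import java.util.Locale

// Hypothetical helper: recomputes the totals from the coverage diff of #478.
// Assumption: coverage = hits / (hits + misses + partials); no partials are listed.
object CoverageCheck {
  final case class Totals(hits: Int, misses: Int, partials: Int = 0) {
    def lines: Int = hits + misses + partials
    def coverage: Double = 100.0 * hits / lines
  }

  // Format a percentage with two decimals, independent of the JVM default locale.
  def fmt(pct: Double): String = "%.2f%%".formatLocal(Locale.ROOT, pct)

  def main(args: Array[String]): Unit = {
    val master = Totals(hits = 10156, misses = 1517)
    val pr     = Totals(hits = 10163, misses = 1517)
    println(s"master: ${master.lines} lines, ${fmt(master.coverage)}") // 11673 lines, 87.00%
    println(s"#478:   ${pr.lines} lines, ${fmt(pr.coverage)}")         // 11680 lines, 87.01%

    // Net change in totals: 7 added lines, 7 added hits. This only approximates
    // Codecov's diff coverage, which is computed over the changed lines themselves.
    val addedLines = pr.lines - master.lines
    val addedHits  = pr.hits - master.hits
    println(s"added: $addedLines lines, ${fmt(100.0 * addedHits / addedLines)} hit") // 7 lines, 100.00% hit
  }
}
```

Both the Lines column (hits + misses) and the two coverage percentages match the diff, and the net +7 lines / +7 hits is consistent with the reported 100.00% diff coverage.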
| Impacted Files | Coverage Δ | |
|---|---|---|
| ...n/scala/com/salesforce/op/dsl/RichMapFeature.scala | 67.64% <ø> (ø) | |
| .../scala/com/salesforce/op/dsl/RichTextFeature.scala | 82.19% <100.00%> (+0.24%) | :arrow_up: |
| ...p/stages/impl/feature/SmartTextMapVectorizer.scala | 100.00% <100.00%> (ø) | |
| ...e/op/stages/impl/feature/SmartTextVectorizer.scala | 95.20% <100.00%> (+0.03%) | :arrow_up: |
| ...esforce/op/stages/impl/feature/TextTokenizer.scala | 97.36% <100.00%> (+0.14%) | :arrow_up: |
| ...sforce/op/stages/impl/feature/Transmogrifier.scala | 98.05% <100.00%> (ø) | |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update eba38a0...97b9ce8.
@leahmcguire @Jauntbox could you take a look at this PR? I'm not sure how to test my changes :(
Let's actually fill out the PR description form to set the context :)
Related issues
When engineering features from Text (and Text-like) raw features, we should strip any HTML tags from the text: the tags add no signal to the existing tokens (and even pollute them).
Describe the proposed solution
Enable HTML stripping via TextTokenizer.AnalyzerHtmlStrip.
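To make the motivation concrete, here is a naive, self-contained illustration of what stripping tags before tokenization buys. It is a hypothetical sketch, not TextTokenizer.AnalyzerHtmlStrip: the PR does not show how that analyzer is implemented, and HtmlStripSketch, stripHtml and tokenize are names made up for this example. A regex tag stripper mishandles comments, scripts, entities and stray < characters; a Lucene-based analyzer would more likely rely on a dedicated char filter such as Lucene's HTMLStripCharFilter.

```scala
import java.util.Locale

// Hypothetical sketch: NOT the TransmogrifAI or Lucene implementation.
object HtmlStripSketch {
  // Naive pattern: anything from '<' to the next '>' is treated as a tag.
  private val Tag = "<[^>]*>".r

  // Replace each tag with a space so words on either side are not glued together.
  def stripHtml(text: String): String = Tag.replaceAllIn(text, " ")

  // Simple tokenizer: lowercase, split on runs of non-letter/non-digit characters.
  def tokenize(text: String): List[String] =
    text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+").filter(_.nonEmpty).toList

  def main(args: Array[String]): Unit = {
    val raw = "<p>Great <b>product</b>, fast shipping!</p>"
    println(tokenize(raw))            // List(p, great, b, product, b, fast, shipping, p)
    println(tokenize(stripHtml(raw))) // List(great, product, fast, shipping)
  }
}
```

Without stripping, the markup leaks tokens such as p and b into the token list; with stripping, only great, product, fast and shipping remain. Comparing raw and stripped token lists like this could be one way to approach a unit test for the change.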