salesforce / TransmogrifAI

TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning
https://transmogrif.ai
BSD 3-Clause "New" or "Revised" License

Added Chinese and Korean examples to TextTokenizerTest #442

Status: Closed (Jauntbox closed 4 years ago)

Jauntbox commented 4 years ago

Related issues: n/a

Describe the proposed solution: n/a

Describe alternatives you've considered: n/a

Additional context: This is a small change that makes it easier to test alternatives to the CJK tokenizer (which we've already replaced for Japanese). The CJK tokenizer emits character bigrams rather than trying to extract words, so most tokens from a text sample will have length 2 (not all, since other languages can be mixed in). Some of the simpler ID detection calculations look at the distribution of token lengths, so they may misclassify text in languages that use the CJK tokenizer as IDs.
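To make the concern concrete, here is a minimal Scala sketch of the effect. It is not the tokenizer or the ID detection used in TransmogrifAI: `bigrams` is a simplified stand-in for bigram tokenization (it assumes whitespace-free text made of single-`Char` characters), and `looksLikeIds` with its `maxStdDev` threshold is a hypothetical heuristic invented here purely for illustration.

```scala
// Illustrative sketch only: a simplified bigram tokenizer and a made-up
// length-based ID heuristic. Neither is TransmogrifAI's or Lucene's code.
object BigramLengthSketch {

  // Hypothetical helper: overlapping character bigrams of the input.
  // Assumes every character fits in one Char (true for common Hangul/Han).
  def bigrams(text: String): Seq[String] = {
    val chars = text.filterNot(_.isWhitespace)
    if (chars.length < 2) Seq(chars).filter(_.nonEmpty)
    else chars.sliding(2).toSeq
  }

  // Population standard deviation of token lengths (0.0 for no tokens).
  def lengthStdDev(tokens: Seq[String]): Double =
    if (tokens.isEmpty) 0.0
    else {
      val lengths = tokens.map(_.length.toDouble)
      val mean = lengths.sum / lengths.size
      math.sqrt(lengths.map(l => (l - mean) * (l - mean)).sum / lengths.size)
    }

  // Hypothetical heuristic: near-constant token length is treated as ID-like.
  def looksLikeIds(tokens: Seq[String], maxStdDev: Double = 0.5): Boolean =
    tokens.nonEmpty && lengthStdDev(tokens) <= maxStdDev

  def main(args: Array[String]): Unit = {
    val koreanTokens = bigrams("기계학습은재미있다")
    val englishTokens = "machine learning is fun".split(" ").toSeq

    println(koreanTokens.mkString(" "))
    println(s"Korean bigrams flagged as IDs: ${looksLikeIds(koreanTokens)}")
    println(s"English words flagged as IDs: ${looksLikeIds(englishTokens)}")
  }
}
```

Because every bigram has length 2, the length distribution of ordinary Korean or Chinese prose collapses to a single value, which is exactly the shape a length-based check would associate with IDs. Test examples in these languages make that failure mode visible when comparing tokenizers.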

codecov[bot] commented 4 years ago

Codecov Report

Merging #442 into master will not change coverage. The diff coverage is n/a.

@@           Coverage Diff           @@
##           master     #442   +/-   ##
=======================================
  Coverage   86.93%   86.93%           
=======================================
  Files         337      337           
  Lines       11096    11096           
  Branches      362      362           
=======================================
  Hits         9646     9646           
  Misses       1450     1450

tovbinm commented 4 years ago

LOCO test is failing @sanmitra @Jauntbox

sanmitra commented 4 years ago

@tovbinm The LOCO test (com.salesforce.op.stages.impl.insights.RecordInsightsLOCOTest) is passing. Where exactly are you seeing it fail?

tovbinm commented 4 years ago

It’s a flaky one. See previous runs.