ropensci / tokenizers

Fast, Consistent Tokenization of Natural Language Text
https://docs.ropensci.org/tokenizers

Fix #33 #35

Closed by Ironholds 7 years ago

Ironholds commented 7 years ago
library(tokenizers)
test <- c("This is a text", NA, "So is this")
names(test) <- letters[1:3]
stops <- c("a", NA)
# tokenize_words(): input contains NA
tokenize_words(test)
#> $a
#> [1] "this" "is"   "a"    "text"
#> 
#> $b
#> [1] NA
#> 
#> $c
#> [1] "so"   "is"   "this"
# tokenize_ngrams(): input contains NA
tokenize_ngrams(test, n = 2)
#> $a
#> [1] "this is" "is a"    "a text" 
#> 
#> $b
#> [1] NA
#> 
#> $c
#> [1] "so is"   "is this"
# tokenize_words(): input and stopwords both contain NA
tokenize_words(test, stopwords = stops)
#> $a
#> [1] "this" "is"   "text"
#> 
#> $b
#> [1] NA
#> 
#> $c
#> [1] "so"   "is"   "this"
# tokenize_ngrams(): input and stopwords both contain NA
tokenize_ngrams(test, n = 2, stopwords = stops)
#> $a
#> [1] "this is" "is text"
#> 
#> $b
#> [1] NA
#> 
#> $c
#> [1] "so is"   "is this"
# tokenize_skip_ngrams(): input contains NA
tokenizers::tokenize_skip_ngrams(test)
#> $a
#> [1] "this is a" "is a text"
#> 
#> $b
#> [1] NA
#> 
#> $c
#> [1] "so is this"
codecov-io commented 7 years ago

Codecov Report

Merging #35 into master will increase coverage by 0.34%. The diff coverage is 87.5%.

@@            Coverage Diff             @@
##           master      #35      +/-   ##
==========================================
+ Coverage   87.35%   87.69%   +0.34%     
==========================================
  Files          11       11              
  Lines         253      260       +7     
==========================================
+ Hits          221      228       +7     
  Misses         32       32
Impacted Files            Coverage Δ
src/shingle_ngrams.cpp    100% <100%> (+2.12%)
src/skip_ngrams.cpp       100% <100%> (ø)
R/utils.R                 93.75% <75%> (-6.25%)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 6569edf...16cbcc4.

Ironholds commented 7 years ago

tina voice uggggghhhhhh

(okay gonna write some dang tests)

lmullen commented 7 years ago

Thanks, @Ironholds!