Ironholds closed this pull request 7 years ago.
Merging #38 into master will increase coverage by 0.33%. The diff coverage is 96.22%.
```
@@            Coverage Diff             @@
##           master      #38      +/-  ##
==========================================
+ Coverage   87.69%   88.02%   +0.33%
==========================================
  Files          11       11
  Lines         260      284      +24
==========================================
+ Hits          228      250      +22
- Misses        32       34       +2
==========================================
```
| Impacted Files | Coverage Δ | |
|---|---|---|
| src/shingle_ngrams.cpp | 100% <ø> (ø) | :arrow_up: |
| src/init.c | 100% <ø> (ø) | :arrow_up: |
| R/ngram-tokenizers.R | 96.55% <92.85%> (-3.45%) | :arrow_down: |
| src/skip_ngrams.cpp | 97.43% <97.43%> (-2.57%) | :arrow_down: |
Continue to review full report at Codecov.
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b6fc617...38802e9. Read the comment docs.
@Ironholds This looks great. I'm not merging it immediately, simply because I am working on getting tokenize_skip_ngrams() to produce correct output. I'm at the place where you could take over that if you wanted, and combine the solutions to #31 and #24. Let me pick up the discussion in #24.
OK, this should be good to go, unless you want me to add in the keyboard interrupts for convenience and clear out 3 open tickets at once (which would also give me an opportunity to update the news file, which I forgot to do... d'oh).
Just saw that last message, sorry. I've merged this in. Looks fantastic. I'm going to go through and add some more tests and rework some of the documentation/vignettes in light of this, but it looks like you've completed the bulk of the work. Thanks so much!!!
Yay!
This fixes #31. It's a bit more complicated than one might think from the description, essentially because the most efficient way to check stopwords is with a set object, which needs to be constructed within C++. Accordingly, I've shifted the vectorisation into the compiled code, so that the set object only needs to be constructed once rather than once per document.