# ropensci/tokenizers

Fast, Consistent Tokenization of Natural Language Text
https://docs.ropensci.org/tokenizers
185 stars · 25 forks
## Issues
| # | Title | Author | Status | Comments |
|---|-------|--------|--------|----------|
| #87 | implicit conversion of character input to UTF-8 | ablaette | opened 9 months ago | 0 |
| #86 | New Maintainer Welcome :-) | maelle | closed 8 months ago | 3 |
| #85 | Add strip_url option to tokenize_words() | fschaffner | closed 1 year ago | 1 |
| #84 | tokenize_tweets replacement | alanault | closed 1 year ago | 2 |
| #83 | Remove twitter | lmullen | closed 1 year ago | 0 |
| #82 | Twitter tokenizing logic broken by upcoming ICU 72 breaking change ('@' no longer splits) | MichaelChirico | closed 1 year ago | 14 |
| #81 | Possible CRAN release | EmilHvitfeldt | closed 2 years ago | 2 |
| #80 | keeping punctuation | Legallois | closed 1 year ago | 0 |
| #79 | Add strip_numeric = TRUE option to tokenize_tweets() | fschaffner | closed 1 year ago | 1 |
| #78 | Split into words in tokenize_tweets even when strip_punct is set to TRUE | hideaki | closed 3 years ago | 2 |
| #77 | Fix 76 | kbenoit | closed 4 years ago | 0 |
| #76 | Inconsistent behavior of tokenize_tweets() when filtering stopwords with punctuation | syumet | closed 4 years ago | 2 |
| #75 | Use official docs URL in description | jeroen | closed 4 years ago | 2 |
| #74 | Output ngram might consider punctuation separation? | hope-data-science | closed 4 years ago | 11 |
| #73 | Why could tokenizers split Chinese as well? | hope-data-science | closed 5 years ago | 1 |
| #72 | Clarify what `n_min` means for n-gram tokenization. | juliasilge | closed 4 years ago | 1 |
| #71 | Clarification of argument documentation for x | EmilHvitfeldt | closed 6 years ago | 3 |
| #70 | tokenize_tweets and single word strings | juliasilge | closed 6 years ago | 2 |
| #69 | Add GitHub link | maelle | closed 6 years ago | 1 |
| #68 | tokenize_tweets doesn't separate emojis with no spaces between them | EmilHvitfeldt | closed 3 years ago | 4 |
| #67 | remove unused var in func 'generate_ngrams_batch()' | ChrisMuir | closed 6 years ago | 2 |
| #66 | Joss paper | lmullen | closed 6 years ago | 1 |
| #65 | Add description of tokens | kbenoit | closed 6 years ago | 1 |
| #64 | Add description of TIF formatted data.frames as inputs | kbenoit | closed 6 years ago | 2 |
| #63 | Comply with TIF requirements | lmullen | closed 6 years ago | 1 |
| #62 | Update DESCRIPTION prior to release | lmullen | closed 6 years ago | 0 |
| #61 | Add a pkgdown website | lmullen | closed 6 years ago | 0 |
| #60 | Fix encoding issues on Windows | lmullen | closed 6 years ago | 1 |
| #59 | Tokenize sentences starting with a number | ekstroem | closed 6 years ago | 2 |
| #58 | Specify encoding in C++ code for skip_ngrams | patperry | closed 6 years ago | 12 |
| #57 | Strip punctuation option for tokenize_ngrams | alanault | closed 6 years ago | 0 |
| #56 | Added rOpenSci review badge | karthik | closed 7 years ago | 1 |
| #55 | Installation Error | rlumor | closed 7 years ago | 3 |
| #54 | Error: could not find function "%>%" | fahadshery | closed 7 years ago | 1 |
| #53 | Add Jockers stopwords | lmullen | closed 6 years ago | 0 |
| #52 | Inconsistent tokenizing when numbers are followed by a period | rajkorde | closed 7 years ago | 1 |
| #51 | Low-level parallelism with RcppParallel | lmullen | closed 1 year ago | 3 |
| #50 | Lower level C++ api with external pointers | dselivanov | closed 7 years ago | 4 |
| #49 | Comply with text interchange format, perhaps also adding vignette | lmullen | closed 6 years ago | 7 |
| #48 | Punctuation options | lmullen | closed 7 years ago | 2 |
| #47 | can we use alternative lexicons? | randomgambit | closed 7 years ago | 4 |
| #46 | Depend on the stopwords package instead of providing that functionality | randomgambit | closed 7 years ago | 3 |
| #45 | Way of committing to repo | dselivanov | closed 7 years ago | 2 |
| #44 | Add tokenize_tweets() function and tests | kbenoit | closed 7 years ago | 2 |
| #43 | Update README and vignettes for new release | lmullen | closed 6 years ago | 0 |
| #42 | Deprecate tokenize_regex() | lmullen | closed 7 years ago | 0 |
| #41 | Trouble installing package "tokenizers" on R v3.2.3 | sanjesk1 | closed 7 years ago | 7 |
| #40 | Fix #37 and update NEWS | Ironholds | closed 7 years ago | 2 |
| #39 | Submit paper to JOSS | lmullen | closed 6 years ago | 14 |
| #38 | Fix #31 | Ironholds | closed 7 years ago | 5 |
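Several of the issues above turn on the package's core tokenizing options: punctuation stripping (#48, #57, #80), stopword handling (#46), and the meaning of `n_min` for n-grams (#72). A minimal sketch of those calls, assuming the current CRAN APIs of the tokenizers and stopwords packages; the example text is invented:

```r
library(tokenizers)

txt <- "The quick brown fox jumps over the lazy dog."

# Word tokenization; strip_punct drops punctuation marks (cf. #48, #57, #80).
tokenize_words(txt, lowercase = TRUE, strip_punct = TRUE)

# Stopword filtering takes a character vector; per #46, lists come from
# the separate stopwords package rather than tokenizers itself.
tokenize_words(txt, stopwords = stopwords::stopwords("en"))

# n-grams: n is the largest n-gram size and n_min the smallest (cf. #72),
# so n = 3 with n_min = 1 returns unigrams, bigrams, and trigrams together.
tokenize_ngrams(txt, n = 3, n_min = 1)
```

Each `tokenize_*()` function returns a list of character vectors, one element per input document.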