# ropensci/tokenizers

Fast, Consistent Tokenization of Natural Language Text
https://docs.ropensci.org/tokenizers
License: Other · 185 stars · 25 forks
## Issues
| # | Title | Author | State | Closed | Comments |
|---|-------|--------|-------|--------|----------|
| #37 | Add keyboard interrupts | Ironholds | closed | 7 years ago | 0 |
| #36 | Word counting | lmullen | closed | 7 years ago | 1 |
| #35 | Fix #33 | Ironholds | closed | 7 years ago | 3 |
| #34 | Fix #26 | Ironholds | closed | 7 years ago | 4 |
| #33 | NA support? | Ironholds | closed | 7 years ago | 4 |
| #32 | `tokenize_ngram` deal with colon(:) inconsistently across different platforms | everdark | closed | 7 years ago | 2 |
| #31 | Add stopwords argument to `tokenize_skip_ngrams()` | lmullen | closed | 7 years ago | 3 |
| #30 | Add function to chunk texts into smaller segments | lmullen | closed | 6 years ago | 3 |
| #29 | Add Penn Treebank tokenizer | jrnold | closed | 7 years ago | 2 |
| #28 | Installation failed on Microsoft R Server | kevinbsc | closed | 7 years ago | 3 |
| #27 | Ideas for other tokenizers | lmullen | closed | 7 years ago | 2 |
| #26 | Remove requirement for C++11 | statspro1 | closed | 7 years ago | 30 |
| #25 | integration into quanteda as a core tokenizer | kbenoit | closed | 6 years ago | 14 |
| #24 | Incorrect skipgrams | koheiw | closed | 7 years ago | 47 |
| #23 | R-devel / Travis | maelle | closed | 8 years ago | 2 |
| #22 | Character level tokenizers | dselivanov | closed | 8 years ago | 5 |
| #21 | Using long vectors | lmullen | closed | 8 years ago | 1 |
| #20 | Pass argument by reference using raw pointer | lmullen | closed | 7 years ago | 0 |
| #19 | hunspell | jeroen | closed | 8 years ago | 1 |
| #18 | Error while installing tokenizers package | harshakap | closed | 8 years ago | 2 |
| #17 | Any tests for ngram/skip_ngram for the tail of the output? | juliasilge | closed | 8 years ago | 8 |
| #16 | Tokenizers does not compile on RHEL 6.5.7 | rsjohnso | closed | 8 years ago | 3 |
| #15 | Fix errors when n exceeds number of words (Fix #14) | dselivanov | closed | 8 years ago | 2 |
| #14 | Bug with tokenize_ngrams when number of words in document is between n_min and n | lmullen | closed | 8 years ago | 1 |
| #13 | russian stopwords | dselivanov | closed | 8 years ago | 1 |
| #12 | Look into Penn Treebank tokenizers for English | lmullen | closed | 6 years ago | 21 |
| #11 | Rewrite tokenize_skip_ngrams to preserve order | lmullen | closed | 7 years ago | 1 |
| #10 | Comparisons to RWeka etc. | soodoku | closed | 8 years ago | 2 |
| #9 | stopwords | dselivanov | closed | 8 years ago | 1 |
| #8 | n-gram generators with stop words support | dselivanov | closed | 8 years ago | 3 |
| #7 | Text normalizers | lmullen | closed | 8 years ago | 1 |
| #6 | Piping and nesting of output | lmullen | closed | 8 years ago | 1 |
| #5 | Lemmatizing tokenizer | lmullen | closed | 1 year ago | 6 |
| #4 | Look through tokenizers in NLTK for models | lmullen | closed | 8 years ago | 0 |
| #3 | Add regex and split tokenizers | lmullen | closed | 8 years ago | 0 |
| #2 | Add tests | lmullen | closed | 8 years ago | 1 |
| #1 | Design discussion | dselivanov | closed | 8 years ago | 8 |