# ropensci/tokenizers

Fast, Consistent Tokenization of Natural Language Text
https://docs.ropensci.org/tokenizers
185 stars · 25 forks
## Issues
| # | Title | Author | Status | Comments |
|---|-------|--------|--------|----------|
| #87 | implicit conversion of character input to UTF-8 | ablaette | opened 9 months ago | 0 |
| #86 | New Maintainer Welcome :-) | maelle | closed 8 months ago | 3 |
| #85 | Add strip_url option to tokenize_words() | fschaffner | closed 1 year ago | 1 |
| #84 | tokenize_tweets replacement | alanault | closed 1 year ago | 2 |
| #83 | Remove twitter | lmullen | closed 1 year ago | 0 |
| #82 | Twitter tokenizing logic broken by upcoming ICU 72 breaking change ('@' no longer splits) | MichaelChirico | closed 1 year ago | 14 |
| #81 | Possible CRAN release | EmilHvitfeldt | closed 2 years ago | 2 |
| #80 | keeping punctuation | Legallois | closed 1 year ago | 0 |
| #79 | Add strip_numeric = TRUE option to tokenize_tweets() | fschaffner | closed 1 year ago | 1 |
| #78 | Split into words in tokenize_tweets even when strip_punct is set to TRUE | hideaki | closed 3 years ago | 2 |
| #77 | Fix 76 | kbenoit | closed 4 years ago | 0 |
| #76 | Inconsistent behavior of tokenize_tweets() when filtering stopwords with punctuation | syumet | closed 4 years ago | 2 |
| #75 | Use official docs URL in description | jeroen | closed 4 years ago | 2 |
| #74 | Output ngram might consider punctuation separation? | hope-data-science | closed 4 years ago | 11 |
| #73 | Why could tokenizers split Chinese as well? | hope-data-science | closed 5 years ago | 1 |
| #72 | Clarify what `n_min` means for n-gram tokenization. | juliasilge | closed 4 years ago | 1 |
| #71 | Clarification of argument documentation for x | EmilHvitfeldt | closed 6 years ago | 3 |
| #70 | tokenize_tweets and single word strings | juliasilge | closed 6 years ago | 2 |
| #69 | Add GitHub link | maelle | closed 6 years ago | 1 |
| #68 | tokenize_tweets doesn't separate emojis with no spaces between them | EmilHvitfeldt | closed 3 years ago | 4 |
| #67 | remove unused var in func 'generate_ngrams_batch()' | ChrisMuir | closed 6 years ago | 2 |
| #66 | Joss paper | lmullen | closed 6 years ago | 1 |
| #65 | Add description of tokens | kbenoit | closed 6 years ago | 1 |
| #64 | Add description of TIF formatted data.frames as inputs | kbenoit | closed 6 years ago | 2 |
| #63 | Comply with TIF requirements | lmullen | closed 6 years ago | 1 |
| #62 | Update DESCRIPTION prior to release | lmullen | closed 6 years ago | 0 |
| #61 | Add a pkgdown website | lmullen | closed 6 years ago | 0 |
| #60 | Fix encoding issues on Windows | lmullen | closed 6 years ago | 1 |
| #59 | Tokenize sentences starting with a number | ekstroem | closed 6 years ago | 2 |
| #58 | Specify encoding in C++ code for skip_ngrams | patperry | closed 6 years ago | 12 |
| #57 | Strip punctuation option for tokenize_ngrams | alanault | closed 6 years ago | 0 |
| #56 | Added rOpenSci review badge | karthik | closed 7 years ago | 1 |
| #55 | Installation Error | rlumor | closed 7 years ago | 3 |
| #54 | Error: could not find function "%>%" | fahadshery | closed 7 years ago | 1 |
| #53 | Add Jockers stopwords | lmullen | closed 6 years ago | 0 |
| #52 | Inconsistent tokenizing when numbers are followed by a period | rajkorde | closed 7 years ago | 1 |
| #51 | Low-level parallelism with RcppParallel | lmullen | closed 1 year ago | 3 |
| #50 | Lower level C++ api with external pointers | dselivanov | closed 7 years ago | 4 |
| #49 | Comply with text interchange format, perhaps also adding vignette | lmullen | closed 6 years ago | 7 |
| #48 | Punctuation options | lmullen | closed 7 years ago | 2 |
| #47 | can we use alternative lexicons? | randomgambit | closed 7 years ago | 4 |
| #46 | Depend on the stopwords package instead of providing that functionality | randomgambit | closed 7 years ago | 3 |
| #45 | Way of committing to repo | dselivanov | closed 7 years ago | 2 |
| #44 | Add tokenize_tweets() function and tests | kbenoit | closed 7 years ago | 2 |
| #43 | Update README and vignettes for new release | lmullen | closed 6 years ago | 0 |
| #42 | Deprecate tokenize_regex() | lmullen | closed 7 years ago | 0 |
| #41 | Trouble installing package "tokenizers" on R v3.2.3 | sanjesk1 | closed 7 years ago | 7 |
| #40 | Fix #37 and update NEWS | Ironholds | closed 7 years ago | 2 |
| #39 | Submit paper to JOSS | lmullen | closed 6 years ago | 14 |
| #38 | Fix #31 | Ironholds | closed 7 years ago | 5 |
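Several of the issues above turn on the package's core tokenizing options: punctuation stripping (#48, #57, #80), stopword handling (#46), and the meaning of `n_min` for n-grams (#72). A minimal sketch of those calls, assuming the current CRAN APIs of the tokenizers and stopwords packages; the example text is invented:

```r
library(tokenizers)

txt <- "The quick brown fox jumps over the lazy dog."

# Word tokenization; strip_punct drops punctuation marks (cf. #48, #57, #80).
tokenize_words(txt, lowercase = TRUE, strip_punct = TRUE)

# Stopword filtering takes a character vector; per #46, lists come from
# the separate stopwords package rather than tokenizers itself.
tokenize_words(txt, stopwords = stopwords::stopwords("en"))

# n-grams: n is the largest n-gram size and n_min the smallest (cf. #72),
# so n = 3 with n_min = 1 returns unigrams, bigrams, and trigrams together.
tokenize_ngrams(txt, n = 3, n_min = 1)
```

Each `tokenize_*()` function returns a list of character vectors, one element per input document.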