rspeer/wordfreq
Access a database of word frequencies, in various natural languages.
Other
1.4k stars
101 forks
issues
#66 Update msgpack parameter (rspeer, closed, 5 years ago, comments: 1)
#65 update msgpack parameter (rspeer, closed, 5 years ago, comments: 0)
#64 Allow a wider range of 'regex' versions (rspeer, closed, 5 years ago, comments: 0)
#63 Regex version is incompatible with spaCy (jlpeck, closed, 5 years ago, comments: 2)
#62 Update my name and the Zenodo citation (rspeer, closed, 6 years ago, comments: 0)
#61 Argument to specify frequency source (glupyan, opened, 6 years ago, comments: 1)
#60 Recognize "@" in gender-neutral word endings as part of the token (rspeer, closed, 6 years ago, comments: 0)
#59 Korean install fixes (rspeer, closed, 6 years ago, comments: 0)
#58 Round wordfreq output to 3 sig. figs, and update documentation (rspeer, closed, 6 years ago, comments: 0)
#57 Version 2.1 (rspeer, closed, 6 years ago, comments: 1)
#56 Handle Japanese edge cases in `simple_tokenize` (rspeer, closed, 6 years ago, comments: 1)
#55 Version 2, with standalone text pre-processing (rspeer, closed, 6 years ago, comments: 1)
#54 Fix setup.py (version number and msgpack dependency) (rspeer, closed, 6 years ago, comments: 0)
#53 Updated setup.py (ixxie, closed, 6 years ago, comments: 5)
#52 Is there a way to use custom word lists? (HatScripts, closed, 6 years ago, comments: 1)
#51 Version 1.7: update tokenization, update Wikipedia data, add languages (rspeer, closed, 7 years ago, comments: 2)
#50 Tokenize by graphemes, not codepoints (rspeer, closed, 7 years ago, comments: 0)
#49 Use langcodes when tokenizing again (rspeer, closed, 7 years ago, comments: 0)
#48 Code review notes (alin-luminoso, closed, 7 years ago, comments: 0)
#47 All 1.6 changes (rspeer, closed, 7 years ago, comments: 1)
#46 Tokenize words such as "l'heure" the same way as "l'arc" (rspeer, closed, 7 years ago, comments: 0)
#45 Describe how to cite wordfreq (rspeer, closed, 8 years ago, comments: 0)
#44 Allow MeCab to work in Japanese or Korean without the other (rspeer, closed, 8 years ago, comments: 0)
#43 Both Korean and Japanese dictionaries must be installed to use either (alin-luminoso, closed, 8 years ago, comments: 1)
#42 Look for MeCab dictionaries in various places besides this package (rspeer, closed, 8 years ago, comments: 5)
#41 Czech and Slovak (rspeer, closed, 7 years ago, comments: 1)
#40 Hungarian (doublex, closed, 8 years ago, comments: 3)
#39 Add Common Crawl data and more languages (rspeer, closed, 8 years ago, comments: 0)
#38 Tokenization in Korean, plus abjad languages (rspeer, closed, 8 years ago, comments: 1)
#37 Fix tokenization of SE Asian and South Asian scripts (rspeer, closed, 8 years ago, comments: 1)
#36 Inconsistent language-code strings lead to inconsistent normalization (rspeer, closed, 7 years ago, comments: 1)
#35 fix Arabic test, where 'lol' is no longer common (rspeer, closed, 8 years ago, comments: 0)
#34 wordfreq 1.4: some bigger wordlists, better use of language detection (rspeer, closed, 8 years ago, comments: 3)
#33 Restore a missing comma. (alin-luminoso, closed, 8 years ago, comments: 0)
#32 Leave Thai segments alone in the default regex (rspeer, closed, 8 years ago, comments: 1)
#31 Specify encoding when dealing with files (slibs63, closed, 8 years ago, comments: 0)
#30 Add English data from Reddit corpus (rspeer, closed, 8 years ago, comments: 1)
#29 Fix documentation and clean up, based on Sep 25 code review (rspeer, closed, 9 years ago, comments: 0)
#28 Add some tokenizer options (rspeer, closed, 9 years ago, comments: 0)
#27 Improve Chinese, Greek, English; add Turkish, Polish, Swedish (rspeer, closed, 9 years ago, comments: 0)
#26 Add SUBTLEX, support Turkish, expand Greek (rspeer, closed, 9 years ago, comments: 0)
#25 Run unit tests on Travis CI (hugovk, closed, 9 years ago, comments: 2)
#24 Remove the no-longer-existent .txt files from the MANIFEST. (alin-luminoso, closed, 9 years ago, comments: 0)
#23 Put documentation and examples in the README (rspeer, closed, 9 years ago, comments: 0)
#22 Use a more standard Unicode tokenizer (rspeer, closed, 9 years ago, comments: 0)
#21 Review notes (alin-luminoso, closed, 9 years ago, comments: 0)
#20 put back the freqs_to_cBpack cutoff; prepare for 1.0 (rspeer, closed, 9 years ago, comments: 0)
#19 Code review fixes 2015 07 17 (Joshua-Chin, closed, 9 years ago, comments: 1)
#18 Add wordfreq_builder as a sub-directory to wordfreq (Joshua-Chin, closed, 9 years ago, comments: 0)
#17 created alternate implementation of func-to-regex (Joshua-Chin, closed, 9 years ago, comments: 2)