issues
gucorpling/amalgum
English web corpus with 4M tokens and several annotation types
25 stars, 6 forks
Bump transformers from 3.5.1 to 4.30.0
#21
dependabot[bot]
opened
1 year ago
0
Update tokenizers
#20
yilunzhu
closed
2 years ago
0
Add seq2set entity recognizer
#19
yilunzhu
closed
2 years ago
0
Xrenner Module: 'BasicTokenizer' object has no attribute 'strip_accents'
#18
nitinvwaran
opened
2 years ago
3
Xml annotations
#17
lauren-lizzy-levine
closed
3 years ago
0
Xml annotations
#16
lauren-lizzy-levine
closed
3 years ago
0
Fix for issue #14
#15
nitinvwaran
closed
3 years ago
0
Date-time module flips tokens in rare case in the XML format
#14
amir-zeldes
closed
3 years ago
4
Space annotations
#13
lauren-lizzy-levine
closed
3 years ago
0
Dev
#12
amir-zeldes
closed
3 years ago
0
Dev 0.2 data rc
#11
amir-zeldes
closed
3 years ago
0
Added Date/Time module and pipeline
#10
nitinvwaran
closed
4 years ago
1
STyper module not working in dev
#9
lgessler
closed
3 years ago
26
Why did we freeze these packages?
#8
lgessler
opened
4 years ago
2
Be more cautious with model-related artifacts
#7
lgessler
opened
4 years ago
0
sync
#6
lgessler
closed
4 years ago
0
some fixes and freezing libs
#5
nitinvwaran
closed
4 years ago
0
Need process to eliminate bad spaces
#4
amir-zeldes
opened
4 years ago
0
Upgrade StanfordNLP to Stanza
#3
amir-zeldes
closed
3 years ago
1
Figures
#2
amir-zeldes
closed
4 years ago
1
Add proper date/time detection
#1
amir-zeldes
closed
3 years ago
1