English Lemma Database - Compiled by Referencing British National Corpus
MIT License

Preface

Compiled by Lin Wei (https://github.com/skywind3000) on Mar 28, 2017, by referencing the 100M+ words of the British National Corpus (BNC), NodeBox Linguistics, and Yasumasa Someya's lemma list.

This lemma list is provided "as is" and is free to use for any research and/or educational purposes. The list currently contains 186,523 words (tokens) in 84,487 lemma groups.

Data Format

Definition

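word/bnc-frequency -> form1 (, form2 (, form3...))

Each line gives a lemma (the headword), its BNC frequency, and a comma-separated list of the other forms grouped under that lemma.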
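A minimal sketch of a parser for one line of this format, in Python. The `LemmaEntry` and `parse_line` names are hypothetical, and skipping blank lines and lines that begin with `;` is an assumption about the full file, not something this README specifies.

```python
from typing import List, NamedTuple, Optional


class LemmaEntry(NamedTuple):
    """One line of the lemma list: headword, BNC frequency, other forms."""
    lemma: str
    frequency: int
    forms: List[str]


def parse_line(line: str) -> Optional[LemmaEntry]:
    """Parse a 'word/bnc-frequency -> form1,form2,...' line.

    Returns None for blank lines and for lines starting with ';'
    (assumed here to be comments; hypothetical, not documented).
    """
    line = line.strip()
    if not line or line.startswith(";"):
        return None
    head, sep, tail = line.partition("->")
    if not sep:
        raise ValueError(f"missing '->' in line: {line!r}")
    # Split on the last '/' so only the trailing number is the frequency
    lemma, slash, freq = head.strip().rpartition("/")
    if not slash:
        raise ValueError(f"missing '/frequency' in line: {line!r}")
    # Forms are comma-separated; drop surrounding spaces and empty items
    forms = [f.strip() for f in tail.split(",") if f.strip()]
    return LemmaEntry(lemma, int(freq), forms)
```

For example, `parse_line("say/317317 -> said,says,saying")` yields an entry with lemma `"say"`, frequency `317317` and forms `["said", "says", "saying"]`.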

Data Sample:

be/4109826 -> is,was,are,were,'s,been,being,'re,'m,am,m
have/1315648 -> had,has,'ve,having,'s,'d,d,ve
it/1213224 -> its,they
he/1196022 -> his,him,they
i/1133697 -> my,me,we,is
they/841960 -> their,them,'em
you/804279 -> your,ya,ye
not/767330 -> n't
she/653505 -> her
do/535646 -> did,does,done,doing,du,d'
we/503360 -> our,us
will/334612 -> 'll,wo,ll
say/317317 -> said,says,saying
would/278414 -> 'd
can/263138 -> ca,cans,can,could
go/227247 -> going,went,gone,goes,goin'
get/212569 -> got,getting,gets,gotten
make/209818 -> made,making,makes
up/206976 -> ups,upping,upped
see/184969 -> seen,saw,seeing,sees
other/181277 -> others
time/181080 -> times,timed,timing
know/177717 -> knew,known,knows,knowing
take/172773 -> took,taken,taking,takes
year/161649 -> years
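Some forms in the sample above appear under more than one lemma: 's under be and have, 'd under have and would, and they under it and he as well as being a headword itself. A reverse lookup from form to lemma therefore has to rank candidates. The sketch below (hypothetical `build_index` and `lemmatize` functions, not part of this repository) ranks by BNC frequency, with one added assumption: a token that is itself a headword maps to that headword first. It repeats the line parsing so it runs on its own.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_index(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each form (and each headword) to its candidate lemmas."""
    candidates: Dict[str, List[Tuple[int, int, str]]] = defaultdict(list)
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(";"):  # assumed blank/comment lines
            continue
        head, _, tail = line.partition("->")
        lemma, _, freq = head.strip().rpartition("/")
        count = int(freq)
        forms = {f.strip() for f in tail.split(",") if f.strip()}
        forms.add(lemma)  # a headword lemmatizes to itself
        for form in forms:
            is_headword = 1 if form == lemma else 0
            candidates[form].append((is_headword, count, lemma))
    # Headword match first, then higher BNC frequency, so index[form][0]
    # is the preferred reading
    return {form: [lem for _, _, lem in sorted(entries, reverse=True)]
            for form, entries in candidates.items()}


def lemmatize(token: str, index: Dict[str, List[str]]) -> str:
    """Return the preferred lemma for token, or the token itself if unknown."""
    lemmas = index.get(token.lower())
    return lemmas[0] if lemmas else token


# A few lines copied from the data sample in this README
SAMPLE = """\
be/4109826 -> is,was,are,were,'s,been,being,'re,'m,am,m
have/1315648 -> had,has,'ve,having,'s,'d,d,ve
it/1213224 -> its,they
they/841960 -> their,them,'em
would/278414 -> 'd
go/227247 -> going,went,gone,goes,goin'
"""

if __name__ == "__main__":
    index = build_index(SAMPLE.splitlines())
    for tok in ["went", "They", "'s", "'d", "unknown"]:
        print(tok, "->", lemmatize(tok, index))
```

To index the full list, an open file object can be passed in place of `SAMPLE.splitlines()`; the file's name and its UTF-8 encoding would be assumptions to check against the repository. Frequency ranking is only a heuristic: it always resolves 'd to have, even where the text means would.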

About

If you have any questions or comments about this lemma list, feel free to contact me at skywind3000@163.com.