oudalab / Arabic-NER


Need to make the Arabic language model and Arabic NER work under spaCy #6

Open YanLiang1102 opened 6 years ago

YanLiang1102 commented 6 years ago

Things we need to do:

Train the Arabic language model

  1. We need stop words.
  2. Infix, prefix, and suffix rules for the tokenizer.
  3. May include the lemmatizer that Khaled has so far. --Khaled
  4. Collect Arabic wiki articles and, together with our LexisNexis Arabic data, use gensim to train the word vectors needed to train the parser and tagger with spaCy. --Yan
  5. Implement the necessary classes and get the language model trained!
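
To make items 1 and 2 concrete, here is a minimal pure-Python sketch of how prefix, suffix, and infix rules peel punctuation off whitespace-delimited chunks, which is roughly the idea behind spaCy's rule-based tokenizer. Everything in it is an assumption for illustration: the stop word set, the three regexes, and the helper names `split_chunk`, `tokenize`, and `is_stop` are hypothetical, not the lists or classes this project will actually use.

```python
import re

# Illustrative values only (assumptions, not the project's real lists).
STOP_WORDS = {"في", "من", "على", "إلى", "عن"}
PREFIX_RE = re.compile(r'^[\(\[\{"«]')           # opening punctuation at the start
SUFFIX_RE = re.compile(r'[\)\]\}"»،؛؟\.!:]$')    # closing punctuation at the end
INFIX_RE = re.compile(r'[،؛/]')                  # separators inside a chunk


def split_chunk(chunk):
    """Peel prefixes and suffixes off one chunk, then split the rest on infixes."""
    prefixes, suffixes = [], []
    while chunk:
        match = PREFIX_RE.search(chunk)
        if match:
            prefixes.append(match.group())
            chunk = chunk[match.end():]
            continue
        match = SUFFIX_RE.search(chunk)
        if match:
            # Insert at the front so suffixes keep their left-to-right order.
            suffixes.insert(0, match.group())
            chunk = chunk[:match.start()]
            continue
        break

    middle, pos = [], 0
    for match in INFIX_RE.finditer(chunk):
        if match.start() > pos:
            middle.append(chunk[pos:match.start()])
        middle.append(match.group())
        pos = match.end()
    if pos < len(chunk):
        middle.append(chunk[pos:])
    return prefixes + middle + suffixes


def tokenize(text):
    """Split on whitespace, then apply the affix rules to every chunk."""
    tokens = []
    for chunk in text.split():
        tokens.extend(split_chunk(chunk))
    return tokens


def is_stop(token):
    """Return True if the token is in the (illustrative) stop word set."""
    return token in STOP_WORDS
```

In a real spaCy language class the same three rule sets would be supplied to the tokenizer rather than hand-rolled like this; the sketch is only meant to show what the rules need to cover for Arabic punctuation such as `،`, `؛`, and `؟`.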

Train the Arabic NER model

Using OntoNotes together with the Prodigy data we have, we should be able to get about 66K records of training data. We need to write a customized NER model for Arabic in spaCy and get it trained.
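
Before training, the two data sources have to end up in one format. As a rough sketch, assuming the OntoNotes side is available as parallel token and BIO-tag lists (an assumption, the actual export format is not settled here), the function below converts a sentence into the `(text, {"entities": [(start, end, label)]})` shape that spaCy 2.x training examples use. The name `bio_to_spacy` is hypothetical.

```python
def bio_to_spacy(tokens, tags):
    """Convert parallel token / BIO-tag lists into (text, {"entities": [...]}).

    Tokens are joined with single spaces, and entity spans are character
    offsets into that joined text.
    """
    entities = []
    current = None  # [start, end, label] of the entity being built, if any
    offset = 0
    for token, tag in zip(tokens, tags):
        start, end = offset, offset + len(token)
        label = tag[2:] if tag[:2] in ("B-", "I-") else None
        continues = (
            tag.startswith("I-")
            and current is not None
            and current[2] == label
        )
        if continues:
            # Extend the open entity to cover this token.
            current[1] = end
        else:
            # Close any open entity, then open a new one if this token is tagged.
            if current is not None:
                entities.append(tuple(current))
            current = [start, end, label] if label else None
        offset = end + 1  # account for the joining space
    if current is not None:
        entities.append(tuple(current))
    return " ".join(tokens), {"entities": entities}


# Small illustrative example (made-up sentence, not from our data).
example = bio_to_spacy(
    ["Barack", "Obama", "visited", "Cairo"],
    ["B-PERSON", "I-PERSON", "O", "B-GPE"],
)
```

Prodigy annotations already carry character-offset spans, so in principle only the OntoNotes side needs a conversion like this before the two sets are merged.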

@ahalterman @khaledJabr

YanLiang1102 commented 6 years ago

spaCy tasks

Tokenizer

For Khaled

YanLiang1102 commented 6 years ago

https://stackoverflow.com/questions/47219639/spacy-2-0-ner-training might be useful.