junhua / IPOD

A Corpus of 475,000 Industrial Occupations

Tokens & Tags don't match in length #1

Closed gracecarrillo closed 3 years ago

gracecarrillo commented 3 years ago

I'm trying to use your tokens and BIOES tags, but there are several instances where the number of tokens and the corresponding tags don't match.

For example:

# 4 tokens, 5 tags
fx options operations analyst           | S-FUN O S-FUN S-RES O

This causes trouble when I try to use the data to train an NER model.

junhua commented 3 years ago

Hi @gracecarrillo, you can loop through the data and remove the entries that have this problem :)
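
For anyone hitting the same mismatch, here is a minimal sketch of that loop. It assumes each entry is available as a pair of whitespace-separated strings (the title text and its BIOES tags); how you load those pairs depends on the file you are using and is not shown. The function name `split_aligned` and the second sample row are hypothetical, added only for illustration.

```python
def split_aligned(rows):
    """Separate (text, tags) pairs into aligned and misaligned entries.

    Assumption: both strings are whitespace-separated, one tag per token.
    """
    aligned, misaligned = [], []
    for text, tags in rows:
        tokens = text.split()
        labels = tags.split()
        if len(tokens) == len(labels):
            # Keep the split form, ready for NER training.
            aligned.append((tokens, labels))
        else:
            # Keep the raw strings so the bad entries can be inspected later.
            misaligned.append((text, tags))
    return aligned, misaligned


rows = [
    # Entry reported in this issue: 4 tokens, 5 tags.
    ("fx options operations analyst", "S-FUN O S-FUN S-RES O"),
    # Hypothetical aligned entry, for illustration only.
    ("operations analyst", "S-FUN S-RES"),
]

aligned, misaligned = split_aligned(rows)
print(f"kept {len(aligned)}, dropped {len(misaligned)}")
```

Returning the dropped entries alongside the kept ones makes it easy to check how much of the corpus is affected before training.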