Hello,
I am trying to perform an NER experiment on a custom dataset containing a lot of food items.
I have labels for certain unigrams and bigrams for my training data.
My label list contains "green chilli" = "vegetable". I don't have "chilli" on its own as a label.
I am using this label list in order to annotate sentences for NER.
For example:
A sentence might contain a bigram such as "green chilli" with its associated label = "vegetable".
Currently while generating the features, I am marking both "green" and "chilli" as "vegetable".
My annotation pipeline is as follows:
1. Split the sentence into unigrams.
2. Check whether each unigram exists in the label list. If it does, mark the unigram with that label.
3. Build a bigram from each token and its neighbour: token + sentence[idx+1] or token + sentence[idx-1].
4. Check whether the bigram exists in the label list. If it does, mark both the token and the neighbour (sentence[idx+1] or sentence[idx-1]) with that label.
As a result of point number 4, both "green" and "chilli" get marked as "vegetable".
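A minimal sketch of the pipeline described above, to make the behaviour concrete. The names `LABELS` and `annotate` and the default tag "O" for unlabelled tokens are assumptions made for illustration and are not from my actual code. The sketch only looks forward (token + sentence[idx+1]), on the assumption that the backward case is the same bigram seen from the second token.

```python
# Hypothetical label list: only the "green chilli" entry comes from the post.
LABELS = {"green chilli": "vegetable"}


def annotate(tokens, labels=LABELS, default="O"):
    """Tag each token using unigram lookups, then bigram lookups."""
    tags = [default] * len(tokens)
    for idx, token in enumerate(tokens):
        # Steps 1-2: mark the unigram if it is in the label list.
        if token in labels:
            tags[idx] = labels[token]
        # Steps 3-4: build the bigram with the next token and, if it is
        # in the label list, mark BOTH tokens with the bigram's label.
        if idx + 1 < len(tokens):
            bigram = f"{token} {tokens[idx + 1]}"
            if bigram in labels:
                tags[idx] = tags[idx + 1] = labels[bigram]
    return tags


tokens = "add one green chilli".split()
print(list(zip(tokens, annotate(tokens))))
# [('add', 'O'), ('one', 'O'), ('green', 'vegetable'), ('chilli', 'vegetable')]
```

Here "chilli" receives the "vegetable" tag only because it follows "green". On its own it stays "O", since it is not in the label list.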
So when I train my model and run inference on a test sentence containing "green chilli", I get "vegetable" twice: once for "green" and once for "chilli".
What would be the best way to annotate this using word2features?