Open rudyrdx opened 11 months ago
I believe the error is coming up because the ner_tags actually need to be ints. The error you see usually comes up because PyTorch encounters an indexing mismatch.
I had some trouble with this myself and found that mapping the string ner_tags to integer IDs fixed the issue.
When you instantiate a SpanMarker model, the config already creates this map for you, using the list of labels you provide. You can see it by calling model.config.__getattribute__("encoder")["label2id"].
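For anyone hitting the same thing, here is a minimal sketch of the mapping step in plain Python. The label list, the sample, and the helper name encode_tags are made up for illustration; in practice the labels should be the same list, in the same order, that you pass when creating the model.

```python
# Hypothetical label list; use the same list (and order) you give the model.
labels = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG"]
label2id = {label: idx for idx, label in enumerate(labels)}


def encode_tags(example):
    """Return a copy of the example with string ner_tags replaced by integer IDs."""
    unknown = sorted({tag for tag in example["ner_tags"] if tag not in label2id})
    if unknown:
        # Fail early with a readable message instead of an indexing error later.
        raise ValueError(f"Unknown labels: {unknown}")
    return {**example, "ner_tags": [label2id[tag] for tag in example["ner_tags"]]}


# Hypothetical sample in the tokens/ner_tags layout.
sample = {
    "tokens": ["Tom", "works", "at", "Hugging", "Face"],
    "ner_tags": ["B-PER", "O", "O", "B-ORG", "I-ORG"],
}
encoded = encode_tags(sample)
print(encoded["ner_tags"])  # [1, 0, 0, 3, 4]
```

If the data lives in a Hugging Face datasets.Dataset, the same function can probably be applied with dataset.map(encode_tags), assuming the column is named ner_tags.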
@tomaarsen I think this should be explicitly mentioned somewhere in the repo since the errors don't make it clear what's gone wrong when integers aren't provided.
@jackboyla Thanks for letting me know, I will try it.
Hi, I went through my training data again and noticed that the spans were wrong. When I divided the data by word length and then tried to generate NER tags for the respective sentences, the spans were not correct: the start and end NER tags were wrong for the sentences. Apparently I lack the brain power to work it out, so I switched to spaCy and was able to get NER working (not token classification but sentence (paragraph) classification). So now I want to try this with SpanMarker, and I will update after trying whether the problem was numeric IDs or something else.
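One way to avoid that kind of misalignment is to slice the tokens and their tags together with the same indices, rather than regenerating tags after splitting. A rough sketch, assuming the data is already a word-level token list with one integer tag per token (the function name chunk_example and the sample values are hypothetical):

```python
def chunk_example(tokens, ner_tags, max_words):
    """Split one example into chunks of at most max_words, keeping tags aligned."""
    if len(tokens) != len(ner_tags):
        raise ValueError("tokens and ner_tags must have the same length")
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    chunks = []
    for start in range(0, len(tokens), max_words):
        end = start + max_words
        # The same slice is applied to both lists, so tag i still belongs to token i.
        chunks.append({"tokens": tokens[start:end], "ner_tags": ner_tags[start:end]})
    return chunks


# Hypothetical example: 7 words with integer tag IDs.
tokens = ["Tom", "works", "at", "Hugging", "Face", "in", "Paris"]
ner_tags = [1, 0, 0, 3, 4, 0, 5]
chunks = chunk_example(tokens, ner_tags, max_words=3)
for chunk in chunks:
    print(chunk)
```

Note that a fixed-size cut can still land in the middle of an entity, so a chunk may start with an inside tag; splitting at sentence boundaries instead would avoid that.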
Gives this error:
FineTune Code:
Dataset Sample: