Closed alexmro closed 8 months ago
You could probably make something like MITIE that uses the same features to do that in an OK way. But MITIE itself is set up to pick out specific sequences of words, which isn't quite what you want, since you want to identify locations between words instead.
It might be OK to try to force MITIE to do it anyway. IDK. But I would either get an open source LLM or train a little binary SVM on a window of MITIE's word features. An LLM would be way better, but also way more computationally expensive. It depends on the kinds of trade-offs you want to make.
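To make the "window of word features" idea concrete, here is a minimal sketch of how one feature vector per gap between words could be built. Everything in it is an assumption for illustration: `toy_word_features` is a hypothetical stand-in for a real word feature extractor (it is not MITIE's), and `gap_window_features` and `half_width` are names made up here. The resulting vectors are what you would hand to a binary classifier such as an SVM, with the label "punctuation goes in this gap" or "it doesn't".

```python
from typing import Callable, List, Sequence


def toy_word_features(word: str) -> List[float]:
    """Hypothetical stand-in for a real word feature extractor."""
    return [float(len(word)), float(word[:1].isupper()), float(word.isdigit())]


def gap_window_features(
    words: Sequence[str],
    gap: int,
    half_width: int = 2,
    featurize: Callable[[str], List[float]] = toy_word_features,
) -> List[float]:
    """Build one feature vector for the gap between words[gap] and words[gap + 1].

    The window covers `half_width` words on each side of the gap.
    Positions that fall outside the sentence are zero-padded.
    """
    dim = len(featurize("x"))  # feature size of a single word
    vec: List[float] = []
    for i in range(gap - half_width + 1, gap + half_width + 1):
        if 0 <= i < len(words):
            vec.extend(featurize(words[i]))
        else:
            vec.extend([0.0] * dim)
    return vec


words = ["hello", "world", "How", "are", "you"]
# One vector per gap; each would be paired with a 0/1 label for training.
X = [gap_window_features(words, g) for g in range(len(words) - 1)]
```

With real word features swapped in for the toy ones, a separate binary classifier per punctuation mark (or one multi-class classifier) could be trained on `X`.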
I looked inside some Python packages that train a BERT model to identify the words that need certain punctuation marks before them. They use labels like
'.O', '!O', ',O', ':O', ';O',
for that. I suppose you know what I mean. I wonder whether the MITIE models can be trained in the same way, that is, whether custom labels like these can be used as entity types and whether the extracted entities would give promising results when used to restore the punctuation. Of course, this assumes the training material is well prepared and optimized for the trainer.
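For reference, the kind of training data such a labeling scheme implies can be sketched roughly like this. It is only an illustration under stated assumptions: the label names in `MARKS` are hypothetical (they are not the labels those packages use), the label is attached to the word the mark follows (attaching it to the next word instead would be a small change), and the tokenizer is a naive regex that would split something like "3.5".

```python
import re
from typing import List, Tuple

# Hypothetical label names, chosen only for this sketch.
MARKS = {".": "PERIOD", "!": "EXCLAIM", ",": "COMMA", ":": "COLON", ";": "SEMICOLON"}


def to_labeled_tokens(text: str) -> List[Tuple[str, str]]:
    """Strip punctuation and casing, keeping each mark as a label on the word it follows."""
    pairs: List[Tuple[str, str]] = []
    for piece in re.findall(r"[^\s.!,:;]+|[.!,:;]", text):
        if piece in MARKS:
            # Attach the mark to the previous word, unless that word already has one.
            if pairs and pairs[-1][1] == "O":
                pairs[-1] = (pairs[-1][0], MARKS[piece])
        else:
            pairs.append((piece.lower(), "O"))  # "O" means no punctuation
    return pairs


def restore(pairs: List[Tuple[str, str]]) -> str:
    """Rebuild punctuated (still lowercase) text from (word, label) pairs."""
    inverse = {label: mark for mark, label in MARKS.items()}
    return " ".join(word + inverse.get(label, "") for word, label in pairs)


pairs = to_labeled_tokens("Hello, world. How are you")
restored = restore(pairs)
```

Each non-"O" label here marks a single word, so for an entity-style trainer every labeled word would become a one-token entity of that type. Whether that gives useful accuracy is exactly the open question.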