Closed debanjanbhucs closed 2 years ago
@ad6398 The model files used for prediction can be obtained over here - https://huggingface.co/dmahata/dlkp_test
Hey @debanjanbhucs, I printed each token and its tag. As guessed, the issue is not with the decoding algorithm but with the model: it is not trained well enough to identify `masking strategies` as a single keyphrase. The file attached here lists the tokens and the tags the model predicted for them. We can clearly see that `mask` has a `B` tag, `ing` has a `B` tag, and `strategies` has an `I` tag. The same goes for other KPs.
token_tag.txt
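To make the effect of those tags concrete, here is a minimal sketch of a generic B/I decoder. It is not the library's actual decoding code; the function name `decode_bio` and the plain `B`/`I`/`O` tag strings are assumptions made for illustration. It shows that even a correct decoder splits the phrase when the second sub-word is tagged `B` instead of `I`.

```python
def decode_bio(tokens, tags):
    """Group tokens into phrases: B starts a phrase, I extends it, anything else closes it."""
    phrases, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B":
            # A new B always closes the phrase that was being built.
            if current:
                phrases.append(current)
            current = [token]
        elif tag == "I" and current:
            current.append(token)
        else:
            # O tag, or an I tag with no open phrase: close whatever is open.
            if current:
                phrases.append(current)
            current = []
    if current:
        phrases.append(current)
    return phrases


tokens = ["mask", "ing", "strategies"]

# Tags as reported in this thread: the second sub-word is B, not I.
print(decode_bio(tokens, ["B", "B", "I"]))  # [['mask'], ['ing', 'strategies']]

# Tags needed for a single keyphrase.
print(decode_bio(tokens, ["B", "I", "I"]))  # [['mask', 'ing', 'strategies']]
```

With the reported tags the decoder has no choice but to emit two phrases, which matches the `mask` / `ing strategies` split seen in the output.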
The model prediction seems to have a bug and does not properly deal with the sub-words in the output.
For example this is the output obtained:
After executing the following code:
As can be seen in the output, 'mask' and 'ing strategies' are treated as separate keyphrases. This looks like a bug in how the sub-words are put together when formatting the prediction output.
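One common way to make the output robust to this kind of split is to collapse sub-words back into words before extracting keyphrases, keeping only the tag of each word's first sub-word. The sketch below is a hypothetical post-processing step, not code from this repository. It assumes a `word_ids` list that maps every sub-word to the index of the word it came from (Hugging Face fast tokenizers expose such a mapping) and assumes the sub-word strings carry no marker characters, as in the printed output.

```python
def merge_subwords(tokens, tags, word_ids):
    """Rebuild whole words from sub-words and keep the first sub-word's tag for each word."""
    words, word_tags = [], []
    previous = None
    for token, tag, word_id in zip(tokens, tags, word_ids):
        if word_id == previous:
            # Continuation of the current word: glue the text, ignore this sub-word's tag.
            words[-1] += token
        else:
            words.append(token)
            word_tags.append(tag)
        previous = word_id
    return words, word_tags


# "mask" and "ing" belong to word 0, "strategies" to word 1 (assumed mapping).
words, word_tags = merge_subwords(
    ["mask", "ing", "strategies"], ["B", "B", "I"], [0, 0, 1]
)
print(words)      # ['masking', 'strategies']
print(word_tags)  # ['B', 'I']
```

Decoding the word-level tags then yields `masking strategies` as one keyphrase. This only hides inconsistent tags inside a word; it does not address the underlying model quality.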