microsoft / vert-papers

This repository contains code and datasets related to entity/knowledge papers from the VERT (Versatile Entity Recognition & disambiguation Toolkit) project, by the Knowledge Computing group at Microsoft Research Asia (MSRA).
MIT License

The result of decoding BPE #61

Open temav opened 1 year ago

temav commented 1 year ago

Hello! Could you help me understand the following output? I passed these tags as the query labels to DecomposedMetaNER:

['action', 'action', 'O', 'entity', 'O', 'O', 'O', 'action', 'O', 'property', 'entity', 'O', 'O', 'property', 'entity', 'O', 'O', 'property', 'O', 'entity', 'O', 'O']

After applying convert_bpe, I get these word indexes:

[[0, 1], [3, 4], [7, 8], [9, 9], [10, 11], [13, 13], [14, 15], [17, 17], [19, 20]]

What is the logic behind the pairs? Why do some single words come back as [i, i] and others as [i, i+1]?

temav commented 1 year ago

The end goal was to get the model predictions in CoNLL or JSONL format.
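For anyone with the same goal, here is a minimal sketch of writing word-level predictions in both formats. It assumes you already have one tag per word; the function names, the tab-separated two-column CoNLL layout, and the "tokens"/"labels" JSON keys are illustrative choices, not part of this repository.

```python
import json

def to_conll(words, tags):
    """Render one sentence as CoNLL-style lines: word<TAB>tag, one per line."""
    if len(words) != len(tags):
        raise ValueError("words and tags must have the same length")
    # A trailing blank line conventionally separates sentences in CoNLL files.
    return "\n".join(f"{w}\t{t}" for w, t in zip(words, tags)) + "\n\n"

def to_jsonl_record(words, tags):
    """Render one sentence as a single JSON line (hypothetical key names)."""
    if len(words) != len(tags):
        raise ValueError("words and tags must have the same length")
    return json.dumps({"tokens": words, "labels": tags}, ensure_ascii=False)

# Hypothetical example sentence and predicted tags.
words = ["open", "the", "door"]
tags = ["action", "O", "entity"]

conll_text = to_conll(words, tags)
jsonl_line = to_jsonl_record(words, tags)
```

Writing many sentences is then a matter of concatenating the CoNLL blocks, or writing one JSON record per line.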

temav commented 1 year ago

It seems you have a bug at L823. You should replace bisect.left with bisect.right, am I right?

iofu728 commented 1 year ago

> It seems you have a bug at L823. You should replace bisect.left with bisect.right, am I right?

Hi @temav, I think you are right. However, the function convert_bpe has been deprecated, and we did not use it in our experiments. We will fix this issue in the next PR. Thank you!
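For readers unfamiliar with the difference, here is a minimal sketch of how bisect.left and bisect.right diverge when mapping a sub-token index to a word index. It is not the actual convert_bpe code: the word_starts list and the helper names are hypothetical, and it assumes the lookup is done against a sorted list of each word's first sub-token index.

```python
import bisect

# Hypothetical data: index of the first sub-token of each word.
# Word 0 covers tokens 0-1, word 1 covers token 2,
# word 2 covers tokens 3-4, word 3 covers token 5.
word_starts = [0, 2, 3, 5]

def word_of_token_right(token_idx):
    # bisect_right counts the starts that are <= token_idx,
    # so subtracting 1 gives the word that contains the token.
    return bisect.bisect_right(word_starts, token_idx) - 1

def word_of_token_left(token_idx):
    # bisect_left counts only the starts that are < token_idx, so it is
    # correct on a word's first sub-token but wrong on continuation tokens.
    return bisect.bisect_left(word_starts, token_idx)

right_result = [word_of_token_right(t) for t in range(6)]
left_result = [word_of_token_left(t) for t in range(6)]

print(right_result)  # [0, 0, 1, 2, 2, 3]
print(left_result)   # [0, 1, 1, 2, 3, 3]
```

In this sketch the bisect_left variant assigns continuation tokens 1 and 4 to the following word, which is the kind of off-by-one that could make single words show up as [i, i+1] spans.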

temav commented 1 year ago

I see, thanks!