mandarjoshi90 / coref

BERT for Coreference Resolution
Apache License 2.0
440 stars 92 forks source link

converting predicted (subtoken) output to normal text #82

Closed ambernorder closed 3 years ago

ambernorder commented 3 years ago

Hello,

The model's predictions are over the tokenized text (the "##" subtokens in the sentences). How can I convert this, together with the predicted coreference clusters, back to normal text?

linguist89 commented 3 years ago

If you take a look at the notebook that is linked in the repo's README, you will see there is a section where the author converted the output back to readable English sentences. Work through that code and you will be able to convert your output back to readable text.
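For reference, the core idea is to merge the "##" continuation pieces back into whole words while keeping a map from subtoken indices to word indices, then remap the cluster spans through that map. A minimal sketch (this is an illustration of the general approach, not the repo's actual notebook code; special tokens like `[CLS]`/`[SEP]` would need to be stripped first):

```python
def detokenize(subtokens, clusters):
    """Merge WordPiece subtokens into words and remap cluster spans.

    subtokens: list of WordPiece tokens, e.g. ["My", "dog", "like", "##s"]
    clusters:  list of clusters, each a list of (start, end) inclusive
               spans over subtoken indices (assumed output format).
    """
    words = []
    sub2word = []  # sub2word[i] = index of the word that subtoken i belongs to
    for tok in subtokens:
        if tok.startswith("##") and words:
            # Continuation piece: glue onto the previous word.
            words[-1] += tok[2:]
        else:
            words.append(tok)
        sub2word.append(len(words) - 1)

    # Translate each subtoken-level span into a word-level span.
    remapped = [
        [(sub2word[start], sub2word[end]) for start, end in cluster]
        for cluster in clusters
    ]
    return words, remapped


subtokens = ["My", "dog", "like", "##s", "chas", "##ing", "him", "."]
clusters = [[(0, 1), (6, 6)]]  # "My dog" and "him" corefer
words, word_clusters = detokenize(subtokens, clusters)
print(words)          # ['My', 'dog', 'likes', 'chasing', 'him', '.']
print(word_clusters)  # [[(0, 1), (4, 4)]]
```

Once the spans are word-level, joining `words` with spaces (or a proper detokenizer) gives readable text with clusters that still point at the right mentions.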