Named entity score.

ChunchuanLv commented 5 years ago

Hi Sheng,

Thanks for your nice work. The conversion to dependency parsing is really insightful.

One question: I find your named entity score to be low, yet your wikification accuracy is higher than the named entity score. Do you know what is happening? Is named entity recall low, or are there issues with surface string processing?

Chunchuan

sheng-z commented 5 years ago

Hi Chunchuan,

The NER score is lower than yours because the NER step (e.g. "United States" -> country) is done in preprocessing, using only simple rules derived from the training data.
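To illustrate why a rule-based step of this kind tends to have low recall, here is a minimal sketch. It is not the actual stog preprocessing code: the function names, the lookup-table approach, and the toy training pairs are all assumptions made for illustration. Rules are built by counting which entity type each surface string had in training, then applied by greedy longest-match lookup, so any string never seen in training is simply missed.

```python
from collections import Counter, defaultdict


def build_entity_rules(training_pairs):
    """Map each lowercased surface string to its most frequent entity type."""
    counts = defaultdict(Counter)
    for surface, entity_type in training_pairs:
        counts[surface.lower()][entity_type] += 1
    return {surface: c.most_common(1)[0][0] for surface, c in counts.items()}


def tag_entities(tokens, rules, max_len=3):
    """Greedy longest-match lookup of token spans against the rule table.

    Returns a list of (start, end, entity_type) with end exclusive.
    """
    spans = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            surface = " ".join(tokens[i:i + n]).lower()
            if surface in rules:
                spans.append((i, i + n, rules[surface]))
                i += n
                break
        else:
            i += 1  # no rule matched at this position
    return spans


# Hypothetical training observations: (surface string, entity type).
training_pairs = [
    ("United States", "country"),
    ("United States", "country"),
    ("New York", "city"),
]
rules = build_entity_rules(training_pairs)
tokens = "He moved to the United States from Paris".split()
spans = tag_entities(tokens, rules)
```

In this toy run "United States" is tagged as a country, while "Paris" is not tagged at all because it never appeared in the hypothetical training pairs. A trained NER tagger would not have that limitation.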

Wikification, in contrast, is done by the powerful DBpedia Spotlight API. It works directly on the input text and is therefore not affected by NER errors. I think this is why the wikification score is higher than the NER score.
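For readers unfamiliar with DBpedia Spotlight, the sketch below shows roughly how raw text can be sent to its annotate endpoint and how entity links can be read from the JSON reply. It makes several assumptions: the public demo URL, the `confidence` value, and the helper names are illustrative, the sample response is hand-written rather than real output, and none of this is taken from the stog code. The request is only constructed here, not sent.

```python
import urllib.parse
import urllib.request

# Public demo endpoint (assumption; a self-hosted server would use another URL).
SPOTLIGHT_URL = "https://api.dbpedia-spotlight.org/en/annotate"


def build_request(text, confidence=0.5):
    """Build (but do not send) a GET request asking for JSON annotations."""
    query = urllib.parse.urlencode({"text": text, "confidence": confidence})
    return urllib.request.Request(
        f"{SPOTLIGHT_URL}?{query}",
        headers={"Accept": "application/json"},
    )


def extract_links(response):
    """Return (surface form, character offset, DBpedia URI) for each annotation."""
    return [
        (r["@surfaceForm"], int(r["@offset"]), r["@URI"])
        for r in response.get("Resources", [])
    ]


# Hand-written sample in the shape of a Spotlight JSON reply (illustrative only).
sample_response = {
    "@text": "He moved to the United States.",
    "Resources": [
        {
            "@URI": "http://dbpedia.org/resource/United_States",
            "@surfaceForm": "United States",
            "@offset": "16",
        }
    ],
}

request = build_request("He moved to the United States.")
links = extract_links(sample_response)
```

Because the linker sees the raw sentence, it can find a mention even when the rule-based NER step failed to recognize it.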

Replacing my current rule-based NER with a SOTA NER tagger would probably further improve the final result.

ChunchuanLv commented 5 years ago

Thank you for your quick response. Another question: is there any technical reason that you don't use AllenNLP as it is? I'd like to adapt your code to PyTorch 1.0 and the updated AllenNLP and BERT implementations. Do you think there will be any problem?

Chunchuan

sheng-z commented 5 years ago

No technical reason. There was just some internal story behind it. Anyway, moving it into AllenNLP is completely doable and appreciated ;)

ChunchuanLv commented 5 years ago

Cool, thanks.

Chunchuan

sheng-z / stog

Named entity score. #2