kermitt2 / grobid-ner

A Named-Entity Recogniser based on Grobid.
https://grobid-ner.readthedocs.io
Apache License 2.0

[annotations] Deleting of wikipedia references has been too "generous" #52

Closed lfoppiano closed 7 years ago

lfoppiano commented 7 years ago

Guys, I was looking at the file holocaust 1 and noticed some missing sentences. I looked into the details, and it seems the issue was introduced in commit ebc17be37cf7a81a825f6a700230de267588de55.

When deleting references from the text, the intention is to delete only the marker (for example, the number [1]) and not the whole sentence. The exception is the list of bibliographical references at the end of the article, which has to be removed completely.

I've moved Holocaust.1. under staging for the time being.
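
The intended behaviour described above (drop the marker, keep the sentence) could be sketched roughly as follows. This is only an illustration, not the code from the commit: the function name `strip_reference_markers` and the pattern `REF_MARKER` are hypothetical, and it assumes markers are plain bracketed numbers such as [1].

```python
import re

# Hypothetical pattern (an assumption, not the project's regex): a bracketed
# number such as [1] or [23], together with any spaces or tabs right before it.
REF_MARKER = re.compile(r"[ \t]*\[\d+\]")


def strip_reference_markers(text: str) -> str:
    """Remove inline markers like [1] while keeping the surrounding sentence."""
    return REF_MARKER.sub("", text)


sample = "The first sentence ends here.[1] The second one cites a source [2] mid-sentence."
print(strip_reference_markers(sample))
```

Because the pattern matches only the bracketed number and the whitespace before it, the rest of the sentence is left untouched. Removing the reference list at the end of the article would need separate handling.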

everzeni commented 7 years ago

Hi, the sentences are back in the file. It was not intentional to delete them in the first place (more like a regex problem).

everzeni commented 7 years ago

I'm briefly reopening this issue to say that I deleted a [citation needed], which I treated like a reference. Is that OK?

lfoppiano commented 7 years ago

I think it's fine, because it's not part of the natural-language text.

everzeni commented 7 years ago

OK, I think so too; it's not very different from a number referring to an actual citation.
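
Treating [citation needed] like a numeric reference marker could look roughly like the sketch below. It is an assumption-laden illustration rather than the project's actual cleanup code: the names `MARKER` and `strip_markers` are hypothetical, and only bracketed numbers and the literal tag "citation needed" are handled.

```python
import re

# Hypothetical pattern (an assumption): either a bracketed number such as [3]
# or the editorial tag [citation needed], plus any spaces or tabs before it.
MARKER = re.compile(r"[ \t]*\[(?:\d+|citation needed)\]", re.IGNORECASE)


def strip_markers(text: str) -> str:
    """Remove numeric reference markers and [citation needed] tags only."""
    return MARKER.sub("", text)


sample = "This claim is disputed[citation needed] by later sources.[3]"
print(strip_markers(sample))
```

Other bracketed text, such as an editorial insertion like [sic], is deliberately left alone here, since the thread only agrees on removing markers that are not part of the natural-language text.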