Closed dioubernardo closed 2 years ago
In this PDF https://iase-web.org/documents/papers/icots5/Topic1m.pdf
That's because there's a missing space in the input. The parser splits tokens on spaces, so the 'The' ends up as part of the date segment (and is dropped during normalisation).
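To make that concrete, here is a rough sketch of the effect, not the parser's actual code. The function names (`tokenize`, `normalise_date`, `parse`) and the sample strings are made up for illustration, and the normalisation rule (keep only the four-digit year) is an assumption.

```python
import re


def tokenize(text):
    # Split on whitespace only, as described above.
    return text.split()


def normalise_date(token):
    # Hypothetical normalisation: if a token contains a four-digit year,
    # keep just the year and discard everything else in that token.
    match = re.search(r"\d{4}", token)
    return match.group(0) if match else token


def parse(text):
    return [normalise_date(tok) for tok in tokenize(text)]


# Hypothetical inputs: the second one is missing the space after the date.
print(parse("(1998). The role"))  # 'The' is its own token and survives
print(parse("(1998).The role"))   # 'The' is fused to the date token and lost
```

With the space present, 'The' is a separate token. Without it, '(1998).The' is a single token, so anything that cleans up the date takes 'The' with it.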