mit-nlp / MITIE

MITIE: library and tools for information extraction

xpaths for in-browser display #28

Closed johnrfrank closed 8 years ago

johnrfrank commented 8 years ago

To display highlights natively in a web browser as a user visits pages in the wild, the highlighting JavaScript code running in the browser needs XPath-based offsets. MITIE does not appear to generate XPath offsets currently. This ticket is a feature request to add such output to MITIE.

See here for further details:

It is possible to wrap a tool like MITIE with a converter that reparses the HTML and generates XPath offsets. The Python example linked below works on more than half of web pages, but has issues with incomplete tags. We have experience solving this issue in some other contexts. If this feature request gets prioritized by MITIE, we would be glad to help MITIE engineers.

https://github.com/trec-kba/streamcorpus-pipeline/blob/master/streamcorpus_pipeline/offsets.py#L341-L425
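
To make the wrapper idea concrete, here is a minimal sketch of such a converter. It is not the linked streamcorpus-pipeline code and not part of MITIE. The names `TextNodeIndexer` and `char_to_xpath` are hypothetical, and it assumes the tagger was run on the concatenation of all text nodes in document order, so that character offsets index into that string. It uses only Python's standard `html.parser` and does not attempt to reproduce browser fix-ups such as implied `tbody` elements or dropped whitespace nodes, which is where incomplete tags cause trouble.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are never pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class TextNodeIndexer(HTMLParser):
    """Record (start, end, xpath) for every text node, where start/end are
    character offsets into the concatenation of all text nodes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        # Each frame tracks one open element; frame 0 is a synthetic root.
        self.stack = [{"tag": None, "step": "", "tags": {}, "texts": 0}]
        self.nodes = []
        self.pos = 0
        self.in_text = False  # True while consecutive data extends one text node

    def handle_starttag(self, tag, attrs):
        self.in_text = False
        parent = self.stack[-1]
        n = parent["tags"].get(tag, 0) + 1  # 1-based index among same-tag siblings
        parent["tags"][tag] = n
        if tag not in VOID:
            self.stack.append({"tag": tag, "step": f"{tag}[{n}]",
                               "tags": {}, "texts": 0})

    def handle_endtag(self, tag):
        self.in_text = False
        if tag in VOID:
            return
        # Leniently close the nearest matching open element; ignore stray closers.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i]["tag"] == tag:
                del self.stack[i:]
                break

    def handle_comment(self, data):
        self.in_text = False  # a comment node separates two text nodes

    def handle_data(self, data):
        if not data:
            return
        end = self.pos + len(data)
        if self.in_text:
            start, _, xpath = self.nodes[-1]
            self.nodes[-1] = (start, end, xpath)  # extend the current text node
        else:
            parent = self.stack[-1]
            parent["texts"] += 1
            steps = [f["step"] for f in self.stack[1:]]
            steps.append(f"text()[{parent['texts']}]")
            self.nodes.append((self.pos, end, "/" + "/".join(steps)))
            self.in_text = True
        self.pos = end


def char_to_xpath(html, start, end):
    """Map a [start, end) character span to ((xpath, offset), (xpath, offset))."""
    indexer = TextNodeIndexer()
    indexer.feed(html)
    indexer.close()

    def locate(offset, is_end):
        for s, e, xpath in indexer.nodes:
            # An exclusive end offset may sit exactly at the end of a node.
            inside = (s < offset <= e) if is_end else (s <= offset < e)
            if inside:
                return xpath, offset - s
        raise ValueError(f"offset {offset} is outside the document text")

    return locate(start, False), locate(end, True)


if __name__ == "__main__":
    page = "<html><body><p>Hello</p><p>John <b>Smith</b> lives<br>here</p></body></html>"
    # Characters 5..15 of the concatenated text are "John Smith".
    print(char_to_xpath(page, 5, 15))
```

The returned pairs have the shape of a DOM Range (a start container and offset plus an end container and offset), so browser-side code could resolve each XPath with `document.evaluate` and pass the nodes to `Range.setStart` and `Range.setEnd`. This only holds when the browser's DOM matches the structure the Python parser saw.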

arjunmajum commented 8 years ago

This sounds like a useful feature, but it falls outside the scope of the MITIE project. The design goal is to provide core functionality for named entity recognition. The user is responsible for managing the input text and translating the output.