richardpaulhudson / holmes-extractor

Information extraction from English and German texts based on predicate logic
MIT License

Using subheads for context #8

Closed: NixBiks closed this issue 2 years ago

NixBiks commented 2 years ago

I'm wondering whether Holmes can capture context from subheads. Take this example (taken from here):

Relais Group Plc publishes preliminary unaudited financial information for the first half of the financial year 2022, as the profitability outcome was significantly below the level of the comparison period H1 2021.

APRIL-JUNE 2022 HEADLINE FIGURES

  • Net sales totaled EUR 58.6 million (January – June 2021: 52.2), +12.2% change
  • EBITDA was EUR 4.2 (5.4) million, 7.2% (10.3%) of net sales, -21.3% change

Let's say I wanted to extract two facts from this:

  1. metric: Net sales - value: EUR 58.6 million - period: April-June 2022
  2. metric: EBITDA - value: EUR 4.2 million - period: April-June 2022

I have an NER model that can mark all metrics, values and periods just fine. With Holmes I can see how I would extract the metric and the value; however, I'm not sure how to connect the period (given in the subhead) to the facts. Is this possible with the current version of Holmes?

richardpaulhudson commented 2 years ago

This is not something that Holmes supports out of the box, but the match() and topic_match_documents_against() methods return a lot of context information, including the indexes of the spaCy tokens that matched. You can call Holmes from your own code and then use this information to drive more specific tasks such as this one.
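
As a rough sketch of that idea, once you have a token index for each matched fact, you can attach the nearest subhead that precedes it. The snippet below is plain Python and does not call Holmes. The helper name nearest_preceding_heading, the dictionary keys, and the token indexes are all hypothetical and only illustrate the lookup; token_index stands in for whichever token index your match results report, and the heading positions are assumed to come from your own preprocessing or NER step.

```python
from bisect import bisect_right


def nearest_preceding_heading(headings, token_index):
    """Return the text of the last heading that starts at or before token_index.

    headings: list of (start_token_index, heading_text), sorted by start index.
    Returns None when no heading precedes the token.
    """
    starts = [start for start, _ in headings]
    pos = bisect_right(starts, token_index) - 1
    return headings[pos][1] if pos >= 0 else None


# Illustrative data only: the indexes are made up, not real Holmes output.
headings = [(35, "APRIL-JUNE 2022 HEADLINE FIGURES")]
matches = [
    {"metric": "Net sales", "value": "EUR 58.6 million", "token_index": 41},
    {"metric": "EBITDA", "value": "EUR 4.2 million", "token_index": 60},
]

# Attach the governing subhead to each extracted fact.
facts = [
    {**m, "heading": nearest_preceding_heading(headings, m["token_index"])}
    for m in matches
]

for fact in facts:
    print(fact["metric"], "|", fact["value"], "|", fact["heading"])
```

The period itself (April-June 2022) would then be read from the attached subhead, for example with the NER model mentioned above.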