microsoft / AzureSearch_JFK_Files

This repo contains the sample code of the Azure Search and Cognitive Services used to provide insights and analysis around the JFK Files.
MIT License

Sideways/slanted text highlighting #89

Closed emaadparacha closed 3 years ago

emaadparacha commented 4 years ago

When I search for specific text, the results usually show that text highlighted on the document. I understand this is achieved with hOCR, but it only highlights horizontal text. Is it possible to enable highlighting for slanted or sideways text? I can still search for that text, and it shows up in the transcript, but there is no highlight on the document itself.

Careyjmac commented 4 years ago

It is probably just that the hOCR output returned by the custom hOCR skill doesn't support non-horizontal text. Unfortunately, I don't know many of the details of that implementation, but you could see whether it can be adjusted to support non-horizontal text as well. The OCRSkill should still return the bounding boxes; we probably just don't translate them well enough into the HTML used for text highlighting via hOCR.
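To make the bounding-box point concrete, here is a minimal Python sketch (not the sample's actual C# skill) of one way a rotated word box could be translated into hOCR. It assumes each word arrives as four (x, y) corner points ordered clockwise from the text's own top-left corner; the helper names `quad_to_bbox`, `baseline_angle`, and `hocr_word` are hypothetical. It emits the axis-aligned `bbox` that hOCR expects and, for rotated words, the `textangle` property defined by the hOCR format. Whether the JFK front end would honor `textangle` when drawing highlights is not something this thread confirms.

```python
import math
from html import escape


def quad_to_bbox(points):
    """Axis-aligned box (x0, y0, x1, y1) enclosing a four-corner quadrilateral."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def baseline_angle(points):
    """Angle of the word's top edge in degrees, counter-clockwise positive.

    Assumes points[0] -> points[1] runs along the reading direction and that
    image coordinates have y growing downward (hence the sign flip).
    """
    (x0, y0), (x1, y1) = points[0], points[1]
    return math.degrees(math.atan2(-(y1 - y0), x1 - x0))


def hocr_word(text, points, word_id):
    """Render one word as an hOCR span, adding textangle when it is rotated."""
    x0, y0, x1, y1 = quad_to_bbox(points)
    title = f"bbox {x0} {y0} {x1} {y1}"
    angle = baseline_angle(points)
    if abs(angle) > 1:  # treat near-zero angles as horizontal
        title += f"; textangle {angle:.0f}"
    return f"<span class='ocrx_word' id='{word_id}' title='{title}'>{escape(text)}</span>"


# A sideways word reading top to bottom (rotated 90 degrees clockwise).
sideways = [(100, 10), (100, 110), (70, 110), (70, 10)]
print(hocr_word("SECRET", sideways, "word_1"))
```

Even without `textangle`, the enclosing `bbox` alone would give the viewer a rectangle to highlight for sideways text, though for slanted text it would cover more area than the word itself.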

emaadparacha commented 4 years ago

Gotcha. Was the custom hOCR skill taken from the HocrGenerator power skill (https://github.com/Azure-Samples/azure-search-power-skills/blob/master/Vision/HocrGenerator)? If so, maybe a deeper dive could be done there to add support for non-horizontal text.

Careyjmac commented 4 years ago

The JFK sample actually came first, and enough people were interested in hOCR that we ported the skill over to the power skills repo that you linked. So any solution you create to add non-horizontal support would probably be preferred there first, but we would probably want to implement it here as well.

Careyjmac commented 3 years ago

Closing due to inactivity