zuphilip opened this issue 8 years ago
This could certainly be done. I'd favor a solution in JavaScript, though, to be more flexible. And it looks like a fun project, too. I'm currently focusing on validating hOCR docs with Schematron (to augment hocr-check), so I'm not sure when there will be time to work on this.
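To illustrate the kind of rule such a validator encodes, here is a minimal sketch in plain Python (standard library only). It is not Schematron and not how hocr-check works internally. It flags `ocr_*`/`ocrx_*` elements whose `title` lacks a `bbox`, boxes with inverted coordinates, and boxes that fall outside the nearest enclosing box. The function name `check_hocr` and the containment rule are illustrative assumptions, not quotes from the hOCR spec.

```python
import re
from html.parser import HTMLParser

# "bbox x0 y0 x1 y1" inside an hOCR title attribute
BBOX_RE = re.compile(r"\bbbox\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)")
# HTML void elements never get an end tag, so they are not pushed on the stack
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


def contains(outer, inner):
    """True if box `inner` lies entirely within box `outer`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


class HocrChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []   # (tag, bbox or None) for every open element
        self.errors = []

    def nearest_bbox(self):
        # bbox of the closest enclosing element that has one
        for _, bbox in reversed(self.stack):
            if bbox is not None:
                return bbox
        return None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        ocr = [c for c in (a.get("class") or "").split()
               if c.startswith(("ocr_", "ocrx_"))]
        bbox = None
        if ocr:
            name = ocr[0]
            m = BBOX_RE.search(a.get("title") or "")
            if m is None:
                self.errors.append(f"{name}: missing bbox")
            else:
                bbox = tuple(int(v) for v in m.groups())
                if bbox[0] > bbox[2] or bbox[1] > bbox[3]:
                    self.errors.append(f"{name}: inverted bbox {bbox}")
                parent = self.nearest_bbox()
                if parent is not None and not contains(parent, bbox):
                    self.errors.append(f"{name}: bbox {bbox} outside {parent}")
        if tag not in VOID:
            self.stack.append((tag, bbox))

    def handle_endtag(self, tag):
        # pop back to the matching start tag (tolerates unclosed <p> etc.)
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


def check_hocr(text):
    """Return a list of human-readable problems found in an hOCR string."""
    checker = HocrChecker()
    checker.feed(text)
    checker.close()
    return checker.errors
```

A real validator would cover many more rules (required metadata, allowed nesting of classes, property syntax); this only shows the shape of a bbox check.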
I can help with the Schematron-related work.
@wanghaisheng Great, help is very welcome! Let's discuss in the hocr-spec Gitter channel; I'll explain what I've done so far later tonight.
Any news on this?
I'm working on it (https://github.com/kba/hocrjs) but at the moment I focus on the hOCR spec to get the implementation right.
There's also @jbaiter's hocrviewer-mirador which requires setup but has a great interface.
I found https://github.com/ultrasaurus/hocr-javascript, which uses JavaScript to overlay the OCR data on the page image; see e.g. http://rawgit.com/ultrasaurus/hocr-javascript/master/letter.html .
Any progress here in recent years? What would be the state-of-the-art tool for presenting text and images together?
Thanks!
Have you tried https://github.com/kba/hocrjs? Otherwise, you can convert to another format like PAGE-XML and use PAGEViewer or Aletheia.
PAGEViewer already supports hOCR, so no need for a conversion.
PAGEViewer is a standalone application. I am looking for a solution that displays the image and the text as a synoptic view on a web page. So far I have planned to use hOCR as the data format, but if another format is better suited to such a web presentation, I will reconsider and switch to it.
What would you suggest?
Then hocrjs could be a good starting point for you.
The `hocr` files are already `html` files and can be displayed in any browser. However, they will only show the text, without any layout or formatting information. What do you think about building an HTML exporter that also displays some of the layout or formatting information? With the `bbox` property we can show the text at the correct position; see also https://github.com/tmbdev/ocropy/issues/80#issuecomment-177227732