Open dcloud opened 10 years ago
Looking at https://gist.github.com/dcloud/9173113, fwiw
Basics done in 9036cdc651f497e72383c597d23e19ec46095c0d.
Note that spans for words are sometimes ocrx_word and sometimes just ocr_word -- in other words, the x is sometimes missing.
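For what it's worth, here is a minimal sketch of tolerating both spellings when pulling word spans out of hOCR. It assumes only Python's standard-library `html.parser`; the names `WordSpanCollector`, `extract_words`, `WORD_CLASSES` and the sample markup are hypothetical illustrations, not what the commit above actually does.

```python
from html.parser import HTMLParser

# Assumption: both class names mark word-level spans and can be treated alike.
WORD_CLASSES = {"ocrx_word", "ocr_word"}


class WordSpanCollector(HTMLParser):
    """Collect (text, title) pairs for word-level spans in hOCR markup."""

    def __init__(self):
        super().__init__()
        self.words = []      # finished (text, title) pairs
        self._depth = 0      # >0 while inside a word span (counts nested spans)
        self._title = ""     # title attribute of the word span being read
        self._chunks = []    # text fragments of the current word

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self._depth:
            # A span nested inside a word span: track it so the matching
            # end tag does not close the word too early.
            self._depth += 1
            return
        attr_map = dict(attrs)
        classes = set((attr_map.get("class") or "").split())
        if classes & WORD_CLASSES:
            self._depth = 1
            self._title = attr_map.get("title") or ""
            self._chunks = []

    def handle_data(self, data):
        if self._depth:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag != "span" or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            self.words.append(("".join(self._chunks).strip(), self._title))


def extract_words(hocr_text):
    """Return a list of (text, title) tuples for every word span found."""
    collector = WordSpanCollector()
    collector.feed(hocr_text)
    collector.close()
    return collector.words


# Made-up sample mixing both spellings on one line.
SAMPLE = """
<span class='ocr_line' title='bbox 10 10 200 40'>
  <span class='ocrx_word' title='bbox 10 10 80 40'>Hello</span>
  <span class='ocr_word' title='bbox 90 10 200 40'>world</span>
</span>
"""
```

Calling `extract_words(SAMPLE)` picks up both spans regardless of the missing x. Keeping the accepted names in a set means another engine-specific spelling could be added in one place if it turns up.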
Ah, I wasn't sure about that. Reopening. Do you know the difference (what the x means)?
I dunno. I'm not sure whether it's intentional or a bug. But since I ran into this I've gotten more skeptical about how tight the spec is...
On Mon, Feb 24, 2014 at 12:22 PM, Daniel Cloud notifications@github.com wrote:
Ah, I wasn't sure about that. Reopening. Do you know the difference (what the x means)?
Yeah, so fwiw ocrx_word might not be a formal part of the spec -- this doc https://docs.google.com/document/d/1QQnIQtvdAC_8n92-LhwPcjtAUFwBlzE8EWnKAxlgVf0 -- I'm not totally sure of how authoritative it is -- describes it as being part of the 'engine-specific markup'. Which gives me pause...
Found an existing one on GitHub, but it didn't work. Let's see if we can quickly write one of our own.