sciencehistory / scihist_digicoll

Science History Institute Digital Collections

Fix newspaper OCR for an issue of the Carlsbad Current Argus #2745

Closed: eddierubeiz closed this 1 month ago

eddierubeiz commented 2 months ago

Our OCR software ran out of memory attempting to run OCR on this newspaper.

See https://app.honeybadger.io/projects/58989/faults/82650187#notice-context for the HB error.

jrochkind commented 2 months ago

Thanks! This is known to happen with very large or complex photos. I think our procedure is to run OCR manually on a developer Mac and then upload the result manually? This is probably documented in the wiki somewhere? We should link the wiki docs here -- or write them if they don't exist!

(Was it all 6 pages or just some/one page?)