Open bertsky opened 2 years ago
The code base of Origami at this point is very static; I don't see any changes in the foreseeable future. Having an OCR-D implementation would make the OCR-D version the de facto version of Origami.
The first attempt of segment.py takes exactly the right approach IMHO. Instead of wrapping Origami's Processor and Origami's own artifacts logic, it seems saner to take the approach you have taken, i.e. to focus on each processor in turn and build new processor logic around it.
To answer (2), the file-based approach is not simple to replace. The easiest and best approach is the one you have taken: start at a processor class and rewrap the items that come into the process method as arguments.
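As a rough illustration of that rewrapping idea, here is a minimal sketch. It assumes a stage whose process method reads and writes files. `FileBasedStage` and `InMemoryStage` are hypothetical names made up for this example; they are not Origami's or OCR-D's actual classes or signatures.

```python
import json
import tempfile
from pathlib import Path


class FileBasedStage:
    """Hypothetical stand-in for a file-based stage (not Origami's API)."""

    def process(self, page_path: Path, out_path: Path) -> None:
        # Read the input file and write a small JSON artifact next to it.
        text = page_path.read_text(encoding="utf-8")
        result = {"source": page_path.name, "n_chars": len(text)}
        out_path.write_text(json.dumps(result), encoding="utf-8")


class InMemoryStage:
    """Rewraps one stage so callers pass and receive in-memory objects."""

    def __init__(self, stage):
        self._stage = stage

    def process(self, name: str, content: str) -> dict:
        # Materialize the arguments as files, run the unchanged stage,
        # then read its artifact back into memory.
        with tempfile.TemporaryDirectory() as tmp:
            page_path = Path(tmp) / name
            out_path = Path(tmp) / (name + ".json")
            page_path.write_text(content, encoding="utf-8")
            self._stage.process(page_path, out_path)
            return json.loads(out_path.read_text(encoding="utf-8"))
```

The wrapped stage stays untouched and still sees files; only the outer wrapper would need to know about the host framework.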
There is no simple answer to (3). There have to be wrappers for importing and exporting Origami's internal formats (mostly JSON) to some external format. The most difficult aspect is probably that stages later in the pipeline need various inputs from earlier stages.
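To make the two difficulties concrete, the following sketch shows one possible way to order stages by their dependencies and to convert an internal JSON artifact into external records. The stage names, the dependency table, and the JSON layout are all assumptions for illustration and do not reflect Origami's real pipeline or formats.

```python
import json

# Hypothetical stage names and dependencies, for illustration only.
REQUIRES = {
    "segment": [],
    "contours": ["segment"],
    "lines": ["segment", "contours"],
}


def run_order(target, requires=REQUIRES):
    """Return stages in an order that satisfies every dependency of target."""
    order = []

    def visit(stage):
        if stage in order:
            return
        for dep in requires[stage]:
            visit(dep)
        order.append(stage)

    visit(target)
    return order


def export_regions(internal_json: str) -> list:
    """Convert an assumed internal JSON artifact into external records."""
    data = json.loads(internal_json)
    return [
        {"id": key, "points": [tuple(p) for p in pts]}
        for key, pts in sorted(data.items())
    ]
```

A wrapper for a late stage could call something like `run_order` to find out which earlier artifacts it has to import before it can run.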
I think we should set up a video call to go over the main issues.
Dear @poke1024, may I enquire your thoughts on how to best achieve/approach any or all of the following with Origami:
As you know, I want to build an OCR-D wrapper for Origami – for which (to do it right) I probably need most of the above, but I certainly don't want to either require effort on your side, or risk having to rewrite everything every now and then as your code base evolves.
My first (unfinished) attempt was too ambitious for sure: https://github.com/bertsky/ocrd_origami/blob/master/ocrd_origami/segment.py
(I'll rethink how I can live without 2 but still achieve some 3, but your advice would be much appreciated.)