ocropus-archive / DUP-ocropy

Python-based tools for document analysis and OCR
Apache License 2.0

Updating the wiki #256

urhub closed 6 years ago

urhub commented 6 years ago

I am spending some hours on a class project to update Ocropy documentation. I would like to collect requests for documentation you want to see in the Wiki.

Possible Solution

Please comment here with what you would like to see added or changed in the documentation.

zuphilip commented 6 years ago

Training in ocropus is currently not documented, and it would be great if you could start with a new wiki page for it. Recognition is only sparsely described at https://github.com/tmbdev/ocropy/wiki/Text-Recognition-RNN and any improvement there would be great as well.

urhub commented 6 years ago

Thanks, @zuphilip. I will produce a LibreOffice document first and then look into updating the Wiki. I am hoping to structure the document according to the data flow, roughly:

Introduction, Ground Truth, Training, Evaluation, Recognition, Results

uvius commented 6 years ago

There is a 25-page document from 2015 detailing just that, and it is linked on the Wiki page: How to train your own models (you can ignore the references to the wrapper ocrocis and look at the explanations for ocropy).

urhub commented 6 years ago

Yes, I have seen the Ocrocis document before. Thanks for the reminder, I will take a look at that again.

urhub commented 6 years ago

My document so far is looking more like a tutorial with tips for handling Ocropy. I am hoping to add a quiz for self-assessment. I have avoided writing about the algorithms, since I don't know which ones are in the latest version and there are journal papers and theses that already cover them. I have borrowed material from other websites and acknowledged them in the Bibliography, including Breuel's block diagrams. I got a request from my class to use a Creative Commons License so that it can be used freely by others and be part of Open Educational Resources.

zuphilip commented 6 years ago

This sounds interesting! I am looking forward to having a look at it.

urhub commented 6 years ago

@zuphilip I can send you a copy early if you would like to proofread it. There is not a whole lot to it, really.

urhub commented 6 years ago

Hi all, I have uploaded my document at https://github.com/digiah/oldOCR/blob/master/ocropy_getting_started.pdf. Let me know of any corrections or omissions and I will fix it. Thanks.

zuphilip commented 6 years ago

At section 7 Recognition:

VUE Workflow diagram:

urhub commented 6 years ago

Thanks for those comments, Philipp. Sorry for the delay in making the corrections. I have updated the file with the information you provided and edited the Bibliography on the VUE diagram.