tmbdev / clstm

A small C++ implementation of LSTM networks, focused on OCR.
Apache License 2.0

using pyrnn.gz in clstm #111

Open srika91 opened 7 years ago

srika91 commented 7 years ago

How can I use the pyrnn.gz models created in ocropy for prediction in clstm? Prediction in clstm seems to be faster than in ocropy.

jbaiter commented 7 years ago

I don't think it's possible, since pyrnn and clstm use different model definitions:

https://github.com/tmbdev/clstm/blob/master/clstm.proto https://github.com/mittagessen/kraken/blob/master/proto/pyrnn.proto

Maybe there's a way to convert between the two, but I wouldn't know how :/

zuphilip commented 7 years ago

I seem to remember that @tmbdev mentioned somewhere that the models for CLSTM have to be trained again from the GT, i.e. they might not really be convertible.

kba commented 7 years ago

Have not tried it but there is https://github.com/naptha/ocracy/blob/master/ocropy/pyrnn2clstm.py

jbaiter commented 7 years ago

That script converts to the old HDF5-based format, not the new Protobuf-based one, unfortunately :-/

I just had a look at two protobuf models from clstm and from kraken (the fraktur one, which was converted from a pyrnn model). It looks like the ocropy-model has more parameters/weights in the LSTM layers than the clstm-model: They share wci, wgi, wgf, wgo, but the ocropus model has wip, wfp, wop in addition. I doubt that just putting the four matching weight matrices for each layer into a clstm protobuf file would work, since those weights were conditioned on different architectures, but I'd love to be proven wrong :-)

Also, iirc clstm uses a different line normalization algorithm than ocropus, i.e. for identical line images the two models were conditioned on different inputs, though I don't know how much the difference matters in practice.

amitdo commented 7 years ago

It looks like the ocropy-model has more parameters/weights in the LSTM layers than the clstm-model: They share wci, wgi, wgf, wgo, but the ocropus model has wip, wfp, wop in addition.

In clstm the code for the peephole connections was dropped (see https://github.com/tmbdev/clstm/issues/17#issuecomment-111535495). In ocropy it's still present.

mittagessen commented 7 years ago

They are for all intents and purposes completely different networks because of the peephole connections (so not really convertible). The code linked above only reserializes pickled pyrnn models into HDF5 or protobuf files, as these are vastly smaller (~1000 times without compression), faster to parse, and not an inherent security risk. An HDF5 or pronn model is still not a CLSTM model but an ocropy one with some benefits.

The line normalization and preprocessing is the same for both types of models.
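To illustrate why the peephole connections mentioned above make the two networks different, here is a minimal sketch of a single-unit LSTM step in plain Python. It assumes the common peephole formulation, in which the input and forget gates also see the previous cell state and the output gate sees the new one. The names wgi, wgf, wgo, wci, wip, wfp and wop are taken from this thread and are read here as input gate, forget gate, output gate, cell input and the three peephole weights; the function names and all numeric values are made up for illustration, and this is not the actual ocropy or clstm code.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, w, peepholes=None):
    """One step of a single-unit LSTM cell (hypothetical helper).

    w maps a gate name to (input weight, recurrent weight, bias).
    peepholes, if given, maps "wip"/"wfp"/"wop" to scalar weights;
    leaving it out gives the variant without peephole connections.
    """
    p = peepholes or {"wip": 0.0, "wfp": 0.0, "wop": 0.0}

    def pre(name):
        wx, wh, b = w[name]
        return wx * x + wh * h_prev + b

    gi = sigmoid(pre("wgi") + p["wip"] * c_prev)  # input gate peeks at old state
    gf = sigmoid(pre("wgf") + p["wfp"] * c_prev)  # forget gate peeks at old state
    ci = math.tanh(pre("wci"))                    # candidate cell input
    c = gi * ci + gf * c_prev                     # new cell state
    go = sigmoid(pre("wgo") + p["wop"] * c)       # output gate peeks at new state
    h = go * math.tanh(c)
    return h, c


def run(xs, w, peepholes=None):
    """Feed a sequence through the cell and collect the outputs."""
    h, c = 0.0, 0.0
    outputs = []
    for x in xs:
        h, c = lstm_step(x, h, c, w, peepholes)
        outputs.append(h)
    return outputs


# Made-up weights: the four matrices both model types share ...
shared = {
    "wgi": (0.5, -0.3, 0.1),
    "wgf": (0.8, 0.2, 0.0),
    "wci": (1.0, 0.4, 0.0),
    "wgo": (-0.6, 0.7, 0.3),
}
# ... and the peephole weights that only the ocropy-style model has.
peep = {"wip": 0.9, "wfp": -0.5, "wop": 0.4}

xs = [0.2, -1.0, 0.7, 1.5]
with_peep = run(xs, shared, peep)
without_peep = run(xs, shared)  # same shared weights, peepholes dropped
max_diff = max(abs(a - b) for a, b in zip(with_peep, without_peep))
print("largest output difference:", max_diff)
```

With the same four shared weights, the outputs diverge as soon as non-zero peephole weights are dropped, which is the reason copying only the matching matrices into a clstm file is unlikely to reproduce the trained behaviour.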

amitdo commented 7 years ago

The line normalization and preprocessing is the same for both types of models.

From ocropy README.md

CLSTM vs OCRopy

....

Python and C++ models can not be interchanged, both because the save file formats are different and because the text line normalization is slightly different.

mittagessen commented 7 years ago

The line image normalization is identical, the text line normalization is not. Ocropy normalizes output to NFKC(/D?), clstm doesn't normalize output to any Unicode normalization form.
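As a small illustration of what normalizing the text output can change, the sketch below uses Python's standard unicodedata module. NFKC is used only as an example, since the exact form ocropy applies is left open above; the helper name and the sample string are made up.

```python
import unicodedata


def normalize_line(text, form="NFKC"):
    """Return text in the given Unicode normalization form (hypothetical helper)."""
    return unicodedata.normalize(form, text)


# Sample transcription with a long s (U+017F) and an fi ligature (U+FB01),
# both of which are common in Fraktur ground truth.
raw = "Waſſer ﬁnden"

nfkc = normalize_line(raw, "NFKC")  # compatibility characters are folded
nfc = normalize_line(raw, "NFC")    # canonical form leaves them untouched

print(nfkc)
print(nfc)
print(nfkc == raw, nfc == raw)
```

A model trained on NFKC-normalized transcriptions never sees the long s or the ligature as separate output classes, while a model trained on unnormalized text does, so the two end up with different output alphabets even for identical line images.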

DissBiscuit commented 5 years ago

@jbaiter sorry to reopen an old subject, but I am currently working with kraken, especially this fraktur model, and I understand you worked on it too? Is it a dead end? I'm trying to see if it does a better job than tesseract... The output I get with kraken -i imagefilename.tif outputfilename.xml binarize segment ocr -a -m fraktur.pronn on Ubuntu with Python 2.7.15 looks like it's in the wrong format... Thanks in advance!