tesseract-ocr / tesseract

Tesseract Open Source OCR Engine (main repository)
https://tesseract-ocr.github.io/
Apache License 2.0
62.43k stars 9.53k forks source link

Application using TessBaseAPI crashing due to copying issues #874

Closed gaussianrecurrence closed 5 years ago

gaussianrecurrence commented 7 years ago

Well, I'm trying to implement an application using tesseract API, but so far I couldn't get it working as it crashes always when trying to clean-up all the memory.

After a few hours of dissasembly I've come to the conclusion that the problem is due to TessBaseAPI not having copy-constructor or assignation operator.

I've got a class called OcrEngine, this one creates an instance of TessBaseAPI (let's call it instance T_A), and when I copy an instance of OcrEngine, OcrEngine copy-constructor is called which consequently calls compiler-generated TessBaseAPI copy-constructor, generating a new TessBaseAPI (let's call it instance T_B). As of compiler-generated copy-constructors performs a "fool-copy", both instance T_A and T_B shares the same attributes. And when T_A is destroyed, values of T_B point to non-existant memory regions.


So, if i'm not wrong a simply solution to this problem might be to implement copy-constructors and assignation operators for TessBaseAPI class and the ones instanced by it.

Meanwhile when I make a copy of my class OcrEngine, I just create a new instance of TessBaseAPI and try to replicate all the original TessBaseAPI object configuration.

amitdo commented 7 years ago

http://stackoverflow.com/questions/3422125/why-copy-constructor-and-assignment-operator-are-disallowed

amitdo commented 7 years ago

https://google.github.io/styleguide/cppguide.html#Copyable_Movable_Types

amitdo commented 7 years ago

Bottom line - Don't copy it.

gaussianrecurrence commented 7 years ago

I don't think that's the solution. The reason why I want to use copy constructor is to avoid allocating sparse memory regions. Moreover the main reason why I need to copy then is to avoid having to reconfigure the class many times in case I'd had a OCR thread pool...

In my opinion I don't see why copy constructors/assignation operators are evil. I mean the main idea is to use then but there are lots of situations where you need it. I just think is a dumb reasoning...

Moreover, If only TessBaseAPI were using reference-counting for all its internal objects, I wouldn't need to implement these methods, but that's not the case..

amitdo commented 6 years ago

@stweil, I think copying the api object should be disabled with C++11 =delete.

stweil commented 5 years ago

Please review pull request #874 #2721.

stweil commented 5 years ago

Copying is now no longer possible, so I close this issue. An implementation with reference counting can be implemented later. Personally I think there are other things with a higher priority before that.

stweil commented 5 years ago

The removal of the copy constructor breaks tesserocr which currently uses it, maybe here. I still don't know how to fix that. @sirfz?

amitdo commented 5 years ago

You can try to ask about the needed changes in tesserocr here:

https://groups.google.com/group/cython-users

stweil commented 5 years ago

The solution was surprisingly easy: just delete unneeded code: https://github.com/sirfz/tesserocr/pull/200.