Closed by jbreiden 7 years ago
Emergency workaround while I hunt down the root cause:
--- tesseract/api/pdfrenderer.cpp 2016-11-21 08:45:47.000000000 -0800
+++ tesseract/api/pdfrenderer.cpp 2016-12-05 14:15:42.000000000 -0800
@@ -841,8 +841,8 @@
bool TessPDFRenderer::AddImageHandler(TessBaseAPI* api) {
size_t n;
char buf[kBasicBufSize];
- Pix *pix = api->GetInputImage();
char *filename = (char *)api->GetInputName();
+ Pix *pix = pixRead(filename);
int ppi = api->GetSourceYResolution();
if (!pix || ppi <= 0)
return false;
This change also works around the bug, at the cost of extra memory for the copy. It probably leaks, too.
--- tesseract/api/baseapi.cpp 2016-12-05 08:51:32.000000000 -0800
+++ tesseract/api/baseapi.cpp 2016-12-05 14:47:16.000000000 -0800
@@ -523,7 +523,7 @@
if (InternalSetImage()) {
thresholder_->SetImage(imagedata, width, height,
bytes_per_pixel, bytes_per_line);
- SetInputImage(thresholder_->GetPixRect());
+ SetInputImage(pixCopy(NULL, thresholder_->GetPixRect()));
}
}
@@ -545,7 +545,7 @@
void TessBaseAPI::SetImage(Pix* pix) {
if (InternalSetImage()) {
thresholder_->SetImage(pix);
- SetInputImage(thresholder_->GetPixRect());
+ SetInputImage(pixCopy(NULL, thresholder_->GetPixRect()));
}
}
This one is probably best.
--- tesseract/ccmain/thresholder.cpp 2016-03-11 14:29:36.000000000 -0800
+++ tesseract/ccmain/thresholder.cpp 2016-12-05 15:00:46.000000000 -0800
@@ -225,7 +225,7 @@
Pix* ImageThresholder::GetPixRect() {
if (IsFullImage()) {
// Just clone the whole thing.
- return pixClone(pix_);
+ return pixCopy(NULL, pix_);
} else {
// Crop to the given rectangle.
Box* box = boxCreate(rect_left_, rect_top_, rect_width_, rect_height_);
@@ -322,4 +322,3 @@
}
} // namespace tesseract.
-
This bug happens when:
For example, the attached test image is a TIFF G4 file. Converting it to an identical-looking LZW-compressed grayscale TIFF does not trigger this bug.
Ray found the exact spot. This is the final answer.
--- tesseract/ccmain/thresholder.cpp 2016-03-11 14:29:36.000000000 -0800
+++ tesseract/ccmain/thresholder.cpp 2016-12-05 15:27:45.000000000 -0800
@@ -181,8 +181,9 @@
// Caller must use pixDestroy to free the created Pix.
void ImageThresholder::ThresholdToPix(PageSegMode pageseg_mode, Pix** pix) {
if (pix_channels_ == 0) {
- // We have a binary image, so it just has to be cloned.
- *pix = GetPixRect();
+ // We have a binary image, so it just has to be copied.
+ // Don't clone or you'll mess up api->GetInputImage()
+ *pix = pixCopy(NULL, GetPixRect());
} else {
OtsuThresholdRectToPix(pix_, pix);
}
@@ -322,4 +323,3 @@
}
} // namespace tesseract.
-
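To illustrate why the clone is the problem, here is a minimal sketch that does not use Leptonica at all. It assumes, as the comment in the patch suggests, that some later step modifies the thresholded image in place; the inversion below is only a stand-in for that step. Image, CloneImage, CopyImage and the other names are hypothetical, with a shared_ptr playing the role of a reference-counted Pix.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for a Leptonica Pix: a handle to shared pixel data.
using Pixels = std::vector<std::uint8_t>;
using Image = std::shared_ptr<Pixels>;

// Models pixClone: a new handle to the SAME pixel buffer.
Image CloneImage(const Image& src) { return src; }

// Models pixCopy(NULL, src): a new handle to a FRESH pixel buffer.
Image CopyImage(const Image& src) { return std::make_shared<Pixels>(*src); }

// Stand-in for any later processing step that modifies its image in place.
void InvertInPlace(const Image& img) {
  for (auto& p : *img) p = static_cast<std::uint8_t>(255 - p);
}

// Returns true if the "input image" still holds its original pixels after
// the working image produced by make_working has been modified in place.
bool InputSurvives(Image (*make_working)(const Image&)) {
  Image input = std::make_shared<Pixels>(Pixels{0, 255, 0, 255});
  const Pixels original = *input;
  Image working = make_working(input);
  InvertInPlace(working);
  return *input == original;
}

// Self-check: a clone aliases the input, a copy does not.
void CheckCloneVersusCopy() {
  assert(!InputSurvives(CloneImage));
  assert(InputSurvives(CopyImage));
}
```

With the clone, the input and the working image are the same pixels, so whoever asks for the input image later gets the processed one. With the copy, the input stays untouched, at the cost of a second buffer.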
Note that this bug affects all versions of Tesseract capable of producing PDF output, both 3.0.x and 4.x.
... And the code above is leaky. Ray is doing the final final final version right now.
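The leak in pixCopy(NULL, GetPixRect()) is presumably the reference returned by GetPixRect(): it is a clone that the caller owns, and nothing ever releases it. The sketch below models that with a toy reference-counted type rather than Leptonica itself. Img, ImgClone, ImgCopy, ImgDestroy, GetRect and the Threshold functions are all hypothetical names, and this is not necessarily what the final commit does, just the usual way to balance the references.

```cpp
#include <cassert>
#include <deque>

// Hypothetical miniature of a reference-counted image, loosely modeled on
// Leptonica's Pix. None of these names are real Leptonica or Tesseract APIs.
struct Img {
  int refcount;
};

static std::deque<Img> g_arena;  // owns all storage, so the demo frees no heap
static int g_live_images = 0;    // images whose reference count is above zero

Img* ImgCreate() {
  g_arena.push_back(Img{1});
  ++g_live_images;
  return &g_arena.back();  // deque::push_back keeps earlier pointers valid
}

// Like pixClone: same object, one more reference that must be released.
Img* ImgClone(Img* src) {
  ++src->refcount;
  return src;
}

// Like pixCopy(NULL, src): a separate object with its own reference count.
Img* ImgCopy(const Img* /*src*/) { return ImgCreate(); }

// Like pixDestroy: drops one reference and nulls the caller's handle.
void ImgDestroy(Img** p) {
  if (!p || !*p) return;
  if (--(*p)->refcount == 0) --g_live_images;
  *p = nullptr;
}

// Like GetPixRect() for a full image: hands the caller a new reference.
Img* GetRect(Img* source) { return ImgClone(source); }

// Leaky pattern: the reference returned by GetRect() is never released.
Img* ThresholdLeaky(Img* source) { return ImgCopy(GetRect(source)); }

// Balanced pattern: keep the temporary, copy it, then release it.
Img* ThresholdBalanced(Img* source) {
  Img* tmp = GetRect(source);
  Img* out = ImgCopy(tmp);
  ImgDestroy(&tmp);
  return out;
}

// Runs one pattern, releases everything the caller owns, and returns how
// many images are still alive afterwards (the number leaked).
int LeakedAfter(Img* (*threshold)(Img*)) {
  const int before = g_live_images;
  Img* source = ImgCreate();
  Img* out = threshold(source);
  ImgDestroy(&out);
  ImgDestroy(&source);
  return g_live_images - before;
}
```

In the leaky variant the source image never reaches a reference count of zero, even though the caller destroyed both handles it knew about. Holding the temporary in a local and destroying it after the copy keeps the counts balanced.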
Fixed in commits 7744da9..025689f.
This means api->GetInputImage() is giving us a processed image.
Attachments: test.tif.zip, test.pdf