raffaeldantas / tesseract-ocr

Automatically exported from code.google.com/p/tesseract-ocr

OpenCL build - kernel_ThresholdRectToPix() not accounting for padding bits in the output pix?! #1351

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Use a Tesseract build with OpenCL enabled.
2. Pass a full-color image whose width is not a multiple of 32.
3. Recognition is far too slow and does not recognize anything.

I read the article at
http://www.sk-spell.sk.cx/tesseract-meets-the-opencl-first-test and decided to
give OpenCL a try. The initial result was as described in point 3 above. After
some debugging I found that the OpenCL version of the threshold rect
generation does not account for the padding bits at the end of each output pix
line. To confirm this, I made a quick fix in oclkernels.h, replacing the
definition of kernel_ThresholdRectToPix:
//Changes:
//CHAR_VEC_WIDTH is reduced to 4 in order to be able to address individual pixels,
//and the accumulated padding for the current line is subtracted from the input array index.
KERNEL(
\n#define CHAR_VEC_WIDTH 4 \n
\n#define PIXELS_PER_WORD 32 \n
\n#define PIXELS_PER_BURST 8 \n
\n#define BURSTS_PER_WORD (PIXELS_PER_WORD/PIXELS_PER_BURST) \n
 typedef union {
  uchar s[PIXELS_PER_BURST*NUM_CHANNELS];
  uchar4 v[(PIXELS_PER_BURST*NUM_CHANNELS)/CHAR_VEC_WIDTH];
 } charVec;

__attribute__((reqd_work_group_size(256, 1, 1)))
__kernel
void kernel_ThresholdRectToPix(
    __global const uchar4 *imageData,
    int height,
    int width,
    int wpl, // words per line
    __global int *thresholds,
    __global int *hi_values,
    __global int *pix) {

    // declare variables
    uint pad = PIXELS_PER_WORD * wpl - width;//number of padding bits at the end of each output line
    int pThresholds[NUM_CHANNELS];
    int pHi_Values[NUM_CHANNELS];
    for ( int i = 0; i < NUM_CHANNELS; i++) {
        pThresholds[i] = thresholds[i];
        pHi_Values[i] = hi_values[i];
    }

    // for each word (32 pixels) in output image
    for ( uint w = get_global_id(0); w < wpl*height; w += get_global_size(0) ) {
        unsigned int word = 0; // all bits start at zero
        // subtract the padding accumulated over the preceding output lines (= line index * pad);
        // the offset is in pixels, which equals the uchar4 index only when NUM_CHANNELS == 4
        uint pxIdxOffset = ( w / wpl) * pad;// = ( ( PIXELS_PER_WORD * w) / ( width + pad)) * pad;
        // for each burst in word
        for ( int b = 0; b < BURSTS_PER_WORD; b++) {
            // load burst
            charVec pixels;
            for ( int i = 0; i < (PIXELS_PER_BURST*NUM_CHANNELS)/CHAR_VEC_WIDTH; i++ ) {
                pixels.v[i] = imageData[w*(BURSTS_PER_WORD*(PIXELS_PER_BURST*NUM_CHANNELS)/CHAR_VEC_WIDTH) + b*((PIXELS_PER_BURST*NUM_CHANNELS)/CHAR_VEC_WIDTH) + i - pxIdxOffset];
            }

            // for each pixel in burst
            for ( int p = 0; p < PIXELS_PER_BURST; p++) {
                for ( int c = 0; c < NUM_CHANNELS; c++) {
                    unsigned char pixChan = pixels.s[p*NUM_CHANNELS + c];
                    if (pHi_Values[c] >= 0 && (pixChan > pThresholds[c]) == (pHi_Values[c] == 0)) {
                        word |=  (((uint)0x80000000) >> ((b*PIXELS_PER_BURST+p)&31));
                    }
                }
            }
        }
        pix[w] = word;
    }
}
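
To make the index arithmetic in the patch above easier to check, here is a small host-side C sketch. It is not part of Tesseract: the function names are hypothetical, and it assumes a tightly packed input with one array element per pixel (as with NUM_CHANNELS == 4 and uchar4 loads in the patch).

```c
#include <assert.h>
#include <stdint.h>

#define PIXELS_PER_WORD 32u

/* Number of 32-bit words per output line of a 1 bpp image (hypothetical helper). */
uint32_t words_per_line(uint32_t width) {
    return (width + PIXELS_PER_WORD - 1u) / PIXELS_PER_WORD;
}

/* Number of padding bits at the end of each output line. */
uint32_t pad_bits(uint32_t width) {
    return PIXELS_PER_WORD * words_per_line(width) - width;
}

/* Index of the first source pixel that feeds output word w, assuming the
 * source rows are tightly packed (no padding between rows). */
uint32_t first_source_pixel(uint32_t w, uint32_t width) {
    uint32_t wpl = words_per_line(width);
    assert(wpl > 0);                 /* width must be non-zero */
    uint32_t row = w / wpl;          /* output line that word w belongs to */
    /* Same correction as pxIdxOffset in the kernel: one pad per earlier line. */
    return PIXELS_PER_WORD * w - row * pad_bits(width);
}
```

For a width of 40 pixels, each line takes 2 words with 24 padding bits, so word 2 (the first word of the second line) starts at source pixel 40 rather than 64. The last word of a line still spans 32 pixels from its start index, so a kernel built on this formula reads past the end of the row when filling the padding bits.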

Just a reminder: it is necessary to force OpenCL kernel recompilation after
changing this source (e.g. by deleting “kernel - <device>.bin” from the
executable's folder).
The fix works, but I am not sure about it, since the original source
apparently works for other people (as per the article). If I am right, the
OS and GPU are irrelevant because the bug is algorithmic, but mine are Windows/AMD.
A similar fix also applies to kernel_ThresholdRectToPix_OneChan(), but
there the input array might have some padding bytes as well, so its indexing
will need further adjustments. I can come up with a proof/fix for that as well;
I have not played with it yet.
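
As a rough idea of what the one-channel indexing has to handle, here is a CPU reference sketch in C. It is an assumption-laden illustration, not the Tesseract kernel: the function name and the bytes_per_line parameter are hypothetical, and it assumes each input row starts at a fixed byte stride that may be larger than the width. The bit test mirrors the condition used in the kernel above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference: threshold one 8-bit channel into 1 bpp words.
 * src rows are bytes_per_line apart (input padding), pix rows are wpl
 * words apart (output padding). */
void threshold_one_channel(const uint8_t *src, int width, int height,
                           int bytes_per_line, int threshold, int hi_value,
                           uint32_t *pix, int wpl) {
    assert(bytes_per_line >= width && wpl * 32 >= width);
    for (int y = 0; y < height; y++) {
        const uint8_t *row = src + (size_t)y * bytes_per_line; /* skips input padding */
        uint32_t *out = pix + (size_t)y * wpl;                 /* skips output padding */
        for (int w = 0; w < wpl; w++) {
            out[w] = 0;                                        /* padding bits stay zero */
        }
        for (int x = 0; x < width; x++) {
            /* Same test as in the kernel: set the bit when the comparison
             * result matches (hi_value == 0) and hi_value is non-negative. */
            int set_bit = hi_value >= 0 &&
                          ((row[x] > threshold) == (hi_value == 0));
            if (set_bit) {
                out[x / 32] |= 0x80000000u >> (x % 32);
            }
        }
    }
}
```

Because both the source and destination row pointers are recomputed from the row index, neither kind of padding leaks into the pixel index. A GPU version would need the same two strides when mapping a word index back to source bytes.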
Disclaimer: I have no prior experience with image processing, the Tesseract
source, GPU computing, or OpenCL (but please do explain if I am wrong).

Original issue reported on code.google.com by zhivko.t...@gmail.com on 22 Oct 2014 at 11:28

GoogleCodeExporter commented 9 years ago
The issue is for version 3.03

Original comment by zhivko.t...@gmail.com on 22 Oct 2014 at 11:37

GoogleCodeExporter commented 9 years ago
Same issue for me: Win7, AMD quad core, VS2013. The patch above fixes it.

Of note, on the same hardware (dual boot) and the same test page (2444 x 3504,
300 dpi JPEG), Ubuntu takes 4 s (slightly faster without OpenCL), while Windows
takes 10 s without OpenCL and 11 s with OpenCL.

Original comment by i...@xenocide.org.uk on 3 Nov 2014 at 8:52

GoogleCodeExporter commented 9 years ago

Original comment by zde...@gmail.com on 16 Apr 2015 at 7:27

GoogleCodeExporter commented 9 years ago
moved to github: https://github.com/tesseract-ocr/tesseract/pull/17

Original comment by joregan on 13 May 2015 at 8:26