mindee / doctr-tfjs-demo

Javascript demo of docTR, powered by TensorFlowJS

bug: :bug: batch text recognition model #17

Closed charlesmindee closed 2 years ago

charlesmindee commented 2 years ago

For now, the text recognition model takes as input a single image:

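import { browser, scalar } from "@tensorflow/tfjs";

// REC_MEAN and REC_STD are assumed to be normalization constants defined elsewhere in the project.
export const getImageTensorForRecognitionModel = (
  imageObject: HTMLImageElement
) => {
  // Read the pixels and resize them to the 32x128 input of the recognition model
  const tensor = browser
    .fromPixels(imageObject)
    .resizeNearestNeighbor([32, 128])
    .toFloat();
  // Scale the normalization constants to the 0-255 pixel range
  const mean = scalar(255 * REC_MEAN);
  const std = scalar(255 * REC_STD);
  // Normalize, then add a batch dimension: [32, 128, 3] -> [1, 32, 128, 3]
  return tensor.sub(mean).div(std).expandDims();
};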

We need to group the crops into batches of 32 or 64 (for instance) and combine each batch into one tensor, so that the recognition model receives tensors of shape [32, 32, 128, 3] or [64, 32, 128, 3] instead of [1, 32, 128, 3]. Running one forward pass per batch rather than one per crop should speed up recognition considerably.
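As a rough illustration of the grouping step only, the sketch below splits a list of crops into batches and reports the shape each batch tensor would have. The names `chunk` and `batchShape` are hypothetical helpers, not part of the demo or of TensorFlow.js, and the last batch is simply left smaller when the crop count is not a multiple of the batch size.

```typescript
// Hypothetical helper: split a list of crops into batches of at most `batchSize`.
export function chunk<T>(items: T[], batchSize: number): T[][] {
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let start = 0; start < items.length; start += batchSize) {
    // slice() clamps the end index, so the final batch may be shorter
    batches.push(items.slice(start, start + batchSize));
  }
  return batches;
}

// Hypothetical helper: shape of the tensor the recognition model would
// receive for one batch of 32x128 RGB crops.
export function batchShape(
  batchLength: number
): [number, number, number, number] {
  return [batchLength, 32, 128, 3];
}
```

With 70 crops and a batch size of 32, this yields three batches of 32, 32 and 6 crops, so the model would be called three times instead of 70.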

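Currently, .expandDims() adds the leading batch dimension to a single image. Instead, we should combine the list of preprocessed crops directly into one 4D tensor: either tf.concat() along axis 0 on the expanded [1, 32, 128, 3] tensors, or tf.stack() on the unexpanded [32, 128, 3] tensors. Note that tf.concat() applied to the unexpanded 3D tensors would only produce a taller 3D tensor, not a batch.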
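To make the target layout concrete, here is a dependency-free sketch of what combining crops into a 4D batch amounts to in memory. It assumes each crop has already been resized and normalized into a flat, row-major Float32Array of 32 * 128 * 3 values. `packBatch` is a hypothetical name, not the demo's code. In TensorFlow.js itself the same result would more naturally come from tf.stack() on the per-crop tensors, and the returned buffer and shape could be passed to tf.tensor4d().

```typescript
// Hypothetical helper, not part of the demo or of TensorFlow.js.
type CropShape = [number, number, number]; // [height, width, channels]

export function packBatch(
  crops: Float32Array[],
  cropShape: CropShape = [32, 128, 3]
): { data: Float32Array; shape: [number, number, number, number] } {
  const [height, width, channels] = cropShape;
  const cropSize = height * width * channels;
  const data = new Float32Array(crops.length * cropSize);

  crops.forEach((crop, index) => {
    if (crop.length !== cropSize) {
      throw new RangeError(
        `crop ${index} has ${crop.length} values, expected ${cropSize}`
      );
    }
    // In row-major layout, stacking along a new leading axis is a plain
    // copy of each crop at offset index * cropSize.
    data.set(crop, index * cropSize);
  });

  return { data, shape: [crops.length, height, width, channels] };
}
```

The shape goes from N separate [32, 128, 3] crops to a single [N, 32, 128, 3] batch, which is the input the issue asks for.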

charlesmindee commented 2 years ago

This has been fixed since #30.