For now, the text recognition model takes as input a single image:
import { browser, scalar } from "@tensorflow/tfjs";

// REC_MEAN and REC_STD are the normalization constants the recognition
// model was trained with (defined elsewhere in the project).
export const getImageTensorForRecognitionModel = (
  imageObject: HTMLImageElement
) => {
  // Read the pixels, resize to the model's expected input size, cast to float
  const tensor = browser
    .fromPixels(imageObject)
    .resizeNearestNeighbor([32, 128])
    .toFloat();
  // Normalize to the training distribution
  const mean = scalar(255 * REC_MEAN);
  const std = scalar(255 * REC_STD);
  // expandDims() adds the batch dimension: [32, 128, 3] -> [1, 32, 128, 3]
  return tensor.sub(mean).div(std).expandDims();
};
We need to prepare the crops as lists of, say, 32 or 64 crops, and concatenate each list so that the recognition model receives tensors of shape [32, 32, 128, 3] or [64, 32, 128, 3] instead of [1, 32, 128, 3]. Batching the crops this way speeds up inference considerably.
Here, .expandDims() only adds the leading batch dimension to a single image. Instead of feeding the model one crop at a time, we should use tf.concat() on the list of preprocessed image tensors to assemble them directly into a single 4D batch tensor.
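As a rough sketch of this batching (the toChunks helper, the getBatchTensorsForRecognitionModel name, and the default batch size of 32 are illustrative, assuming the getImageTensorForRecognitionModel above is in scope):

import { concat, Tensor4D } from "@tensorflow/tfjs";

// Illustrative helper: split the crops into chunks of a fixed batch size
const toChunks = <T>(items: T[], size: number): T[][] => {
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
};

// Preprocess each crop in a chunk, then concatenate the resulting
// [1, 32, 128, 3] tensors along axis 0 into one [N, 32, 128, 3] batch.
export const getBatchTensorsForRecognitionModel = (
  crops: HTMLImageElement[],
  batchSize = 32
): Tensor4D[] =>
  toChunks(crops, batchSize).map((chunkCrops) => {
    const tensors = chunkCrops.map(getImageTensorForRecognitionModel);
    const batch = concat(tensors, 0) as Tensor4D;
    // Free the intermediate single-crop tensors once they are concatenated
    tensors.forEach((t) => t.dispose());
    return batch;
  });

Each returned batch can then be passed to the recognition model in a single call, so a page with 64 crops costs two forward passes instead of 64.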