robertknight / ocrs

Rust library and CLI tool for OCR (extracting text from images)
Apache License 2.0

Revise `OcrEngine::prepare_input` API to reduce copies when loading image #56

Closed robertknight closed 4 months ago

robertknight commented 4 months ago

Previously, the steps to load an image were:

  1. Load the image into an RGB image::ImageBuffer, which holds bytes in channels-last (HWC) layout.
  2. Copy image bytes from source into an RGB float tensor in channels-first (CHW) layout with values in [0, 1].
  3. Copy values into a greyscale CHW float tensor with values in [-0.5, 0.5].

Step (2) is wasteful, especially for large images, and the implementation also unnecessarily allocated zeroed output buffers for steps 2 and 3.

This commit revises the OcrEngine::prepare_input API so that it can accept inputs as either floats or bytes, and in either CHW or HWC order. This enables fusing steps 2 and 3 together, avoiding a copy.
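As a rough illustration of what the fused conversion looks like, here is a minimal sketch that goes straight from RGB bytes in HWC order to a greyscale float buffer in [-0.5, 0.5], with no intermediate RGB float tensor. The function name `rgb_hwc_bytes_to_grey_chw` is hypothetical and is not part of the ocrs API, and the BT.601 luma weights are an assumption, since the description above does not say which weights are used.

```rust
/// Hypothetical helper for illustration only; not the ocrs implementation.
///
/// Converts an RGB image stored as HWC bytes directly into a single-channel
/// float buffer (1 x H x W, row-major) with values in [-0.5, 0.5].
/// Returns `None` if `bytes.len()` does not equal `width * height * 3`.
fn rgb_hwc_bytes_to_grey_chw(bytes: &[u8], width: usize, height: usize) -> Option<Vec<f32>> {
    // Assumed luma weights (ITU-R BT.601).
    const WEIGHTS: [f32; 3] = [0.299, 0.587, 0.114];

    let expected_len = width.checked_mul(height)?.checked_mul(3)?;
    if bytes.len() != expected_len {
        return None;
    }

    // Reserve capacity without zero-filling; every element is written once.
    let mut out = Vec::with_capacity(width * height);

    // In HWC order each pixel is three adjacent bytes (R, G, B). With a
    // single output channel, CHW order is plain row-major H x W, so pixels
    // can be emitted in the same order they are read.
    out.extend(bytes.chunks_exact(3).map(|px| {
        let grey = px[0] as f32 * WEIGHTS[0]
            + px[1] as f32 * WEIGHTS[1]
            + px[2] as f32 * WEIGHTS[2];
        // Map [0, 255] to [-0.5, 0.5] in the same pass.
        grey / 255.0 - 0.5
    }));

    Some(out)
}
```

Calling `rgb_hwc_bytes_to_grey_chw(&[0, 0, 0, 255, 255, 255], 2, 1)` on a 2 x 1 image with one black and one white pixel yields two values, close to -0.5 and 0.5 respectively. The normalization and the greyscale reduction happen in a single pass over the source bytes, which is the copy that fusing steps 2 and 3 saves.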

For the common use case of passing an image loaded using the image crate, there is also a convenience API, ImageSource::from_bytes(buffer, dimensions). This will also help many consumers avoid the rten-imageio dependency (see also https://github.com/robertknight/rten/issues/39).

When tested on a large JPEG image (2028 x 3306), this reduced image loading time from ~200ms to ~150ms. Of the remaining ~150ms, about 128ms is spent in the image crate.

TODO: