Revise `OcrEngine::prepare_input` API to reduce copies when loading image

Previously, loading an image involved three steps:

1. Load the image into an RGB `image::ImageBuffer`, which holds bytes in channels-last (HWC) layout.
2. Copy image bytes from source into an RGB float tensor in channels-first (CHW) layout with values in [0, 1].
3. Copy values into a greyscale CHW float tensor with values in [-0.5, 0.5].

Step (2) is wasteful, especially for large images, and the implementation also unnecessarily allocated zeroed output buffers for steps 2 and 3.

This commit revises the OcrEngine::prepare_input API so that it can accept inputs as either floats or bytes, and in either CHW or HWC order. This enables fusing steps 2 and 3 together, avoiding a copy.
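As a rough illustration of the fused conversion (not the code in this PR), the sketch below goes directly from RGB bytes in HWC order to a greyscale CHW float buffer in [-0.5, 0.5] in a single pass, without zero-filling the output first. The function name `rgb_hwc_bytes_to_grey_chw` is hypothetical, and the ITU-R BT.601 luminance weights are an assumption; the weights used by ocrs may differ.

```rust
/// Hypothetical helper for illustration only, not the ocrs implementation.
///
/// Converts RGB bytes in HWC order into a single-channel (greyscale) float
/// buffer in CHW order with values in [-0.5, 0.5], in one pass.
/// Returns `None` if the buffer length does not match the dimensions.
fn rgb_hwc_bytes_to_grey_chw(bytes: &[u8], width: usize, height: usize) -> Option<Vec<f32>> {
    const CHANNELS: usize = 3;
    // Assumed luminance weights (ITU-R BT.601); ocrs may use different ones.
    const R_WEIGHT: f32 = 0.299;
    const G_WEIGHT: f32 = 0.587;
    const B_WEIGHT: f32 = 0.114;

    let n_pixels = width.checked_mul(height)?;
    if bytes.len() != n_pixels.checked_mul(CHANNELS)? {
        return None;
    }

    // Reserve capacity without zero-filling: every element is written once.
    let mut out = Vec::with_capacity(n_pixels);
    out.extend(bytes.chunks_exact(CHANNELS).map(|px| {
        // Scale bytes to [0, 1], mix to greyscale, then shift to [-0.5, 0.5].
        let r = px[0] as f32 / 255.0;
        let g = px[1] as f32 / 255.0;
        let b = px[2] as f32 / 255.0;
        R_WEIGHT * r + G_WEIGHT * g + B_WEIGHT * b - 0.5
    }));
    Some(out)
}
```

Because the output has a single channel, visiting pixels in row-major order already produces the `[1, height, width]` CHW layout, so no intermediate RGB float tensor is needed.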

For the common case of passing an image loaded using the `image` crate, there is also an `ImageSource::from_bytes(buffer, dimensions)` API. This will also help many consumers avoid the rten-imageio dependency (see also https://github.com/robertknight/rten/issues/39).

When tested on a large JPEG image (2028 x 3306), this reduced image loading time from ~200ms to ~150ms, of which about 128ms is spent in the `image` crate.

TODO:

- [x] Consider replacing panicking `ImageSource` APIs with ones that return a `Result`

robertknight / ocrs

Revise `OcrEngine::prepare_input` API to reduce copies when loading image #56