Feature Request: Support for 16 bit integer or 32 bit float single channel images

j99ca commented 6 months ago

My team and I have been using OpenVINO for a couple of years now, deploying our converted Tensorflow models to the edge as IR Format models. Recently I have been trying to move towards using Pytorch, and specifically this library. It's been great for our RGB datasets, but we often deal with thermal and depth image data, and other scientific image formats.

With our Tensorflow based object detection training, we are able to work around the dataset type differences, but this library seems very set on RGB 8 bit data. We have often treated our thermal data as grayscale when fine-tuning RGB based classification or object detection models, but we can keep our precision high in our due to using 32 bit float values and interpreting between [0.0, 255.0]. The 8 bit limitation of this library means we can only have 256 temperature values being represented, without creating some palette transformation (non-grayscale mapping) which is not ideal in our use case.

A feature I would like to see is easy exposure and overriding image loading so we can load our images (non-RGB data) and transform them to [0.0, 255.0] floating point images. Even better would be the capability to override the model architecture to add some new input layers such that the model itself can take, say images in kelvin (single channel 32 bit float), and apply our custom transformations in-model so we can export the model to the IR format and keep the single channel -> 3 channel transformation and ensure portability when we deploy these models to the edge, where they will be ingesting images in kelvin (32 bit float single channel).

Thanks for the great work by the way. We are currently are using 1.6 but I hope to switch to 2.0 when it's ready. Speaking of 2.0, will it keep the ability to use custom/override image input sizes? That is a great feature for us.

harimkang commented 6 months ago

@j99ca Thanks for suggestion!

@kprokofi @goodsong81 Aside from feature suggestions, we haven't yet added configurable input size functionality in 2.0 - it's in the backlog, of course, but we don't have an estimate of when it will be added yet. Shouldn't we be talking about when this feature is coming in?

eunwoosh commented 5 months ago

Hi @j99ca, As Harim said, configurable input size won't be included in OTX 2.0 release, but we have a plan to enable it later. When it's enabled isn't decided but it will be enabled not long after 2.0 release.

j99ca commented 5 months ago

@harimkang @eunwoosh understandable about configurable input size and timelines. How about the original request? This (and the configurable resolution) are the biggest barriers for our full adoption of this library. As I mentioned above, most of our installations are done with 16 bit single channel or floating point images.

Even just exposing the reading and decoding of the images so we can pass a custom function to the engine or dataloader so we can read our scientific image data directly and do the mapping to [0.0, 255.0] or [0.0, 1.0] float input that the models use (not uint8 but float32) and not lose detail as we could represent millions of unique temperature values within those bounds

harimkang commented 5 months ago

@wonjuleee What do you think of this feature suggestion?

wonjuleee commented 5 months ago

Hi all, as far as I guess, this sounds good for handling more precise data types in OTX. But, this requires to modify both Datumaro and OTX. Since Datumaro image loader (laze_image) can set dtype for image decoding as described in https://github.com/openvinotoolkit/datumaro/blob/6a05715147b632442f63a0f669b7cf1c7e0c5a87/src/datumaro/util/image.py#L359, we need to slightly modify Datumaro ImageFromFile and this could be controlled through https://github.com/openvinotoolkit/training_extensions/blob/af03ae08aff23597996b9a93a05d21f5d3b8ff60/src/otx/core/data/dataset/base.py#L150 in OTX side. This will be configured through something like data.config.dtype in the recipe. I think this might be feasible for the next Datumaro/OTX version. @kprokofi, what do you think?

j99ca commented 4 months ago

@wonjuleee I was wondering if there was any progress on this feature? I see that the datumaro PR for adding dtype got merged: https://github.com/openvinotoolkit/datumaro/pull/1546

I am very much looking forward to this feature! High precision data types is quite critical for us.

j99ca commented 1 month ago

@sovrasov @harimkang I see the label OTX 2.0 was removed, is this a feature that is still being considered? Support for Single channel 16 bit PNGs (without crushing it down to uint8 before converting to the model's native float32) or TIFF float32 images would amazing and allow us to directly use our scientific data with this excellent library

harimkang commented 1 month ago

@sovrasov @harimkang I see the label OTX 2.0 was removed, is this a feature that is still being considered? Support for Single channel 16 bit PNGs (without crushing it down to uint8 before converting to the model's native float32) or TIFF float32 images would amazing and allow us to directly use our scientific data with this excellent library

@kprokofi Can you answer this?

sovrasov commented 1 week ago

@j99ca sorry for a long reply, we're going through some org changes now. OTX 2.0 label was removed, since it makes no sense anymore: 2.1 version was released, and 1.6.x is no longer supported. datumaro with variable dtype is already released, so technically the feature can be implemented in OTX, but we don't have resources right now. It looks we need new transforms in OTX to support high precision images: our current augmentation pipeline supports uint8 dtype only. I'll double-check if there is a quick way to forward dtype to OTX recipe, but it's unlikely that high-precision transforms are going to be implemented by the core OTX team.

j99ca commented 1 week ago

@sovrasov Thank you looking into this!

openvinotoolkit / training_extensions

Feature Request: Support for 16 bit integer or 32 bit float single channel images #3568