googlecreativelab / teachablemachine-community

Example code snippets and machine learning code for Teachable Machine
https://g.co/teachablemachine
Apache License 2.0

Clarify transformations for image models at inference time #3

Open kevinrobinson opened 5 years ago

kevinrobinson commented 5 years ago

Hello!

I think it might be helpful to clarify the transformations that images go through in the docs, and maybe provide a public method that encapsulates that. Here's my current understanding.

1. Types

You can pass a few different types:

export type ClassifierInputSource = HTMLImageElement | HTMLCanvasElement | HTMLVideoElement | ImageBitmap;

2. cropTo

This image data is copied into a new canvas and cropped by cropTo. This sizes the output to 224x224 using a "cover"-like strategy: the image is scaled so it is at least 224x224, then cropped from the center.
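The "cover" crop above can be sketched as a small pure function (this is my reading of cropTo, not the library's actual code): take the largest centered square from the source, then scale it down to 224x224.

```javascript
// Sketch of a "cover"-style center crop: compute the largest centered
// square region of a (width x height) source. That region would then be
// drawn into a 224x224 canvas.
function coverCropRect(width, height) {
  const size = Math.min(width, height); // largest square that fits
  const x = (width - size) / 2;         // center horizontally
  const y = (height - size) / 2;        // center vertically
  return { x, y, size };
}

// In the browser, the crop + resize would be a single drawImage call:
// ctx.drawImage(img, x, y, size, size, 0, 0, 224, 224);
```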

3. capture

The call to capture grabs the pixels from the image, then crops the resulting tensor with cropTensor. This crop enforces that the image is square, but it is a no-op here, since the image has already been cropped square by cropTo. Finally, it normalizes the RGB values to [-1, 1].
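The normalization step amounts to mapping each RGB channel from [0, 255] into [-1, 1]. The library does this with tensor ops; per channel the arithmetic is just (this is a sketch of the math, not the library's code):

```javascript
// Map a channel value from [0, 255] to [-1, 1]:
// 0 -> -1, 127.5 -> 0, 255 -> 1.
function normalize(v) {
  return v / 127.5 - 1;
}

// Reversing it for display re-quantizes the values via rounding,
// which is one way small color shifts can creep in.
function denormalize(n) {
  return Math.round((n + 1) * 127.5);
}
```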

4. Transparency

It also seems like fully transparent pixels might be translated to rgb(0,0,0). That happened in one example image I tried, but I didn't investigate further.
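One guess at the mechanism (not confirmed against the library): canvases store colors with premultiplied alpha, so a fully transparent pixel premultiplies to (0, 0, 0, 0). If the alpha channel is then dropped (tf.browser.fromPixels keeps only RGB by default), the pixel reads as rgb(0,0,0), i.e. black.

```javascript
// Premultiply an rgba color (alpha in [0, 1]) the way a canvas stores it.
// At a = 0 the RGB channels collapse to zero, so dropping alpha later
// leaves black, regardless of the original color.
function premultiply(r, g, b, a) {
  return [Math.round(r * a), Math.round(g * a), Math.round(b * a)];
}
```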

Is that an accurate summary? These scaling, cropping, and color changes seem important for callers (and users) to be aware of.

Exposing as a function

I think ideally this library would also expose its pre-processing for callers to use, so that tools built on it can apply the same pre-processing. Otherwise, a tool that visualizes the images the model predicts on might naively render the raw input image (which isn't actually what the TM model sees), or compare the TM model against other models without applying the same pre-processing step. Concretely, one suggestion would be to expose something like:

model.preprocess(image: ClassifierInputSource)

Returns a Tensor representing the image after applying any transformations the model applies to an input (e.g., scaling, cropping, or normalizing). The specific transformations are internals of the library and subject to breaking changes, but this public method would be stable.

Args:

image: a ClassifierInputSource (the same types accepted at prediction time)

Usage:

const img = new Image();
img.src = '...'; // some image that is larger than 224x224px, not square, and has some transparency
img.onload = async () => {
  const tensor = await model.preprocess(img);
  const canvas = document.createElement('canvas');
  await tf.browser.toPixels(tensor, canvas);
  document.body.appendChild(canvas);
};
document.body.appendChild(img);

original

[Screenshot: the original input image]

pre-processed

(note also the color shift in the background, which I'm assuming was introduced by the round trip [0,255] → [-1,1] → [0,255], but I didn't look further)

[Screenshot: the pre-processed image]

example code

Thanks for sharing this awesome work 😄

irealva commented 5 years ago

Thanks for sharing, Kevin! We'll look into this as soon as a bit of work clears up. Some great thoughts and suggestions in here.

irealva commented 4 years ago

Hi Kevin, sorry for the delay on this. Would you want to take a stab at adding a preprocess function like this, perhaps to utils, and making a PR?

roca77 commented 3 years ago

This is definitely needed; I can train models, but in production I am just shooting in the dark. This causes a lot of inaccurate predictions at inference, because the input images get preprocessed in a different fashion.

Please advise

kevinrobinson commented 3 years ago

@roca77 I don't work on the project, and just ran into this when playing around with it a while ago. I don't have any other suggestions other than what I've written above, and I'm not sure that code is even the same as it was over a year ago :)

If you want help with something specific, you could try sharing a code example (eg, through CodePen or Glitch or a GitHub project) that illustrates and describes the problem you're seeing. I'm not sure you're running into exactly the same issue described here.

roca77 commented 3 years ago

@kevinrobinson Well, yes, I am experiencing this issue. I built an image classifier, but at inference I am not sure what sort of processing to apply to the input image. I know the images used in training are cropped to a square, but nothing about the dimensions or scaling. I wish this information were made public so I could implement it in my JavaScript code.

michaelrcrowl commented 1 year ago

@irealva, based on this thread, do I understand correctly that besides the cropping to a square, Teachable Machine does not augment the data in any other way? No random rotations, flips, etc.?