google / aiyprojects-raspbian

API libraries, samples, and system images for AIY Projects (Voice Kit and Vision Kit)
https://aiyprojects.withgoogle.com/
Apache License 2.0
1.62k stars, 694 forks

Downsampling #470

Open hazemati opened 6 years ago

hazemati commented 6 years ago

Hey, I'm trying to get the Movidius chip to recognize objects in a 1920x1080 image. My current understanding is that the Movidius chip will downscale the image to 256x256 (or 160x160) to match the input size of the model. Is it possible to get the Movidius chip to not downscale the images and instead run object recognition on the entire image?

dmitriykovalev commented 6 years ago

There is a downscale step which matches input tensor size, e.g. 160x160 for image classification. This directly depends on your compute graph and its inputs.
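To make the effect of that downscale step concrete, here is a minimal arithmetic sketch. It assumes a plain stretch-resize from the camera frame to the input tensor size; the thread does not say whether the kit crops or stretches, and `scaled_size` is a hypothetical helper, not part of the AIY API.

```python
def scaled_size(obj_w, obj_h, src, dst):
    """Approximate size of an object after a frame of size `src`
    (width, height) is stretch-resized to `dst` (width, height).

    Assumption: a plain resize with independent x/y scale factors.
    """
    src_w, src_h = src
    dst_w, dst_h = dst
    return obj_w * dst_w / src_w, obj_h * dst_h / src_h


# An object spanning 1/10 of each dimension of a 1080p frame
# shrinks to roughly 16x16 pixels in a 160x160 input tensor.
print(scaled_size(192, 108, (1920, 1080), (160, 160)))
```

Under these assumptions a distant object is left with very few pixels after the resize, which is why small objects become hard to recognize.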

hazemati commented 6 years ago

Why do you guys downscale? Is there any way to turn this off? I'm feeding 1080p30 into the chip, and I'm trying to detect an object that's farther away. Is there a way to detect objects that are a little farther away (they will take up less space in the image)? Let's assume the object takes up 1/10th of the actual image.

bowu-google commented 6 years ago

hazemati, CNN classification models are trained with fixed-size input images, i.e. they cannot take images of arbitrary size. However, CNN detection models, if fully convolutional, can take images of various sizes. But due to memory constraints, the input images must be tiled if they are too large. Could you please elaborate on your case: which particular model are you trying to run?
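The tiling idea mentioned above can be sketched as follows. This is only an illustration of computing overlapping tile boxes on the host side; the tile size of 256 and overlap of 32 are assumed values, and `tile_starts` and `tile_boxes` are hypothetical helpers, not functions from the AIY libraries.

```python
def tile_starts(length, tile, overlap):
    """Start offsets of tiles of size `tile` that cover [0, length)."""
    assert 0 <= overlap < tile
    if tile >= length:
        return [0]
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # last tile sits flush with the edge
    return starts


def tile_boxes(width, height, tile=256, overlap=32):
    """Return (x0, y0, x1, y1) boxes covering a width x height image."""
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in tile_starts(height, tile, overlap)
        for x in tile_starts(width, tile, overlap)
    ]


boxes = tile_boxes(1920, 1080)
print(len(boxes), boxes[0], boxes[-1])
```

Each tile would then be passed to the model separately, and any detections shifted by the tile's `(x0, y0)` to get full-frame coordinates. The overlap is there so that an object cut by one tile boundary is still seen whole in a neighbouring tile.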

hazemati commented 6 years ago

My initial tests were run with MobileNet (I was doing image classification). Is it safe to assume that the object detection models currently provided are not fully convolutional? Since the camera is directly connected to the Movidius chipset, is there any way that tiling could be performed by the Movidius chip?

hazemati commented 6 years ago

Is there any way to run a YOLO-based object detection model on the Google Vision Kit? Also, how do the provided image classification/object detection models deal with scale invariance?

bowu-google commented 6 years ago

Re comment #5: the object detection model currently provided is fully convolutional. The constraint on input size is due to the on-chip memory size, not the model architecture. Yes, it's possible to do tiling on the Movidius chip, but it's not implemented in the current version.

Re comment #6: Vision Kit does not officially support YOLO models, but feel free to try the model compiler on one. If it passes model compilation, there is a good chance it can run on Vision Kit. For how object detection models deal with scale invariance, please refer to the original SSD paper or any object detection paper. Usually a model is scale invariant only up to some extent. If a large scale range is desired, you need to apply the model to an image pyramid.
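As a rough illustration of the image pyramid mentioned above, the sketch below only computes the resolutions of the pyramid levels. The scale step of 0.5 and the minimum side of 160 are assumed values chosen for the example, and `pyramid_sizes` is a hypothetical helper, not part of the AIY libraries.

```python
def pyramid_sizes(width, height, scale=0.5, min_side=160):
    """List (width, height) for each pyramid level, largest first.

    Levels are produced by repeatedly multiplying both sides by `scale`
    until the shorter side would drop below `min_side`.
    """
    assert 0 < scale < 1
    sizes = []
    w, h = width, height
    while min(w, h) >= min_side:
        sizes.append((w, h))
        w, h = int(w * scale), int(h * scale)
    return sizes


print(pyramid_sizes(1920, 1080))
```

The detector would then be run on every level (tiled where a level exceeds the memory limit), so that distant objects are caught at the fine levels and nearby objects at the coarse ones.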

hazemati commented 6 years ago

So exactly what is the highest resolution that I can go to before I have to do tiling?