hazemati opened this issue 6 years ago
There is a downscale step that matches the input tensor size, e.g. 160x160 for image classification. This depends directly on your compute graph and its inputs.
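To make the downscale step concrete, here is a minimal pure-Python sketch of a nearest-neighbor resize (the actual resampling method used on-device is not specified in this thread, so the algorithm choice here is an assumption for illustration):

```python
def downscale(pixels, src_w, src_h, dst_w, dst_h):
    # Nearest-neighbor downscale: `pixels` is a row-major list of pixel
    # values for a src_w x src_h image; returns a dst_w x dst_h image.
    out = []
    for y in range(dst_h):
        for x in range(dst_w):
            # Map each destination pixel back to its nearest source pixel.
            sx = x * src_w // dst_w
            sy = y * src_h // dst_h
            out.append(pixels[sy * src_w + sx])
    return out

# Shrink a 4x4 image to 2x2 as a tiny demonstration.
small = downscale(list(range(16)), 4, 4, 2, 2)
print(small)  # [0, 2, 8, 10]
```

In practice the same mapping runs from the camera resolution down to the model's input size (e.g. 160x160), which is why fine detail in a large frame is lost before inference.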
Why do you guys downscale? Is there any way to turn this off? I'm feeding 1080p30 into the chip, and I'm trying to detect an object that's farther away. Is there a way to detect objects that are a little farther away (they will take up less space in the image)? Let's assume the object takes up 1/10th of the actual image.
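A quick back-of-envelope check of why this matters, using the numbers from this thread (1080p input, a hypothetical 160x160 model input, and an object spanning 1/10th of the frame width):

```python
# Assumed numbers from the discussion: 1920-pixel-wide frame, 160-pixel
# model input, object spanning 1/10th of the frame width.
frame_w = 1920
model_w = 160
object_frac = 1 / 10

object_px_full = frame_w * object_frac                 # width in the camera frame
object_px_model = object_px_full * model_w / frame_w   # width after downscale
print(object_px_full, object_px_model)  # 192.0 16.0
```

So a 192-pixel-wide object in the camera frame becomes only 16 pixels wide at the model input, which is at the edge of what small detectors can reliably pick up.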
hazemati, CNN classification models are trained with fixed-size input images, i.e. they cannot take images of arbitrary size. However, CNN detection models, if fully convolutional, can take images of various sizes. But due to memory constraints, input images must be tiled if they are too large. Could you please elaborate on your case: which particular model are you trying to run?
My initial tests were run with MobileNet (I was doing image classification). Is it safe to assume that the current object detection models provided are not fully convolutional? Since the camera is directly connected to the Movidius chip, is there any way that tiling could be performed by the Movidius chip?
Is there any way to run a YOLO-based object detection model on the Google Vision Kit? Also, how do the provided image classification / object detection models deal with scale invariance?
Re comment #5: the current object detection model provided is a fully convolutional model. The constraint on input size is due to the on-chip memory size, not the model architecture. Yes, it's possible to do tiling on the Movidius chip, but it's not implemented in the current version.
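Since tiling isn't implemented on-chip, it could in principle be done host-side before feeding tiles to the model. Here is a hypothetical sketch of computing overlapping tile coordinates (the 256-pixel tile size and 32-pixel overlap are assumptions for illustration, not Vision Kit parameters):

```python
def tile_coords(frame_w, frame_h, tile=256, overlap=32):
    # Split a frame into tile x tile crops, overlapping by `overlap`
    # pixels so objects straddling a tile border are not cut in half.
    stride = tile - overlap
    coords = set()
    for y in range(0, max(frame_h - overlap, 1), stride):
        for x in range(0, max(frame_w - overlap, 1), stride):
            # Clamp the last row/column so tiles stay inside the frame.
            x0 = min(x, frame_w - tile)
            y0 = min(y, frame_h - tile)
            coords.add((x0, y0, x0 + tile, y0 + tile))
    return sorted(coords)

tiles = tile_coords(1920, 1080)
print(len(tiles))  # 45 tiles for a 1080p frame with these parameters
```

Each tile would then be run through the detector separately, and the resulting boxes offset by the tile's origin and merged (e.g. with non-maximum suppression), at the cost of multiple inference passes per frame.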
Re comment #6: the Vision Kit does not officially support YOLO models, but feel free to try the model compiler on one. If it passes model compilation, there is a good chance it can run on the Vision Kit. For how object detection models deal with scale invariance, please refer to the original SSD paper or any object detection paper. Usually the model is scale-invariant up to some extent; if a large scale range is desired, you need to apply the model to an image pyramid.
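As a sketch of the image-pyramid idea mentioned above: the detector is run at several progressively downscaled versions of the frame, so objects of different physical sizes each land in the model's comfortable detection range at some level. The 0.5 scale factor and 160-pixel floor below are assumptions for illustration:

```python
def pyramid_sizes(w, h, scale=0.5, min_side=160):
    # List the (width, height) of each pyramid level, halving each time,
    # stopping once the shorter side would drop below the model input size.
    sizes = []
    while min(w, h) >= min_side:
        sizes.append((int(w), int(h)))
        w, h = w * scale, h * scale
    return sizes

print(pyramid_sizes(1920, 1080))  # [(1920, 1080), (960, 540), (480, 270)]
```

The detector would run once per level and the detections from all levels would be merged, trading extra inference passes for a wider effective scale range.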
So what exactly is the highest resolution I can go to before I have to do tiling?
Hey, I'm trying to get the Movidius chip to recognize objects in a 1920x1080 image. My current understanding is that the Movidius chip will downscale the image to 256x256 (or 160x160) because of the input image size of the model. Is it possible to get the Movidius chip to not downscale the images and instead run object recognition on the entire image?