nvdla / sw


Pooling doesn't work for large images #232

Closed timzhang32 closed 2 years ago

timzhang32 commented 2 years ago

Hello,

While running YOLO on NVDLA, I found that something appears to be hard-coded, either in the HW implementation or in the KMD. When I pool a 448x448 image, the result is completely wrong, but when I halve the image size (224x224), the pooling engine works fine. For 448x448 images, the pooling result contains what looks like a duplicate of the original image, starting slightly to the right of the image center.

Pooling result:

pool1_nvdla

Original image:

000000000139_224x224

I suspect the duplication starts at the 256th pixel, because I have a feeling that uint8_t is used somewhere in the code and is not large enough for reading 448x448 images. Has anyone encountered the same problem? Or does anyone know whether there is indeed a hard-coded section? I am very curious about this.

Best Tim

long771 commented 2 years ago

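This is an automatic reply from QQ Mail. Hello, I am 龙欣荣 (Long Xinrong). I am glad to have received your email and will handle it as soon as possible.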

timzhang32 commented 2 years ago

Thanks. By the way, the problem only affects pooling; convolution on 448x448 images works.

cainiaowu commented 2 years ago

When the pooling output width is greater than 128, you need to use split mode. The maximum pooling output width depends on stride_x and kernel_x because of the PDP internal buffer limit.
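As a rough illustration of the rule above, the following sketch computes the pooling output width and checks it against the 128 limit. The limit is taken from this thread as an assumption (the actual maximum depends on stride_x and kernel_x), and the function names are hypothetical, not part of the NVDLA KMD.

```python
# Assumed limit quoted in this thread; the true maximum depends on
# kernel_x and stride_x, so treat this as an illustrative constant.
MAX_OUT_WIDTH_NO_SPLIT = 128


def pooled_width(in_width, kernel_x, stride_x, pad_left=0, pad_right=0):
    """Output width of a pooling layer, using the floor convention."""
    return (in_width + pad_left + pad_right - kernel_x) // stride_x + 1


def needs_split(in_width, kernel_x, stride_x):
    """True if the output width exceeds the assumed no-split limit."""
    return pooled_width(in_width, kernel_x, stride_x) > MAX_OUT_WIDTH_NO_SPLIT


if __name__ == "__main__":
    # 2x2 pooling with stride 2, the case discussed in this thread.
    for width in (224, 448):
        print(width, pooled_width(width, 2, 2), needs_split(width, 2, 2))
```

Under these assumptions, a 224-wide input gives an output width of 112 (no split needed), while a 448-wide input gives 224, which is over the limit. That would be consistent with 224x224 working and 448x448 failing.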

timzhang32 commented 2 years ago

> When pooling output width >128, you need to use the split mode. Max pooling output width depends on stride_x and kernel_x due to PDP internal buffer limit.

Hello,

that sounds good, because it seems like a software problem. I'm assuming I should split the width while doing pooling? What change do I need to make to enable the split mode? Do I have to specify some macros in the KMD?

Also, how do you get the number 128 & how do I decide when to use split mode? In the Unit Description I can only see some explanation about whether to use flying mode, but nothing specific about when to use the split mode. By the way, I'm doing standard 2x2 pooling with a stride of 2.

Thank you in advance.

Best Tim

timzhang32 commented 2 years ago

> When pooling output width >128, you need to use the split mode. Max pooling output width depends on stride_x and kernel_x due to PDP internal buffer limit.

By the way, does the split mode only work for nv_large? I'm only able to use the nv_small configuration.

cainiaowu commented 2 years ago

The depth of 128 (16 atoms * 8 instances = 128) comes from analyzing the PDP RTL code. It should be the same for every configuration, since the PDP buffer depth is fixed.
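One possible way to turn the 128-deep buffer figure into a split plan is sketched below. It is only an illustration under stated assumptions: no padding, each slice at most 128 output columns wide, and slice input widths derived as (out - 1) * stride_x + kernel_x. The names `plan_split`, `out_parts`, and `in_parts` are hypothetical and do not correspond to KMD fields or PDP registers. Slice overlap for kernel_x > stride_x is not modeled.

```python
import math


def plan_split(out_width, kernel_x, stride_x, max_out=128):
    """Split a pooling output row into slices of at most max_out columns.

    Returns (split_num, out_parts, in_parts), where in_parts holds the
    input width each slice needs, assuming no padding.
    """
    if out_width <= 0 or max_out <= 0:
        raise ValueError("widths must be positive")
    split_num = math.ceil(out_width / max_out)
    # All slices except the last are full width; the last takes the rest.
    out_parts = [max_out] * (split_num - 1)
    out_parts.append(out_width - max_out * (split_num - 1))
    in_parts = [(w - 1) * stride_x + kernel_x for w in out_parts]
    return split_num, out_parts, in_parts


if __name__ == "__main__":
    # 448-wide input, 2x2 pooling with stride 2 -> output width 224.
    print(plan_split(224, 2, 2))
```

For an output width of 224 with 2x2 pooling and stride 2, this gives two slices of 128 and 96 output columns, needing 256 and 192 input columns respectively. How these values map onto the actual split settings in the KMD is not covered here.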

timzhang32 commented 2 years ago

> The depth of 128 (16 atoms * 8 instances = 128) comes from analyzing the PDP RTL code. It should be the same for every configuration, since the PDP buffer depth is fixed.

How did you implement split mode? I tried defining the split num and partial width in pdp.c in the KMD, but now the program hangs partway through.

timzhang32 commented 2 years ago

> The depth of 128 (16 atoms * 8 instances = 128) comes from analyzing the PDP RTL code. It should be the same for every configuration, since the PDP buffer depth is fixed.

Solved.