This was only tested on Linux, so please check that it works on Windows. :) Fixes #186
The reason why it crashes
In the original code, the Mat out_alpha_tile is generated from in_alpha_tile, which is extracted from Mat in. However, the destination of the alpha channel copy is Mat out, a tile Mat without padding, while Mat in is already padded. As a result, more data was copied than the output Mat can hold. This out-of-bounds write explains why the refcount of the out Mat was overwritten.
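To make the size mismatch concrete, here is a minimal sketch of a bounds-safe copy from a padded tile into an unpadded one. It is not the project's code: `Plane` and `copy_interior` are hypothetical names that stand in for one channel of a Mat, and the upscaling step is left out on purpose, so only the padding mismatch is shown.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical single-channel float plane; stands in for one channel of a Mat.
struct Plane {
    int w;
    int h;
    std::vector<float> data;

    Plane(int w_, int h_, float fill = 0.f)
        : w(w_), h(h_), data(static_cast<std::size_t>(w_) * h_, fill) {}

    float& at(int x, int y) { return data[static_cast<std::size_t>(y) * w + x]; }
    float at(int x, int y) const { return data[static_cast<std::size_t>(y) * w + x]; }
};

// Copy only the unpadded interior of `src` into `dst`.
// `src` is assumed to carry `pad` extra pixels on every side.
// Returns false instead of writing when the sizes do not line up,
// which is the check the crashing path was effectively missing.
bool copy_interior(const Plane& src, int pad, Plane& dst) {
    if (pad < 0 || src.w != dst.w + 2 * pad || src.h != dst.h + 2 * pad) {
        return false;
    }
    for (int y = 0; y < dst.h; ++y) {
        for (int x = 0; x < dst.w; ++x) {
            dst.at(x, y) = src.at(x + pad, y + pad);
        }
    }
    return true;
}
```

Copying all `src.w * src.h` elements into `dst` instead would write past the end of the `dst` buffer, which matches the kind of overwrite described above.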
The CPU-mode tilesize used to be 4000, far larger than typical images, so the whole image was processed as a single tile and the code worked as intended. A past commit changed it to 400, which turned full-image inference into tiled (batch) inference and exposed this bug.
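A quick sketch of why the smaller tilesize matters, assuming tiles are counted by ceiling division along each axis (the helper name `tile_count` is hypothetical and the image size is only an example):

```cpp
#include <cassert>

// Number of tiles needed to cover `extent` pixels with tiles of `tilesize` pixels,
// assuming plain ceiling division. Both arguments are expected to be positive.
int tile_count(int extent, int tilesize) {
    return (extent + tilesize - 1) / tilesize;
}
```

Under that assumption, a 1920x1080 image is a single tile at tilesize 4000, but 5 x 3 = 15 tiles at tilesize 400, so the padded per-tile path is actually exercised.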