xmartlabs / Bender

Easily craft fast Neural Networks on iOS! Use TensorFlow models. Metal under the hood.
https://xmartlabs.github.io/Bender/
MIT License

Optimize Instance normalization to support any kind of image size #19

Closed: mats-claassen closed this 7 years ago

ajtulloch commented 7 years ago

BTW this introduces a bug: it's not safe to remove the `threadgroup_barrier` on line 57 of instanceNorm.metal. Without it, threads race between the write `shared_mem[tid] = sum;` and the read `const float4 mean = shared_mem[0];` (line 59), since thread 0 may write shared_mem[0] only after other threads have already read it. I wrote this bug originally so I was on the lookout :)
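
For context, here is a minimal sketch of the threadgroup-reduction pattern being discussed, not the actual instanceNorm.metal kernel. It only computes the per-channel mean, the kernel name, the 256-thread assumption, and the `out_mean` buffer are illustrative, and only the `shared_mem`/`sum`/`mean` names follow the lines quoted above. The second barrier is the one whose removal causes the race.

```metal
#include <metal_stdlib>
using namespace metal;

// Illustrative sketch: strided per-thread partial sums followed by a
// threadgroup reduction. Both barriers are required for correctness.
kernel void instance_norm_mean_sketch(
    texture2d<float, access::read> in [[texture(0)]],
    device float4 *out_mean           [[buffer(0)]],   // hypothetical output buffer
    ushort tid                        [[thread_index_in_threadgroup]],
    ushort tcount                     [[threads_per_threadgroup]])
{
    threadgroup float4 shared_mem[256];  // assumes at most 256 threads per group

    const uint w = in.get_width();
    const uint h = in.get_height();
    const float n = float(w) * float(h);

    // Each thread accumulates a strided partial sum over the image.
    float4 sum = 0.0f;
    for (uint i = tid; i < w * h; i += tcount) {
        sum += in.read(uint2(i % w, i / w));
    }
    shared_mem[tid] = sum;

    // Barrier #1: all partial sums must be visible before thread 0 reduces them.
    threadgroup_barrier(mem_flags::mem_threadgroup);

    if (tid == 0) {
        float4 total = 0.0f;
        for (ushort i = 0; i < tcount; ++i) {
            total += shared_mem[i];
        }
        shared_mem[0] = total / n;  // store the mean where every thread reads it
    }

    // Barrier #2 (the one at issue): without it, other threads may read
    // shared_mem[0] before thread 0 has overwritten it with the mean.
    threadgroup_barrier(mem_flags::mem_threadgroup);

    const float4 mean = shared_mem[0];
    if (tid == 0) {
        out_mean[0] = mean;
    }
}
```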

mats-claassen commented 7 years ago

You are right! Thanks for pointing that out.

ajtulloch commented 7 years ago

BTW if you're deriving from code in the Caffe2 MPSCNN backend, it'd be nice to add a citation/reference somewhere (and of course it'd be great to get improvements upstreamed as well 👍)

mats-claassen commented 7 years ago

That is correct, we should have done that in the first place. Our mistake. I have added a mention and fixed the issue.

My changes mainly allow a wider range of use cases; I am not sure which of them would apply to Caffe2's code.

Thanks for your feedback; any further suggestions are very welcome 😄