Closed mats-claassen closed 7 years ago
You are right! Thanks for pointing that out.
BTW if you're deriving from code in the Caffe2 MPSCNN backend, it'd be nice to add a citation/reference somewhere (and of course it'd be great to get improvements upstreamed as well 👍)
That is correct, we should have done that in the first place. Our mistake. I have added a mention and fixed the issue.
My improvements are basically to allow a wider range of use cases. I do not know which would apply at Caffe2's code.
Thanks for your feedback. If you have more feedback it will be very welcome 😄
BTW this introduces a bug - it's not safe to remove the
threadgroup_barrier
on line 57 ofinstanceNorm.metal
, since then threads race between the read inconst float4 mean = shared_mem[0];
(line 59) and the write inshared_mem[tid] = sum;
, since tid 0 will write after reading. I wrote this bug originally so I was on the lookout :)