Logically, another sync should be needed at the end of NTT1024Core()/NTTInv1024Core(), because shared memory is read and written with a different pattern at the end of those functions than at the beginning of the calling function. However, tests seem to show that the error disappears even without it.
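
For reference, a minimal CUDA sketch of the kind of hazard described above. This is not the project's code: `core_stage`, `parent_kernel`, and the reversed read pattern are hypothetical stand-ins, and a single block of 1024 threads is assumed. It only illustrates why a barrier is normally expected when a callee writes shared memory with one pattern and the caller then reads it with another.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int N = 1024;  // assumed block size for this sketch

// Hypothetical stand-in for a core routine: each thread writes its own slot.
__device__ void core_stage(unsigned int* shared, unsigned int value)
{
    shared[threadIdx.x] = value;
    // Barrier at the end of the callee. Without it, the caller may read a
    // slot that a thread in another warp has not written yet.
    __syncthreads();
}

// Hypothetical parent: reads shared memory with a different pattern.
__global__ void parent_kernel(const unsigned int* in, unsigned int* out)
{
    __shared__ unsigned int buffer[N];
    core_stage(buffer, in[threadIdx.x]);
    // Each thread reads the slot written by a different thread.
    out[threadIdx.x] = buffer[(N - 1) - threadIdx.x];
}

int main()
{
    unsigned int *in = nullptr, *out = nullptr;
    cudaMallocManaged(&in, N * sizeof(unsigned int));
    cudaMallocManaged(&out, N * sizeof(unsigned int));
    for (int i = 0; i < N; ++i) in[i] = i;

    parent_kernel<<<1, N>>>(in, out);
    cudaDeviceSynchronize();

    // Count slots that do not hold the reversed input.
    int mismatches = 0;
    for (int i = 0; i < N; ++i)
        mismatches += (out[i] != static_cast<unsigned int>(N - 1 - i));
    printf("mismatches: %d\n", mismatches);

    cudaFree(in);
    cudaFree(out);
    return mismatches != 0;
}
```

Removing the barrier from a sketch like this may still produce correct output on a given run, since a data race does not have to manifest every time. A passing test alone therefore does not show that the sync is unnecessary.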
Fixes #3 (hopefully).