ingonyama-zk / icicle

a GPU Library for Zero-Knowledge Acceleration

MIT License

fix large (>512 elements) ecntt issue #553

Closed yshekel closed 3 weeks ago

yshekel commented 3 weeks ago

This PR fixes an issue with large ECNTT (more than 512 elements), where the CUDA blocks are too large to be assigned to SMs. The fix is to reduce the thread count per block and increase the block count in that case.
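For illustration, here is a minimal host-side sketch of the launch-shape choice described above. It is not the code from this PR: the `LaunchConfig` struct, the `choose_launch_config` function, the single-block small case, and the one-thread-per-unit-of-work mapping are all assumptions. Only the 512 threshold (from the PR title) and the block size of 128 (from the review discussion) come from this page.

```cpp
// Hypothetical launch shape; names are illustrative, not icicle's API.
struct LaunchConfig {
  unsigned blocks;             // grid size (number of CUDA blocks)
  unsigned threads_per_block;  // block size (threads in each block)
};

// Choose a launch shape for `total_threads` threads of work (assumed > 0).
// Assumption: small inputs run in a single block; inputs above the threshold
// are split into many smaller blocks so each block can be assigned to an SM.
inline LaunchConfig choose_launch_config(unsigned total_threads) {
  const unsigned kLargeThreshold = 512;  // from the PR title (>512 elements)
  const unsigned kSmallBlock = 128;      // block size mentioned in the review

  if (total_threads <= kLargeThreshold) {
    // Small case: keep everything in one block.
    return LaunchConfig{1u, total_threads};
  }

  // Large case: fewer threads per block, more blocks (ceiling division).
  const unsigned blocks = (total_threads + kSmallBlock - 1) / kSmallBlock;
  return LaunchConfig{blocks, kSmallBlock};
}
```

Under these assumptions, `choose_launch_config(1024)` yields 8 blocks of 128 threads, while `choose_launch_config(512)` keeps a single block of 512 threads. A kernel launched with the large-case shape would still need a bounds check, since the last block may contain threads past the end of the input.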

yshekel commented 3 weeks ago

Looks fine, but why not use 128 for all cases? I'm not aware of any benefit to using large blocks.

I was wondering that too, but I assumed it was done that way because it performed better. I would also have to measure to confirm that I did not degrade anything, so I wanted to avoid that for now.