ingonyama-zk / icicle

a GPU Library for Zero-Knowledge Acceleration
MIT License
303 stars 88 forks source link

fix: ntt mixed-radix bug regarding large ntts (and/or batch) #523

Closed yshekel closed 2 months ago

yshekel commented 2 months ago

in some cases 32b values would wrap around and cause invalid accesses to wrong elements and memory addresses

yshekel commented 2 months ago

looks fine, do these changes affect performance in any way?

Yes, for some cases (size+batch) it is a little slower. Sometimes slightly faster (which is odd but). It's probably since now the kernels use u64 which are more expensive instructions or more instructions or something.

I tried to minimize it. For example I initially made the meta.block_id u64 (Which make all ops with it u64) but then reverted and casted only where I need it (so only some ops are u64 and the struct is smaller etc.).

Do you have an idea how to support >4G elements but avoid it? I don't really think it's possible.

@HadarIngonyama