supranational / sppark

Zero-knowledge template library
Apache License 2.0

Data copy time fluctuates during concurrent NTT invocation #37

Closed: cliff0412 closed this issue 9 months ago

cliff0412 commented 10 months ago

I am testing the gl64 NTT with log_n_size=17 in a concurrent environment. I observed that the data copy from host to device takes anywhere from 20us to 6ms. I think the underlying code does not use async copies, and the last line, gpu.sync(), blocks the CPU.

  gpu.select();
  size_t domain_size = (size_t)1 << lg_domain_size;
  dev_ptr_t<fr_t> d_inout{domain_size, gpu};
  gpu.HtoD(&d_inout[0], inout, domain_size);
  NTT_internal(&d_inout[0], lg_domain_size, order, direction, type, gpu,
               coset_ext_pow);
  gpu.DtoH(inout, &d_inout[0], domain_size);
  gpu.sync();

Alternatively, it would be better to provide a batch NTT function.

dot-asm commented 9 months ago

Here is the problem. For example, 20us is obviously impossible, which indicates faulty methodology and a misunderstanding of some basics. And, again, we don't have the resources to correct that...
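A rough sanity check on the 20us figure can be sketched as below. It is only a back-of-the-envelope estimate under stated assumptions: a gl64 element is taken to be 8 bytes, and the link is assumed to be PCIe 3.0 or 4.0 x16 at its theoretical rate (the thread does not state the actual hardware). The helper names `transfer_bytes` and `min_copy_us` are hypothetical and not part of sppark.

```cpp
#include <cassert>
#include <cstdint>

// Assumed theoretical x16 link rates in bytes per second.
// The hardware used in the issue is not stated, so these are illustrative.
constexpr double PCIE3_X16 = 15.754e9;
constexpr double PCIE4_X16 = 31.508e9;

// Bytes moved by one host-to-device copy of a 2^lg_domain_size input.
// elem_bytes defaults to 8 on the assumption of a 64-bit gl64 element.
inline std::uint64_t transfer_bytes(unsigned lg_domain_size,
                                    std::uint64_t elem_bytes = 8)
{
    assert(lg_domain_size < 48);
    return (static_cast<std::uint64_t>(1) << lg_domain_size) * elem_bytes;
}

// Lower bound on the copy time in microseconds at the given link rate.
// Real transfers are slower because of protocol and launch overhead.
inline double min_copy_us(std::uint64_t bytes, double bytes_per_second)
{
    assert(bytes_per_second > 0.0);
    return static_cast<double>(bytes) / bytes_per_second * 1e6;
}
```

With log_n_size=17, `transfer_bytes(17)` is 1 MiB, and `min_copy_us` gives roughly 33us at the assumed PCIe 4.0 rate and roughly 67us at the PCIe 3.0 rate. Under these assumptions a 20us reading cannot be the transfer itself. One possible explanation, not confirmed in the thread, is that a host-side timer around an asynchronous copy call captures only the time to enqueue the copy.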