efficiency of scalar-field multiplication

herumi / mcl

a portable and fast pairing-based cryptography library

BSD 3-Clause "New" or "Revised" License

efficiency of scalar-field multiplication #199

Open quwenjie opened 2 months ago

quwenjie commented 2 months ago

Hi, thanks a lot for your great library. I recently ran into a need to do many scalar-field multiplications (an integer times a field element), which in cryptography should ideally be much faster than a multiplication between two field elements. However, when I profiled it, the library did not seem to offer a scalar multiplication that is faster than a field-by-field multiplication (the profiling code below gave very similar execution times for the two). Is there a way I can implement this myself, or could you kindly add this feature to the library? Thanks!



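  auto TP = clock();
  Fr S = 0;
  for (int i = 0; i < 100000000; i++) {
    S += Fr(i) * Fr(100);  // field element times field element
  }
  cout << S << endl;
  auto TP2 = clock();
  cout << "field multi " << (double)(TP2 - TP) / CLOCKS_PER_SEC << endl;
  S = 0;
  for (int i = 0; i < 100000000; i++) {
    S += i * Fr(100);  // int times field element
  }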

herumi commented 2 months ago

The cost of the constructor Fr(int) is almost the same as that of Fr::mul because it uses Montgomery conversion. i*Fr(100) is equal to Fr(i)*Fr(100).

If you need many small-integer multiplications, how about, for example, making a table of Fr values and looking them up?

const int N = 256;
static Fr FrTbl[N];
for (int i = 0; i < N; i++) {
  FrTbl[i] = i;
}

Or you can use Fr::mulSmall(Fr& z, const Fr& x, uint32_t y) for y in [0, 9].
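
To illustrate why converting an integer costs about as much as a multiplication, here is a minimal sketch of Montgomery arithmetic over a toy 31-bit prime. It is not mcl's code: the modulus, the radix R = 2^32, and all names (redc, montMul, toMont, fromMont, mulConvertBoth, mulWithCached, mulCount) are assumptions made up for this illustration. The point is that toMont is itself one Montgomery multiplication (by R^2 mod P), so the analogue of Fr(i)*Fr(100) performs three of them, while a pre-converted constant saves one.

```cpp
#include <cassert>
#include <cstdint>

// Toy parameters (assumptions for illustration, not mcl's): a 31-bit prime
// modulus P and Montgomery radix R = 2^32.
constexpr uint32_t P = 2147483647u;  // 2^31 - 1

// -P^{-1} mod 2^32 via Newton iteration; each step doubles the correct bits.
constexpr uint32_t negInverse(uint32_t p) {
  uint32_t inv = 1;  // correct modulo 2 because p is odd
  for (int i = 0; i < 5; i++) inv *= 2u - p * inv;
  return 0u - inv;
}
constexpr uint32_t NEG_P_INV = negInverse(P);
constexpr uint32_t R_MOD_P = static_cast<uint32_t>((uint64_t{1} << 32) % P);
constexpr uint32_t R2_MOD_P =
    static_cast<uint32_t>(uint64_t{R_MOD_P} * R_MOD_P % P);

// Montgomery reduction: returns t * R^{-1} mod P for t < P * R.
constexpr uint32_t redc(uint64_t t) {
  uint32_t m = static_cast<uint32_t>(t) * NEG_P_INV;
  uint64_t u = (t + uint64_t{m} * P) >> 32;  // t + m*P is divisible by R
  return static_cast<uint32_t>(u >= P ? u - P : u);
}

static int mulCount = 0;  // number of Montgomery multiplications performed

// Product of two values in Montgomery form, result in Montgomery form.
uint32_t montMul(uint32_t a, uint32_t b) {
  ++mulCount;
  return redc(uint64_t{a} * b);
}

// Conversion x -> x*R mod P is a Montgomery multiplication by R^2 mod P.
uint32_t toMont(uint32_t x) { return montMul(x % P, R2_MOD_P); }

// Conversion back to a plain integer is a single reduction.
uint32_t fromMont(uint32_t a) { return redc(a); }

// Analogue of Fr(i) * Fr(100): two conversions plus one product.
uint32_t mulConvertBoth(uint32_t i, uint32_t c) {
  return fromMont(montMul(toMont(i), toMont(c)));
}

// Same product when the constant was converted once beforehand.
uint32_t mulWithCached(uint32_t i, uint32_t cMont) {
  return fromMont(montMul(toMont(i), cMont));
}
```

In this toy model a table of pre-converted values, like the FrTbl idea above, removes the per-element conversion as well, leaving one multiplication per product.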

quwenjie commented 2 months ago

Thanks for your reply. What I actually want is a dot product between a vector of many field elements and a vector of many small integers (0-255). Is there a better way to implement it than converting each integer with Fr(int) and then multiplying it with the field element?
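
One approach that is not proposed in this thread, but fits a dot product with weights in 0-255, is bucketing: add every field element into the bucket of its weight, then combine the 256 buckets with suffix sums, so no field multiplication is needed. The sketch below uses a toy 31-bit prime with plain uint64_t arithmetic as a stand-in for Fr; the modulus and the names addMod, mulMod, dotNaive and dotBuckets are assumptions for illustration only, and both functions assume the two vectors have equal length and that every x[i] is already reduced below P.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy prime standing in for the Fr modulus (an assumption, not mcl's).
constexpr uint64_t P = 2147483647u;  // 2^31 - 1

uint64_t addMod(uint64_t a, uint64_t b) {
  uint64_t s = a + b;
  return s >= P ? s - P : s;
}

uint64_t mulMod(uint64_t a, uint64_t b) { return a * b % P; }

// Straightforward version: one field multiplication per element.
uint64_t dotNaive(const std::vector<uint64_t>& x,
                  const std::vector<uint8_t>& w) {
  uint64_t acc = 0;
  for (std::size_t i = 0; i < x.size(); i++) {
    acc = addMod(acc, mulMod(x[i], w[i]));
  }
  return acc;
}

// Bucket version: field additions only.
uint64_t dotBuckets(const std::vector<uint64_t>& x,
                    const std::vector<uint8_t>& w) {
  // bucket[v] = sum of all x[i] whose weight w[i] equals v.
  uint64_t bucket[256] = {};
  for (std::size_t i = 0; i < x.size(); i++) {
    bucket[w[i]] = addMod(bucket[w[i]], x[i]);
  }
  // sum_{v=1}^{255} v * bucket[v], computed with suffix sums: after the
  // step for v, `running` holds bucket[v] + ... + bucket[255], and adding
  // it to `total` once per step counts bucket[u] exactly u times.
  uint64_t running = 0;
  uint64_t total = 0;
  for (int v = 255; v >= 1; v--) {
    running = addMod(running, bucket[v]);
    total = addMod(total, running);
  }
  return total;
}
```

For n elements this costs about n + 510 additions instead of n multiplications, so it can only pay off when n is much larger than 256. Whether it actually wins with mcl depends on the relative cost of Fr addition and Fr::mul, which this thread does not measure.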

herumi commented 2 months ago

Fr::mul is one of the fastest implementations in the world. How much faster do you expect it to be if I implement the multiplication of an Fr by an integer? Even if implemented, it would not be many times faster. Fr::mulSmall consists of a mulUnit (256-bit x 64-bit -> 320-bit), a table lookup (or two mulUnit calls), and two subtractions or some bit operations. A benchmark of Fr::mulSmall and Fr::mul (Fr is a 256-bit integer) gives the following:

func          Fr::mul   Fr::mulSmall
clk (cycles)  31        24

If you still want it, I will consider implementing it, but it is not a high priority.
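
For scale, the two figures in the benchmark above work out as follows, assuming they are directly comparable per-call costs:

```latex
\[
\text{speedup} = \frac{31}{24} \approx 1.29,
\qquad
\text{saving per call} = \frac{31 - 24}{31} \approx 22.6\%.
\]
```

This is consistent with the statement that a dedicated integer multiplication would not be many times faster than Fr::mul.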

quwenjie commented 2 months ago

Thanks. I found that changing the constant 9 to 255 in gmp_utils.hpp seems to work. Is this the correct way to do it?

herumi commented 2 months ago

I'm sorry I missed your question. I think it's probably okay. I'll check back later when I get more time.

herumi commented 4 days ago

I updated mulSmall, which is an alias of mulUnit. mulUnit(F& z, const F& x, uint32_t y) is fast for 0 <= y <= 255.