What is the maximum number of points supported by the ‘blst_p1s_mult_pippenger’ function?

LuoGuiwen commented 2 years ago

I'm trying to compute multi-scalar multiplication using blst_p1s_mult_pippenger function. when I try npoints = (size_t) 1 << 18, it works smoothly. But npoints = (size_t) 1 << 19 and larger does not work. Is there any restriction to the maximum number of points supported?

Thanks.

dot-asm commented 2 years ago

No. You're welcome.

dot-asm commented 2 years ago

But on a more constructive note. Come on! It surely doesn't just say "it doesn't work" on the screen. If you want better answer, provide details :-) Language? How exactly does it fail?

LuoGuiwen commented 2 years ago

Thank you for your prompt reply and please excuse me for the fuzziness. Let me try to make it more clear.

I create a file called main_test.cpp as follow:

// Compile: g++ -std=c++17 -o main_test -g -O2 main_test.cpp libblst.a

#include "bindings/blst.h"

#include <cstdint>
#include <iostream>
#include <random>

// using namespace std;

/* Define ostreams */
std::ostream& operator<<(std::ostream& os, const blst_fp& b)
{
    os << std::hex << b.l[0] << " "<< b.l[1]<<  " "<< b.l[2]<< " "<< b.l[3] <<" "<<  b.l[4] <<" "<<  b.l[5] <<" ";
    return os;
}

/* Define global variables */

const size_t N_POINTS = (size_t) (1 << 16);  // I cannot test 2**19 and 2**20, it will cause segmentation fault. 
const blst_p1 G1_BLST_DEFAULT_GENERATOR = *blst_p1_generator(); // Default generator in blst_p1/

blst_fr FR_ONE;
blst_fp FP_ONE;
blst_p1 G1_GENERATOR; 
blst_p1_affine G1_GENERATOR_AFFINE; 

void init(){

    // initialize the identity, i.e., one, in fr.
    uint64_t blst_fr_one_vec[] = {uint64_t(1),uint64_t(0),uint64_t(0),uint64_t(0)}; 
    blst_fr_from_uint64(&FR_ONE, blst_fr_one_vec); 

    // initialize the identity, i.e., one, in fp.
    uint64_t fp_one_vec[] = {uint64_t(0x1), uint64_t(0x0), uint64_t(0x0), uint64_t(0x0), uint64_t(0x0), uint64_t(0x0)};
    blst_fp_from_uint64(&FP_ONE, fp_one_vec);   

    /* 
    initialize the generator in blst_p1 by Guiwen. G1_GENERATOR = 11 * 10177 * 859267 * 52437899* (Point whose x-coordinate is 4).

    G1_GENERATOR =  
    {0x7f127a0a6f06434698a6b6598fc6d8bd7e8482362c69b416d8640c18c1caec0ab474874acad9e91be475966f7413a26, 
    0x5be03c7afc54a0b30376055f27a4ff60e8ca9060651b98fa6caa6937bed9116b52ad54fbc4e22cd69b8519cb9bfd662}
    */   
    uint64_t G1x_vec[] = {uint64_t(0xbe475966f7413a26), uint64_t(0xab474874acad9e91), uint64_t(0x6d8640c18c1caec0), uint64_t(0xd7e8482362c69b41), uint64_t(0x698a6b6598fc6d8b), uint64_t(0x07f127a0a6f06434)} ;
    uint64_t G1y_vec[] = {uint64_t(0x69b8519cb9bfd662), uint64_t(0xb52ad54fbc4e22cd), uint64_t(0xa6caa6937bed9116), uint64_t(0x0e8ca9060651b98f), uint64_t(0x30376055f27a4ff6), uint64_t(0x05be03c7afc54a0b)} ;
    blst_fp G1x, G1y;
    blst_fp_from_uint64(&G1x, G1x_vec);
    blst_fp_from_uint64(&G1y, G1y_vec);   
    G1_GENERATOR = {G1x, G1y, FP_ONE};  // integer a -> a << 384 mod p in montgomery form
    G1_GENERATOR_AFFINE = {G1x, G1y};
    std::cout << "Check G1_generator is in G1: " << blst_p1_in_g1(&G1_GENERATOR) <<std::endl;
}

/*  */

blst_scalar random_blst_scalar(){

    // credit to url: https://stackoverflow.com/questions/19665818/generate-random-numbers-using-c11-random-library
    std::random_device rd;
    std::mt19937_64 mt(rd());
    std::uniform_real_distribution<uint64_t> dist(1, (uint64_t) 0xffffffffffffffff);
    uint64_t scalar_vec[4];
    scalar_vec[0] = dist(mt);
    scalar_vec[1] = dist(mt);
    scalar_vec[2] = dist(mt);
    scalar_vec[3] = dist(mt);
    blst_scalar scalar;
    blst_scalar_from_uint64( &scalar, scalar_vec);

    return scalar;
}

void test(){

    size_t nbits = 256;

    // std::cout << "scratch size: "<< blst_p1s_mult_pippenger_scratch_sizeof((1<< 19))/sizeof(limb_t) << std::endl;
    limb_t scratch[blst_p1s_mult_pippenger_scratch_sizeof(N_POINTS)/sizeof(limb_t)];
    uint8_t* scalars[N_POINTS];

    for(size_t i = 0; i < N_POINTS; ++i){
            scalars[i] = random_blst_scalar().b;
    }

    blst_p1_affine* points[N_POINTS];

    blst_p1 tmp_P, tmp2_P = G1_GENERATOR;
    blst_p1_affine tmp_P_affine;

    for(size_t i = 0; i < N_POINTS; ++i){
            blst_p1_double(&tmp_P, &tmp2_P);
            tmp2_P = tmp_P;
            blst_p1_to_affine(&tmp_P_affine, &tmp_P);
            points[i] = &tmp_P_affine;
        }

    blst_p1 ret_P; // Mont coordinates
    blst_p1_affine ret_P_affine;

    /*blst pippenger*/

    std::cout<< "blst pippenger test: " << std::endl;

    size_t TEST_NUM = 1;
    auto st = std::chrono::steady_clock::now();
    for(size_t i = 0; i< TEST_NUM; ++i)
    blst_p1s_mult_pippenger(&ret_P, points, N_POINTS, scalars, nbits, scratch);
    auto ed = std::chrono::steady_clock::now();   
    std::chrono::microseconds diff = std::chrono::duration_cast<std::chrono::microseconds>(ed -st); 
    std::cout << "blst pippenger Wall clock time elapse is: " << std::dec << diff.count()/TEST_NUM << " us "<< std::endl;
    blst_p1_to_affine(&ret_P_affine, &ret_P);
    std::cout << ret_P_affine.x <<std::endl;
    std::cout << ret_P_affine.y <<std::endl;

    std::cout << "TEST END" <<std::endl;

}

int main(){

    init();
    test();

    return 0;
}

on the 2018 MacBook Pro with Mac OS system, 16 GB RAM,2.2 GHz 6-Core Intel Core i7, I compile it using g++ -std=c++17 -o main_test -g -O2 main_test.cpp libblst.a.

When const size_t N_POINTS = (size_t) (1 << 16), ./main_test runs successfully. When const size_t N_POINTS = (size_t) (1 << 19), ./main_test says segmentation fault.

My guess is that the fault is related to scratch variable, which I didn't understand its functionality. Maybe my laptop RAM is too small?

Can you please explain me in fucntion blst_p1s_mult_pippenger(&ret_P, points, N_POINTS, scalars, nbits, scratch), what the nbits and scratch are? and why my code results segmentation fault?

Thanks a lot, Guiwen

dot-asm commented 2 years ago

C++, no scratch, segmentation fault.

If you don't allocate the scratch area, it will use the stack as scratch. The stack is limited in size, customarily 8MB, and attempt to use more than that is bound to trigger segmentation violation. So no mystery that ~it~ your application fails for larger amount of points. Just allocate the scratch.

nbits is the number of bits in scalars. Point is that there is application for shorter scalars, and providing a way to specify the scalars' width allows applications to minimize the memory footprint. If you work with full-width fully-reduced scalar values, pass the default 255.

LuoGuiwen commented 2 years ago

I did allocate the scratch area, as you would see the 3rd line of test() function in my code, limb_t scratch[blst_p1s_mult_pippenger_scratch_sizeof(N_POINTS)/sizeof(limb_t)];.

The only difference is I assigned nbits = 256. I just tried nbits = 255, It showed segmentation fault. I also tried to allocate the scratch area using new as you suggested, it still showed segmentation fault.

By the way, when I compiled the code, I ignored the warning, which says ld: warning: could not create compact unwind for _blst_sha256_block_data_order: does not use RBP or RSP based frame. Is this a problem?

Thanks.

dot-asm commented 2 years ago

You allocate the scratch area on the stack and it's essentially the same as not allocating one. Sorry about the confusion. The scratch area should be allocated from the pool, where there is no implied limit. I mean at least not as low as 8MB.

No, the warning is not a problem.

LuoGuiwen commented 2 years ago

Thank you. I kind of understand it.

Could you please let me know how I can allocate memory from the main pool rather than stack? I tried to define scratch as a global variable and use operator new to allocate memory for it, but this method doesn't work.

dot-asm commented 2 years ago

I'm sorry, but no. This is a generic textbook question, and this forum is not a classroom. Besides, a reference to a working code was provided.

LuoGuiwen commented 2 years ago

Thank you so much dot-asm. I finally managed to run the code by unleashing the stack restriction with ulimit -s unlimited in terminal.

mratsim commented 2 years ago

Thank you so much dot-asm. I finally managed to run the code by unleashing the stack restriction with ulimit -s unlimited in terminal.

Respectfully, you need to review variable declaration stack and heap allocation in C and C++.

You are going to trigger significant issues with ulimit, first of all, the fact that your application will not work on Windows or when you do not have the admin rights to change those critical safety features. And second, the fact that people can now DOS your machine with recursive function calls.

I did allocate the scratch area, as you would see the 3rd line of test() function in my code

No, you are declaring the array on the stack, review carefully the working code given.

I tried to define scratch as a global variable and use operator new to allocate memory for it, but this method doesn't work.

Either you use new or malloc/free. Please ask your advisor for material on this basic C/C++ programming question.

supranational / blst

What is the maximum number of points supported by the ‘blst_p1s_mult_pippenger’ function? #126