intel / isa-l_crypto

Other
267 stars 80 forks source link

sha1_mb/aarch64/sha1_mb_mgr_ce: rand_update_test enters an infinite loop when SHA1_MB_CE_MAX_LANES >= 3 #129

Closed chenxuqiang closed 5 months ago

chenxuqiang commented 10 months ago

I found that SHA1 MultiBuffer is blocking in sha1_mb_rand_update_test when the SHA1_MB_CE_MAX_LANES is greater than 2. According to the len_done and len_rem information, when len_rem == 256 , the expected value of LAST flag is submitted, and the calculation of this flow ends. However, the test result shows that lane1's len_rem becomes a negative number and continues to execute the calculation. As a result, an infinite loop occurs.

2: len_done = 1048320
2: len_rem = 256
1: len_done = 1048320
1: len_rem = 256
0: len_done = 1048320
0: len_rem = 256
1: len_done = 1049152
1: len_rem = -576
1: len_done = 1049984
1: len_rem = -1408
1: len_done = 1050816
1: len_rem = -2240
1: len_done = 1051648
1: len_rem = -3072
1: len_done = 1052480
1: len_rem = -3904
...

After further log analysis, we find that when lanes==3, the update sequence is as follows:

-> 0 -> 1 -> 2
-> 2 -> 1 -> 0
...
-> 2  -> 1 -> 0 // last round
-> 1  -> 1 -> 1 // Error occured.
.....

Therefore, the last commit is lanes == 0 (i== 0). In this case, i++ -> i == 1, but the last request with lanes=1 has been committed. As a result, lem_rem becomes a negative number and an infinite loop occurs.

The expected order of submission should be as follows:

-> 0 -> 1 -> 2
-> 0 -> 1 -> 2
...
-> 0  -> 1 -> 2 // last round

The last submitted request should be i==2. In this case, i++ == 3 can exit the loop (when TEST_BUFS==3) or commit new lane's request (when TEST_BUFS>3).

So what's the problem? I found the problem in the implementation of sha1_mb_mgr_init_ce in sha1_mb_mgr_ce.c:

void sha1_mb_mgr_init_ce(SHA1_MB_JOB_MGR * state)
{
    unsigned int i;

    state->unused_lanes = 0xf;
    state->num_lanes_inuse = 0;
        // look at me: Lane_idx array after initialization: [0, 1, 2]
    for (i = 0; i < SHA1_MB_CE_MAX_LANES; i++) {
        state->unused_lanes <<= 4;
        state->unused_lanes |= i;
        state->lens[i] = i;
        state->ldata[i].job_in_lane = 0;
    }

    //lanes > SHA1_MB_CE_MAX_LANES is invalid lane
    for (; i < SHA1_MAX_LANES; i++) {
        state->lens[i] = 0xf;
        state->ldata[i].job_in_lane = 0;
    }
}

Lane_idx array after initialization: [0, 1, 2], When the sha1_mb_mgr_insert_job operation is called, lane_idx is allocated from right to left. The obtained lane_idx is 2, 1, and 0. Therefore, the first returned request is the value of the first lane_idx position, which is 2. Therefore, the request of buffer0 is inserted into lane_idx=2. Whensha1_mb_mgr_free_lane is called, lane_idx: 0~2 is traversed. Therefore, the result of the last commit, buffer2, is returned instead of the expected buffer0. So there's:

submit: 0        ->       1   -> 2 
return:  NULL -> NULL  -> 2
submit: 2        ->       1  -> 0
return:  1        ->       0  -> 2
...

The correct method is to initialize unsed_lanes to [2, 1, 0], just like the other modules' MGR , such as sha256_mb_mgr_init_ce or sha1_mb_mgr_init_asimd.

void sha1_mb_mgr_init_ce(SHA1_MB_JOB_MGR * state)
{
    int i;

    state->unused_lanes = 0xf;
    state->num_lanes_inuse = 0;
    for (i = SHA1_MB_CE_MAX_LANES - 1; i >= 0; i--) {
        state->unused_lanes <<= 4;
        state->unused_lanes |= i;
        state->lens[i] = i;
        state->ldata[i].job_in_lane = 0;
    }

    //lanes > SHA1_MB_CE_MAX_LANES is invalid lane
    for (i = SHA1_MB_CE_MAX_LANES; i < SHA1_MAX_LANES; i++) {
        state->lens[i] = 0xf;
        state->ldata[i].job_in_lane = 0;
    }
}
chenxuqiang commented 10 months ago

Why is there no error in the original SHA1_MB_CE_MAX_LANES==2 ? After the test, we find that the same problem occurs, but the infinite loop is not triggered:

...
0 update
0, 1047488, 1088

1 update
1, 1048320, 256    # buffer1 update with LAST

0 update
0, 1048320, 256    # buffer0 update with LAST

i = 0 completed. 
1, 1048320, 256    # buffer1 is submitted twice

i = 1 completed.   # It's just that because buffer1 is also completed, there's no infinite loop.
...
pablodelara commented 5 months ago

This is now fixed, thanks!