[Bug]: 256x256 double matrix multiplication takes much longer than expected

liuhuan2719 commented 1 year ago

What happened

1. Multiplying two 256x256 double matrices takes much longer than expected. The gap compared with 255x255 and 257x257 is far too large.

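2. Execution time is not proportional to the amount of computation. Going from a 200x200 to a 255x255 matrix increases the work by about 2x (255³ / 200³ ≈ 2.07), but the measured time increases by about 6x.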

Reproduction steps

The code is below; change MATRIX_SIZE to set the matrix dimension. Compile command:

riscv64-unknown-linux-gnu-gcc -march=rv64imafdcxthead -mabi=lp64d -mcmodel=medany -ffunction-sections -fdata-sections -funroll-loops -Wall -std=gnu11 -mtune=c908 -O2 matrix.c -o matrix

#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <stdint.h>

#define MATRIX_SIZE 256

double matrix1[MATRIX_SIZE][MATRIX_SIZE];
double matrix2[MATRIX_SIZE][MATRIX_SIZE];
double result[MATRIX_SIZE][MATRIX_SIZE];

double get_time_in_microseconds() {
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec * 1000000 + (double)tv.tv_usec;
}

static uint64_t mul_cnt = 0;

// Matrix multiplication: result = matrix1 * matrix2 (naive i-j-k loop order)
void matrix_multiply(double matrix1[][MATRIX_SIZE], double matrix2[][MATRIX_SIZE], double result[][MATRIX_SIZE]) {
    int i, j, k;
    for (i = 0; i < MATRIX_SIZE; i++) {
        for (j = 0; j < MATRIX_SIZE; j++) {
            result[i][j] = 0.0;
            for (k = 0; k < MATRIX_SIZE; k++) {
                result[i][j] = result[i][j] + (matrix1[i][k] * matrix2[k][j]);
                mul_cnt++;  // count the multiply-adds performed
            }
        }
    }
}

int main() {

    // Fill two MATRIX_SIZE x MATRIX_SIZE double-precision matrices with random values in [0, 1]
    int i, j;
    for (i = 0; i < MATRIX_SIZE; i++) {
        for (j = 0; j < MATRIX_SIZE; j++) {
            matrix1[i][j] = (double)rand() / RAND_MAX;
            matrix2[i][j] = (double)rand() / RAND_MAX;
        }
    }
#if 0
    // Debug: dump both input matrices
    printf("{");
    for (i = 0; i < MATRIX_SIZE; i++) {
        for (j = 0; j < MATRIX_SIZE; j++) {
            printf("%lf,", matrix1[i][j]);
        }
    }
    printf("}\n");
    printf("===============================\n");
    printf("{");
    for (i = 0; i < MATRIX_SIZE; i++) {
        for (j = 0; j < MATRIX_SIZE; j++) {
            printf("%lf,", matrix2[i][j]);
        }
    }
    printf("}\n");
#endif

    // Perform the matrix multiplication and time it
    double start_time = get_time_in_microseconds();
    for (int rep = 0; rep < 1; rep++) {
        matrix_multiply(matrix1, matrix2, result);
    }
    double end_time = get_time_in_microseconds();

    double execution_time = end_time - start_time;

    printf("Execution Time: %.2f microseconds mul cnt %llu\n", execution_time, (unsigned long long)mul_cnt);
#if 0
    // Print the result
    for (i = 0; i < MATRIX_SIZE; i++) {
        for (j = 0; j < MATRIX_SIZE; j++) {
            printf("%f ", result[i][j]);
        }
        printf("\n");
    }
#endif

    return 0;
}

Hardware board

k230 evb board

Software version

No response

Bug frequency

No response

Anything else

No response