sophgo / LLM-TPU

Run generative AI models in sophgo BM1684X

Bug in untils.h: when dumping a tensor for inspection, only 1/4 or 1/2 of the tensor values are visible and the rest are all 0 #9

Closed: szxysdt closed this issue 6 months ago

szxysdt commented 6 months ago

File

untils.h

Bug description

Incorrect API call (my guess is that it was written in a hurry)

Affected functions:

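The functions dump_bf16_tensor, dump_fp16_tensor, dump_fp32_tensor and dump_int_tensor call the device-to-system copy function bm_memcpy_d2s_partial_offset, but pass the element count as size without multiplying it by the element width. Because size is interpreted as a byte count, only size bytes are copied: 1/4 of the values for 4-byte types (fp32, int) and 1/2 for 2-byte types (fp16, bf16). The host std::vector is zero-initialized, so every element that was not copied prints as 0. Documentation of bm_memcpy_d2s_partial_offset: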

bm_status_t bm_memcpy_d2s_partial_offset(bm_handle_t handle, void *dst, bm_device_mem_t src, unsigned int size, unsigned int offset)
To copy specified bytes of data from device memory to system memory with an offset in device memory address.
Parameters:
[in] – handle The device handle
[in] – dst The destination memory (system memory, a void* pointer)
[in] – src The source memory (device memory descriptor)
[in] – size The size of data to copy (in bytes)
[in] – offset The offset of the device memory address
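The symptom can be reproduced without a TPU. The sketch below is a host-only simulation, not bmlib code: fake_memcpy_d2s_partial_offset, make_device and dump_floats are hypothetical names invented for illustration, and a std::vector<std::uint8_t> stands in for device memory. Like the documented API, the fake copy treats size and offset as byte counts, so passing the element count copies only a quarter of a float tensor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical host-side stand-in for bm_memcpy_d2s_partial_offset (no bmlib,
// no device). As in the documented API, size and offset are in BYTES.
void fake_memcpy_d2s_partial_offset(void *dst, const std::vector<std::uint8_t> &device,
                                    unsigned int size, unsigned int offset) {
  assert(static_cast<std::size_t>(offset) + size <= device.size());
  std::memcpy(dst, device.data() + offset, size);
}

// Hypothetical: fills a fake "device" buffer with the floats 1, 2, ..., count.
std::vector<std::uint8_t> make_device(int count) {
  std::vector<float> host(count);
  for (int i = 0; i < count; i++) host[i] = static_cast<float>(i + 1);
  std::vector<std::uint8_t> device(host.size() * sizeof(float));
  std::memcpy(device.data(), host.data(), device.size());
  return device;
}

// Hypothetical: mirrors the copy step of dump_fp32_tensor.
// scale_by_width == false reproduces the bug (size = element count);
// scale_by_width == true applies the fix (size = element count * 4).
std::vector<float> dump_floats(const std::vector<std::uint8_t> &device, int count,
                               bool scale_by_width) {
  std::vector<float> data(count);  // value-initialized: every element starts at 0.0f
  unsigned int size = scale_by_width
                          ? static_cast<unsigned int>(count * sizeof(float))
                          : static_cast<unsigned int>(count);
  fake_memcpy_d2s_partial_offset(data.data(), device, size, 0);
  return data;
}
```

With 8 floats, the unscaled call copies 8 bytes, i.e. only the first 2 values; the remaining 6 keep their zero-initialized value, which matches the 1/4 symptom reported here.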

Since size is documented as a byte count, it has to be multiplied by the byte width of the element type:

Data type                          int32 (int)  fp16  bf16  fp32
size multiplier (bytes/element)    4            2     2     4
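One way to keep these multipliers from drifting out of sync with the buffer types is to derive them from sizeof. This is a minimal sketch: tensor_bytes is a hypothetical helper, not part of untils.h or bmlib, and it assumes fp16 and bf16 values are held as raw uint16_t bit patterns, as the dump functions in this issue do.

```cpp
#include <cstdint>

// Hypothetical helper (not in untils.h or bmlib): converts an element count
// into the byte count expected by the size argument of
// bm_memcpy_d2s_partial_offset.
template <typename T>
constexpr unsigned int tensor_bytes(int count) {
  return static_cast<unsigned int>(count) * static_cast<unsigned int>(sizeof(T));
}

// Compile-time checks matching the table: int32 -> 4, fp16/bf16 -> 2, fp32 -> 4.
static_assert(tensor_bytes<std::int32_t>(1) == 4, "int32 elements are 4 bytes");
static_assert(tensor_bytes<std::uint16_t>(1) == 2, "fp16/bf16 bit patterns are 2 bytes");
static_assert(tensor_bytes<float>(1) == 4, "fp32 elements are 4 bytes");
```

With this helper, the copy in dump_bf16_tensor could pass tensor_bytes<uint16_t>(size) as the size argument instead of a hand-written multiplier.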

Suggested fix

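#include <cstdint>
#include <iostream>
#include <vector>

#include "bmlib_runtime.h"  // bm_handle_t, bm_device_mem_t, bm_memcpy_d2s_partial_offset

// fp32, bf16_to_fp32_bits and fp16_ieee_to_fp32_bits are assumed to come from
// the rest of untils.h. In these dump functions `size` is an element count,
// while bm_memcpy_d2s_partial_offset expects a byte count.

void dump_bf16_tensor(bm_handle_t bm_handle, bm_device_mem_t mem, int offset,
                      int size) {
  std::vector<uint16_t> data(size);
  // bf16 is 2 bytes per element.
  bm_memcpy_d2s_partial_offset(bm_handle, data.data(), mem,
                               size * sizeof(uint16_t), offset);
  std::cout << "-------------------------------------" << std::endl;
  fp32 t;
  for (int i = 0; i < size; i++) {
    t.bits = bf16_to_fp32_bits(data[i]);  // widen bf16 bits to fp32 for printing
    std::cout << t.fval << std::endl;
  }
  std::cout << "-------------------------------------" << std::endl;
}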

void dump_fp16_tensor(bm_handle_t bm_handle, bm_device_mem_t mem, int offset,
                      int size) {
  std::vector<uint16_t> data(size);
  // fp16 is 2 bytes per element.
  bm_memcpy_d2s_partial_offset(bm_handle, data.data(), mem,
                               size * sizeof(uint16_t), offset);
  std::cout << "-------------------------------------" << std::endl;
  fp32 t;
  for (int i = 0; i < size; i++) {
    t.bits = fp16_ieee_to_fp32_bits(data[i]);  // convert IEEE fp16 bits to fp32
    std::cout << t.fval << std::endl;
  }
  std::cout << "-------------------------------------" << std::endl;
}

void dump_fp32_tensor(bm_handle_t bm_handle, bm_device_mem_t mem, int offset,
                      int size) {
  std::vector<float> data(size);
  std::cout << "dump size " << data.size() << std::endl;
  // fp32 is 4 bytes per element.
  bm_memcpy_d2s_partial_offset(bm_handle, data.data(), mem,
                               size * sizeof(float), offset);
  std::cout << "-------------------------------------" << std::endl;
  for (int i = 0; i < size; i++) {
    std::cout << data[i] << std::endl;
  }
  std::cout << "-------------------------------------" << std::endl;
  // The original no-op `ptr[0] = ptr[0];` is dropped: it had no effect and
  // reads past the end of the buffer when size == 0.
}

void dump_int_tensor(bm_handle_t bm_handle, bm_device_mem_t mem, int offset,
                     int size) {
  std::vector<int> data(size);
  // int (int32) is 4 bytes per element.
  bm_memcpy_d2s_partial_offset(bm_handle, data.data(), mem,
                               size * sizeof(int), offset);
  std::cout << "-------------------------------------" << std::endl;
  for (int i = 0; i < size; i++) {
    std::cout << data[i] << std::endl;
  }
  std::cout << "-------------------------------------" << std::endl;
  // The original no-op `ptr[0] = ptr[0];` is dropped: it had no effect and
  // reads past the end of the buffer when size == 0.
}

HarmonyHu commented 6 months ago

Thank you very much. This has been fixed: the code now uses bm_memcpy_d2s instead.