bm_status_t bm_memcpy_d2s_partial_offset(bm_handle_t handle, void *dst, bm_device_mem_t src, unsigned int size, unsigned int offset)
Copies the specified number of bytes from device memory to system memory, starting at an offset from the device memory address.
Parameters:
[in] – handle The device handle
[in] – dst The destination memory (system memory, a void* pointer)
[in] – src The source memory (device memory descriptor)
[in] – size The size of data to copy (in bytes)
[in] – offset The offset of the device memory address
Per this description, size must be scaled by the number of bytes per element of the data type:
Data type        int32 (int)   fp16   bf16   fp32
size multiplier  4             2      2      4
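The multipliers above can be captured in a small helper so that call sites never pass an element count where a byte count is expected. This is only a sketch: the `DType` enum and both function names are hypothetical and are not part of BMLib.

```cpp
#include <cstddef>

// Hypothetical enum for illustration only; not a BMLib type.
enum class DType { Int32, Fp16, Bf16, Fp32 };

// Bytes per element, following the size multipliers listed above.
inline std::size_t bytes_per_element(DType t) {
    switch (t) {
        case DType::Int32: return 4;
        case DType::Fp16:  return 2;
        case DType::Bf16:  return 2;
        case DType::Fp32:  return 4;
    }
    return 0;  // not reached for valid enum values
}

// Converts an element count into the byte count expected by a byte-based copy.
inline std::size_t byte_count(std::size_t num_elements, DType t) {
    return num_elements * bytes_per_element(t);
}
```

For example, `byte_count(8, DType::Bf16)` gives the value to pass as size when copying 8 bf16 elements.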
Suggested fix
void dump_bf16_tensor(bm_handle_t bm_handle, bm_device_mem_t mem, int offset,
                      int size) {
  // size is an element count; bf16 takes 2 bytes per element, so copy size * 2 bytes.
  std::vector<uint16_t> data(size);
  bm_memcpy_d2s_partial_offset(bm_handle, data.data(), mem, size * 2, offset);
  std::cout << "-------------------------------------" << std::endl;
  fp32 t;
  for (int i = 0; i < size; i++) {
    // Widen each raw bf16 value to fp32 before printing.
    t.bits = bf16_to_fp32_bits(data[i]);
    std::cout << t.fval << std::endl;
  }
  std::cout << "-------------------------------------" << std::endl;
}
void dump_fp16_tensor(bm_handle_t bm_handle, bm_device_mem_t mem, int offset,
                      int size) {
  // size is an element count; fp16 takes 2 bytes per element, so copy size * 2 bytes.
  std::vector<uint16_t> data(size);
  bm_memcpy_d2s_partial_offset(bm_handle, data.data(), mem, size * 2, offset);
  std::cout << "-------------------------------------" << std::endl;
  fp32 t;
  for (int i = 0; i < size; i++) {
    // Widen each raw fp16 value to fp32 before printing.
    t.bits = fp16_ieee_to_fp32_bits(data[i]);
    std::cout << t.fval << std::endl;
  }
  std::cout << "-------------------------------------" << std::endl;
}
void dump_fp32_tensor(bm_handle_t bm_handle, bm_device_mem_t mem, int offset,
                      int size) {
  // size is an element count; fp32 takes 4 bytes per element, so copy size * 4 bytes.
  std::vector<float> data(size);
  std::cout << "dump size " << data.size() << std::endl;
  bm_memcpy_d2s_partial_offset(bm_handle, data.data(), mem, size * 4, offset);
  std::cout << "-------------------------------------" << std::endl;
  for (int i = 0; i < size; i++) {
    std::cout << data[i] << std::endl;
  }
  std::cout << "-------------------------------------" << std::endl;
  // No-op self-assignment; it does not change the data.
  auto ptr = data.data();
  ptr[0] = ptr[0];
}
void dump_int_tensor(bm_handle_t bm_handle, bm_device_mem_t mem, int offset,
                     int size) {
  // size is an element count; int32 takes 4 bytes per element, so copy size * 4 bytes.
  std::vector<int> data(size);
  bm_memcpy_d2s_partial_offset(bm_handle, data.data(), mem, size * 4, offset);
  std::cout << "-------------------------------------" << std::endl;
  for (int i = 0; i < size; i++) {
    std::cout << data[i] << std::endl;
  }
  std::cout << "-------------------------------------" << std::endl;
  // No-op self-assignment; it does not change the data.
  auto ptr = data.data();
  ptr[0] = ptr[0];
}
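The bf16 dump above relies on a `bf16_to_fp32_bits` helper and an `fp32` bits/value type that this report does not show. As a rough sketch of what such a conversion usually does (assuming the standard bfloat16 layout, which is the upper 16 bits of an IEEE 754 binary32 value), the functions below are hypothetical stand-ins, not the project's actual helpers:

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in: bfloat16 is the top half of a binary32 bit pattern,
// so widening is a 16-bit left shift with the low mantissa bits set to zero.
inline std::uint32_t bf16_bits_to_fp32_bits(std::uint16_t h) {
    return static_cast<std::uint32_t>(h) << 16;
}

// Reinterprets the widened bit pattern as a float via memcpy,
// which avoids undefined behaviour from pointer or union type punning.
inline float bf16_bits_to_float(std::uint16_t h) {
    const std::uint32_t bits = bf16_bits_to_fp32_bits(h);
    float value;
    std::memcpy(&value, &bits, sizeof(value));
    return value;
}
```

Because the conversion works on one 16-bit element at a time, the host buffer must hold 2 bytes per element, which is why the copy size is the element count times 2.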
File
untils.h
Bug description
Incorrect API call (probably a slip made while writing in a hurry)
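Functions involved: dump_bf16_tensor, dump_fp16_tensor, dump_fp32_tensor, dump_int_tensor
These functions call the device-to-SoC (device2soc) copy function bm_memcpy_d2s_partial_offset without scaling size by the element width. As a result, when a tensor is dumped and printed, only 1/4 or 1/2 of its values appear; the rest are all 0.
Documentation for bm_memcpy_d2s_partial_offset: per its description, size is given in bytes, so it should be scaled by the number of bytes per element of the data type.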
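The symptom can be reproduced on the host without any device. The sketch below uses `std::memcpy` as a stand-in for the device-to-system copy; `fake_d2s_copy` and `read_floats` are hypothetical names for illustration and do not call BMLib.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Stand-in for a byte-based device-to-system copy (illustration only, not BMLib).
inline void fake_d2s_copy(void* dst, const void* src,
                          std::size_t size_bytes, std::size_t offset_bytes) {
    std::memcpy(dst, static_cast<const unsigned char*>(src) + offset_bytes,
                size_bytes);
}

// Reads `count` floats from `device`. With scale == false the element count is
// passed as the byte count, which reproduces the bug described above.
inline std::vector<float> read_floats(const std::vector<float>& device,
                                      std::size_t count, bool scale) {
    std::vector<float> host(count);  // value-initialized to 0.0f
    const std::size_t bytes = scale ? count * sizeof(float) : count;
    fake_d2s_copy(host.data(), device.data(), bytes, 0);
    return host;
}
```

With 8 source floats and scale set to false, only 8 bytes (2 floats, 1/4 of the tensor) are copied and the remaining entries stay 0; with scale set to true all 8 values arrive.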