thixotropist closed this issue 8 months ago.
Let's refine the goals a bit:
- … `memcpy` or `strcpy` … where vulnerabilities may be lurking?

Preliminary results:
- whisper.cpp is a C++ application, but most of the work is done within ggml C functions, so we don't get a clean comparison of vector and C++ vtable complexities here.
- whisper.cpp contains some explicit riscv64 intrinsics in its source code, but by far the majority of the vector instructions are generated by the compiler (gcc-14) through loop autovectorization. Ghidra users would likely have to learn to recognize the basic vector load/store, type conversion, and reduction instruction patterns. That's not as hard as it sounds.
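The three patterns named above can be illustrated with ordinary C. This is a minimal sketch, not code taken from whisper.cpp or ggml: `stripmined_copy` is a scalar model of the strip-mined loop that a vectorized `memcpy` reduces to (the RISC-V vector specification's own memcpy example has this shape), and `dot_i16_f32` is the kind of loop gcc autovectorizes into a widening type conversion feeding a reduction. The function names and the fixed `VLMAX_BYTES` value are hypothetical; on real RVV hardware the chunk size is returned by `vsetvli` at runtime.

```c
#include <stddef.h>

/* Hypothetical stand-in for the hardware vector length in bytes.
   On RVV this is chosen at runtime by vsetvli; 16 is an arbitrary assumption. */
enum { VLMAX_BYTES = 16 };

/* Scalar model of a strip-mined vector copy: each trip requests
   vl = min(remaining, VLMAX), performs one vector load and one vector
   store of vl elements, then advances both pointers and the count. */
void stripmined_copy(unsigned char *dst, const unsigned char *src, size_t n)
{
    while (n > 0) {
        size_t vl = n < VLMAX_BYTES ? n : VLMAX_BYTES;   /* ~ vsetvli     */
        unsigned char chunk[VLMAX_BYTES];                /* ~ a vector register */
        for (size_t j = 0; j < vl; j++)                  /* ~ vector load  */
            chunk[j] = src[j];
        for (size_t j = 0; j < vl; j++)                  /* ~ vector store */
            dst[j] = chunk[j];
        src += vl;
        dst += vl;
        n -= vl;
    }
}

/* A loop of the kind gcc autovectorizes at -O3 -ffast-math: each iteration
   converts an int16 to float (type conversion) and accumulates into a
   single running sum (reduction). */
float dot_i16_f32(const short *a, const float *b, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += (float)a[i] * b[i];
    return sum;
}
```

In a Ghidra listing of an RVV build, one would expect the first loop to show up as `vsetvli`, a vector load, a vector store, and pointer increments, and the second as a conversion instruction feeding a multiply-accumulate followed by a final horizontal reduction; the exact instructions depend on gcc's choices.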
Suppose we had to inspect a Machine Learning or LLM app for malware. Would the presence of ISA extensions like vector instructions make the inspection materially harder? We'll use the whisper.cpp voice-to-text application as the first exemplar, compiling it for various riscv64 and x86_64 platforms:
- `-march` profiles `x86-64-v2`, `x86-64-v3`, and `x86-64-v4`
In each case we will use the recommended whisper.cpp compilation options `-O3` and `-ffast-math`. As a stretch goal, we might run all versions through BSIM analysis, hoping that the platform variations do not significantly change the BSIM similarity vectors.
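The `-ffast-math` choice matters for which vector code appears. Without it (it implies `-fassociative-math`), gcc has to keep a floating-point sum in source order, which rules out the usual lane-parallel reduction. The sketch below, with hypothetical function names, contrasts strict source order with the order a 4-lane vectorized reduction effectively uses; the two can return different results for the same input.

```c
#include <stddef.h>

/* Strict source order: ((a0 + a1) + a2) + ...
   Without reassociation permission, this order must be preserved. */
float sum_sequential(const float *a, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Order a 4-lane vectorized reduction effectively uses: lane k accumulates
   a[k], a[k+4], a[k+8], ...; the lanes are combined at the end.
   Assumes n is a multiple of 4 (no tail handling, for brevity). */
float sum_four_lanes(const float *a, size_t n)
{
    float lane[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    for (size_t i = 0; i < n; i += 4)
        for (size_t k = 0; k < 4; k++)
            lane[k] += a[i + k];
    return ((lane[0] + lane[1]) + lane[2]) + lane[3];
}
```

For the input `{1e8f, 1, 1, 1, -1e8f, 1, 1, 1}`, `sum_sequential` returns 3 because each 1 is lost against 1e8 in single precision, while `sum_four_lanes` returns 6. This assumes the sketch itself is compiled without `-ffast-math` on a target that evaluates float arithmetic in float precision (`FLT_EVAL_METHOD == 0`, as on x86_64 with SSE or on riscv64).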