Open BlueAmulet opened 3 months ago
Do you have a fix for it?
I just subtracted 31 from the size, which isn't really a fix but more of a workaround.
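For reference, a minimal sketch of that workaround, assuming the scanner performs one 32-byte load per start offset. `SafeScanLimit` is a hypothetical helper, not something from the library. It returns how many start offsets can be loaded without reading past the buffer, and it guards against `size_t` wrap-around when the buffer is shorter than one load.

```cpp
#include <cstddef>

// Hypothetical helper (not part of the library): number of start offsets at
// which a `width`-byte load stays inside a buffer of `size` bytes.
constexpr std::size_t SafeScanLimit(std::size_t size, std::size_t width = 32) {
    // For AVX2 this is "size minus 31". Return 0 instead of wrapping around
    // when the buffer is shorter than a single load.
    return size >= width ? size - (width - 1) : 0;
}
```

A loop such as `for (size_t i = 0; i < SafeScanLimit(size); ++i)` then never issues an out-of-bounds load. The cost is that a pattern starting in the last 31 bytes is never checked, which is why this is a workaround rather than a fix.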
@BlueAmulet How about this? It works for me; let me know if it works for you too.
ScanResult FindAvx2(const Pattern& patternData, void* startAddr, size_t size) {
    constexpr size_t UNIT_SIZE = 32;
    size_t processedSize = 0;
    __m256i pattern = _mm256_load_si256((__m256i*);
    __m256i mask = _mm256_load_si256((__m256i*);
    __m256i allZeros = _mm256_set1_epi8(0x00);
    size_t chunk = 0;
    for (; chunk + UNIT_SIZE <= size; chunk += UNIT_SIZE) {
        __m256i chunkData = _mm256_loadu_si256((__m256i*)((char*)startAddr + chunk));
        __m256i blend = _mm256_blendv_epi8(allZeros, chunkData, mask);
        __m256i eq = _mm256_cmpeq_epi8(pattern, blend);
        if (_mm256_movemask_epi8(eq) == 0xffffffff) {
            processedSize += UNIT_SIZE;
            if (processedSize < patternData.unpaddedSize) {
                pattern = _mm256_load_si256((__m256i*)( + processedSize));
                mask = _mm256_load_si256((__m256i*)( + processedSize));
            } else {
                char* matchAddr = (char*)startAddr + chunk - processedSize + UNIT_SIZE;
                return ScanResult((void*)matchAddr);
            }
        } else {
            pattern = _mm256_load_si256((__m256i*);
            mask = _mm256_load_si256((__m256i*);
            processedSize = 0;
        }
    }
    if (chunk < size) {
        size_t remainingBytes = size - chunk;
        __m256i chunkData = _mm256_loadu_si256((__m256i*)((char*)startAddr + chunk));
        __m256i remainingMask = _mm256_set1_epi8(0x00);
        for (size_t i = 0; i < remainingBytes; ++i) {
            ((char*)&remainingMask)[i] = 0xFF;
        }
        __m256i blend = _mm256_blendv_epi8(allZeros, chunkData, remainingMask);
        __m256i eq = _mm256_cmpeq_epi8(pattern, blend);
        if (_mm256_movemask_epi8(eq) == 0xffffffff) {
            char* matchAddr = (char*)startAddr + chunk;
            return ScanResult((void*)matchAddr);
        }
    }
    return ScanResult(nullptr);
}
A fix for this, as well as performance improvements, is planned, but I have been a bit busy lately. I will try to get it out soon.
@localcc That would be great. My solution did not work.
The AVX2 scanner reads 32 bytes at once, so as the scan approaches the end of `size`, it ends up reading past the end of the buffer. The SSE4.2 scanner has the same issue.
This can cause crashes if there is no readable memory past the end of the buffer.
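One common way to avoid this kind of overread is to copy the final partial chunk into a zero-filled local block and load from that block instead of from the original buffer. The sketch below illustrates the idea in portable C++. `LoadPadded` and `kUnitSize` are hypothetical names chosen for illustration, and this is not necessarily the fix the maintainers have planned.

```cpp
#include <array>
#include <cstddef>
#include <cstring>

constexpr std::size_t kUnitSize = 32;  // width of one AVX2 register in bytes

// Hypothetical helper: copy at most kUnitSize bytes starting at `offset` into
// a zero-filled local block, so that a later 32-byte load never touches
// memory outside [data, data + size).
inline std::array<unsigned char, kUnitSize> LoadPadded(const void* data,
                                                       std::size_t size,
                                                       std::size_t offset) {
    std::array<unsigned char, kUnitSize> block{};  // value-initialised to zero
    if (offset < size) {
        const std::size_t available = size - offset;
        const std::size_t count = available < kUnitSize ? available : kUnitSize;
        std::memcpy(block.data(),
                    static_cast<const unsigned char*>(data) + offset, count);
    }
    return block;
}
```

In an AVX2 scanner, the tail could then be loaded with `_mm256_loadu_si256(reinterpret_cast<const __m256i*>(block.data()))`. The extra copy only happens for the last chunk, so the main loop keeps reading directly from the buffer.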