Closed pasabanov closed 4 months ago
Actually,
if (needle_len > hs_real_len)
{
return NULL;
}
can be removed too because, in that case, the loop won't perform any iterations and the function will return NULL anyways.
But I'm nut sure if this will increase or decrease performance in your case.
1. Removed `len == 0` check from the first `if` statement. Now the function doesn't immediately return `NULL` if `len` is zero. I believe that the previous behavior was a bug, because if `needle_len` is zero, the function should return `haystack`, because the empty substring does reside at the start of the empty string.
Could you add a unit test about that case in strnstrUnitTest()
, please?
@pasabanov, could you rebase?
@IvanNardi, I've added unit test cases for zero length in strnstrUnitTest()
and rebased.
@IvanNardi, what do you think about removing the checks if (needle_len == 1)
, which I've already removed, and if (needle_len > hs_real_len)
, which I suggest removing?
They're part of the general case, so if you don't search for one-character needles or needles that are longer than haystacks often, then there might be no point in optimizing these cases.
@IvanNardi, what do you think about removing the checks
if (needle_len == 1)
, which I've already removed, andif (needle_len > hs_real_len)
, which I suggest removing? They're part of the general case, so if you don't search for one-character needles or needles that are longer than haystacks often, then there might be no point in optimizing these cases.
Actually there are some cases where ndpi_strnstr
is used to search for a single character, that's why I added this optimization. Well, thanks anyway, great work!
mgcp.c
endpoint = ndpi_strnstr((char const *)packet->payload + 5, " ", packet->payload_packet_len - 5);
if (endpoint == NULL)
{
break;
}
endpoint++;
mgcp = ndpi_strnstr(endpoint, " ", packet->payload_packet_len - ((u_int8_t const *)endpoint - packet->payload));
if (mgcp == NULL)
{
break;
}
mgcp++;
ethereum.c
} else if(ndpi_strnstr((const char *)packet->payload, "{", packet->payload_packet_len)
xiaomi.c
ptr = ndpi_strnstr((const char *)&payload[offset], ":", len);
tivoconnect.c
for (newline = ndpi_strnstr(payload, "\n", payload_len);
newline != NULL;
key = ++newline,
newline = ndpi_strnstr(newline, "\n", payload_len - (newline - payload)))
{
Actually there are some cases where
ndpi_strnstr
is used to search for a single character, that's why I added this optimization. Well, thanks anyway, great work!
Thanks!
In that case, I think I should add that check back.
What do you think about the second check if (needle_len > hs_real_len)
? Is it needed too?
It saves a fairly small number of instructions when the needle is longer than the haystack.
Actually there are some cases where
ndpi_strnstr
is used to search for a single character, that's why I added this optimization. Well, thanks anyway, great work!Thanks!
In that case, I think I should add that check back.
What do you think about the second check
if (needle_len > hs_real_len)
? Is it needed too? It saves a fairly small number of instructions when the needle is longer than the haystack.
I dunno, I guess you could keep it. Usually the compiler manages skipping redundant checks.
I dunno, I guess you could keep it. Usually the compiler manages skipping redundant checks.
Ok. I've returned the removed check if (needle_len == 1)
, but I switched its order with the if (needle_len > hs_real_len)
check because I think that would be more efficient.
By the way, the compiler doesn't remove the check if (needle_len > hs_real_len)
at any level of optimization. I checked that with latest versions of gcc and clang on godbolt.org.
I dunno, I guess you could keep it. Usually the compiler manages skipping redundant checks.
Ok. I've returned the removed check
if (needle_len == 1)
, but I switched its order with theif (needle_len > hs_real_len)
check because I think that would be more efficient.By the way, the compiler doesn't remove the check
if (needle_len > hs_real_len)
at any level of optimization. I checked that with latest versions of gcc and clang on godbolt.org.
Alright, it looks fine. You can take a look at ndpi_memmem
if you want, because ndpi_strnstr
is basically a copypaste of that function.
Alright, it looks fine. You can take a look at
ndpi_memmem
if you want, becausendpi_strnstr
is basically a copypaste of that function.
I took a look at ndpi_memmem
and saw that it has the same issues. I've created pull request #2499.
@pasabanov, thank you for this contribution @0xA50C1A1, thanks for the review
Please sign (check) the below before submitting the Pull Request:
2433 #2439 #2447
Describe changes:
I made some minor changes to the current optimized substring search.
The main changes are:
len == 0
check from the firstif
statement. Now the function doesn't immediately returnNULL
iflen
is zero.I believe that the previous behavior was a bug, because if
needle_len
is zero, the function should returnhaystack
, because the empty substring does reside at the start of the empty string.hs_real_len
computation andif (needle_len == 0)
check.(needle_len == 1)
check because it only provides a performance benefit in specific cases. If you need to frequently search for single-character substrings, you might want to consider reverting this change.haystack_end
with variableend_of_search
(the byte after the last interesting byte).Using
end_of_search
inside thememchr
function instead ofhaystack_end
eliminates the need for thecurrent + needle_len <= haystack_end
check, as it is always positive ifcurrent
is notNULL
.I also added the new function to the performance tests. In practice
ndpi_strnstr_opt2
is slightly faster thanndpi_strnstr_opt
in most runs, though sometimesndpi_strnstr_opt
appears to be faster.