Closed zchrissirhcz closed 2 years ago
There are many prefetch-related wrappers, and some of them seem to be duplicated:
src/ppl/cv/arm/operation_utils.hpp:
    inline void prefetch(const void *ptr, size_t offset = 32 * 10)
    {
        __builtin_prefetch(reinterpret_cast<const char *>(ptr) + offset);
    }
    inline void prefetch_l1(const void *ptr, size_t offset)
    {
        asm volatile(
            "prfm pldl1keep, [%0, %1]\n\t"
            :
            : "r"(ptr), "r"(offset)
            : "cc", "memory");
    }
src/ppl/cv/arm/common.hpp:
namespace ppl {
namespace cv {
namespace arm {
    inline void prefetch(const void *ptr, size_t offset = 1024)
    {
    #if defined __GNUC__
        __builtin_prefetch(reinterpret_cast<const char *>(ptr) + offset);
    #elif defined _MSC_VER && defined CAROTENE_NEON
        __prefetch(reinterpret_cast<const char *>(ptr) + offset);
    #else
        (void)ptr;
        (void)offset;
    #endif
    }
    static inline void prefetch_range(const void *addr, size_t len)
    {
    #ifdef ARCH_HAS_PREFETCH
        char *cp;
        char *end = addr + len;
        for (cp = addr; cp < end; cp += PREFETCH_STRIDE)
            __builtin_prefetch(cp);
    #endif
    }
}}} // namespace ppl::cv::arm
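A side note on `prefetch_range()` as quoted above: if `ARCH_HAS_PREFETCH` were ever defined, the body would likely not compile as C++, because it does arithmetic on a `const void *` and assigns the result to a non-const `char *`. Below is a minimal sketch of a variant that compiles. The name `prefetch_range_sketch` and the 64-byte stride are my assumptions, not ppl.cv definitions, and the function returns the number of hints issued only so that it can be checked.

```cpp
#include <cstddef>

// Assumed cache-line stride; the real PREFETCH_STRIDE is not shown in this issue.
constexpr std::size_t kPrefetchStride = 64;

// Hypothetical variant of prefetch_range(): issues one prefetch hint per stride
// over [addr, addr + len) and returns how many hints were issued.
inline std::size_t prefetch_range_sketch(const void *addr, std::size_t len)
{
    const char *base = static_cast<const char *>(addr);
    std::size_t count = 0;
    // Iterate by offset so no pointer beyond the end of the range is formed.
    for (std::size_t off = 0; off < len; off += kPrefetchStride) {
#if defined(__GNUC__) || defined(__clang__)
        __builtin_prefetch(base + off);
#endif
        ++count;
    }
    return count;
}
```

With a 64-byte stride, a 256-byte buffer gets four hints and a 100-byte buffer gets two.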
Hi, ppl.cv developers
On the arm64 platform, I benchmarked the `resize()`-related functions and noticed that prefetching boosts the speed significantly. What confuses me is that there are two wrapper functions, `prefetch()` and `prefetch_l1()`. Can they be merged into one? If not, what is the difference between them?
I also read the GNU manual for `__builtin_prefetch`, but I am still not sure: https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html
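To make the question concrete, here is a minimal sketch of how the two wrappers might be merged. It relies on the optional `rw` and `locality` arguments of `__builtin_prefetch` described in the GCC manual (`rw` is 0 for read and 1 for write; `locality` ranges from 0 to 3 and defaults to 3). As far as I understand, with the default arguments GCC on AArch64 typically emits `prfm pldl1keep`, which is the same hint as `prefetch_l1()`, but that is an assumption worth verifying in the generated assembly. The names `PrefetchLocality`, `prefetch_hint` and `sum_with_prefetch` are hypothetical and not part of ppl.cv.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical names; not part of ppl.cv.
enum PrefetchLocality {
    kNoTemporal = 0, // streaming data, little reuse expected
    kLow = 1,
    kModerate = 2,
    kHigh = 3        // high temporal locality (the GCC default)
};

// Single wrapper. __builtin_prefetch requires rw and locality to be
// compile-time constants, so they are passed as template parameters.
template <int Locality = kHigh, int Rw = 0>
inline void prefetch_hint(const void *ptr, std::size_t offset = 0)
{
#if defined(__GNUC__) || defined(__clang__)
    __builtin_prefetch(reinterpret_cast<const char *>(ptr) + offset, Rw, Locality);
#else
    (void)ptr;
    (void)offset;
#endif
}

// Usage: sum an array while hinting at data a fixed distance ahead.
// A prefetch is only a hint, so the result is the same with or without it.
inline std::uint64_t sum_with_prefetch(const std::vector<std::uint32_t> &v)
{
    const std::size_t ahead = 64; // elements; assumed distance, tune by benchmark
    std::uint64_t sum = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (i + ahead < v.size())
            prefetch_hint<kHigh>(&v[i], ahead * sizeof(std::uint32_t));
        sum += v[i];
    }
    return sum;
}
```

If the assumption above holds, `prefetch_hint<kHigh>(ptr, offset)` would cover both existing wrappers, and the different default offsets (`32 * 10` vs. `1024`) would become explicit at each call site.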