openppl-public / ppl.cv

ppl.cv is a high-performance image processing library of openPPL supporting various platforms.
Apache License 2.0

prefetch() vs prefetch_l1() #69

Status: Closed (zchrissirhcz closed this issue 2 years ago)

zchrissirhcz commented 2 years ago

Hi, ppl.cv developers

On an arm64 platform I benchmarked the resize()-related functions and noticed that prefetching boosts their speed significantly.

What confuses me is that there are two wrapper functions, prefetch() and prefetch_l1(). Can they be merged into one? If not, what is the difference between them?

```cpp
inline void prefetch(const void *ptr, size_t offset = 32 * 10)
{
    __builtin_prefetch(reinterpret_cast<const char *>(ptr) + offset);
}
```

and

```cpp
inline void prefetch_l1(const void *ptr, size_t offset)
{
    asm volatile(
        "prfm pldl1keep, [%0, %1]\n\t"
        :
        : "r"(ptr), "r"(offset)
        : "cc", "memory");
}
```

I read the GCC manual entry for __builtin_prefetch, but I'm still not sure: https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html
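One possible way to merge the two, sketched under stated assumptions: the GCC manual documents that `__builtin_prefetch(addr)` defaults to `rw = 0` (read) and `locality = 3` (keep in all cache levels), and on AArch64 GCC and Clang usually lower that combination to `prfm pldl1keep`, the same hint `prefetch_l1()` writes out in inline assembly (worth confirming in the disassembly for your toolchain). The practical differences are then that `prefetch_l1()` is AArch64-only and its `"memory"` clobber makes it a compiler barrier, while the builtin is portable and the optimizer can schedule it freely. The names `prefetch_read()` and `sum_with_prefetch()` are hypothetical, not part of ppl.cv, and the 64-element prefetch distance is illustrative rather than tuned.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical merged wrapper (not part of ppl.cv). It issues a read
// prefetch (rw = 0) with maximal temporal locality (locality = 3), the
// combination GCC and Clang on AArch64 usually emit as "prfm pldl1keep".
inline void prefetch_read(const void *ptr, std::size_t offset = 32 * 10)
{
#if defined(__GNUC__) || defined(__clang__)
    __builtin_prefetch(static_cast<const char *>(ptr) + offset, 0, 3);
#else
    (void)ptr;     // this sketch issues no hint on other compilers
    (void)offset;
#endif
}

// Illustrative use: sum a float buffer while prefetching ahead of the loop.
inline float sum_with_prefetch(const float *data, std::size_t n)
{
    assert(data != nullptr || n == 0);
    constexpr std::size_t kAheadElems = 64;  // 64 floats = 256 bytes ahead
    float acc = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        // One hint per 16 floats (64 bytes), only for addresses inside the buffer.
        if (i % 16 == 0 && i + kAheadElems < n) {
            prefetch_read(data + i, kAheadElems * sizeof(float));
        }
        acc += data[i];
    }
    return acc;
}
```

Issuing one hint per assumed 64-byte line, rather than one per element, keeps the number of prefetch instructions low; whether that line size matches a given arm64 core is an assumption.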

zchrissirhcz commented 2 years ago

There are many prefetch-related wrappers, and some seem to be duplicated:

src/ppl/cv/arm/operation_utils.hpp:

```cpp
inline void prefetch(const void *ptr, size_t offset = 32 * 10)
{
    __builtin_prefetch(reinterpret_cast<const char *>(ptr) + offset);
}

inline void prefetch_l1(const void *ptr, size_t offset)
{
    asm volatile(
        "prfm pldl1keep, [%0, %1]\n\t"
        :
        : "r"(ptr), "r"(offset)
        : "cc", "memory");
}
```

src/ppl/cv/arm/common.hpp:


```cpp
namespace ppl {
namespace cv {
namespace arm {

inline void prefetch(const void *ptr, size_t offset = 1024)
{
#if defined __GNUC__
    __builtin_prefetch(reinterpret_cast<const char *>(ptr) + offset);
#elif defined _MSC_VER && defined CAROTENE_NEON
    __prefetch(reinterpret_cast<const char *>(ptr) + offset);
#else
    (void)ptr;
    (void)offset;
#endif
}

static inline void prefetch_range(const void *addr, size_t len)
{
#ifdef ARCH_HAS_PREFETCH
    // Step through the buffer in bytes: arithmetic on void* is not valid C++.
    // PREFETCH_STRIDE is expected to be defined alongside ARCH_HAS_PREFETCH.
    const char *cp = static_cast<const char *>(addr);
    const char *end = cp + len;

    for (; cp < end; cp += PREFETCH_STRIDE)
        __builtin_prefetch(cp);
#else
    (void)addr;
    (void)len;
#endif
}

}}} // namespace ppl::cv::arm
```
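The `prefetch_range()` in common.hpp only has a body when both `ARCH_HAS_PREFETCH` and `PREFETCH_STRIDE` are defined elsewhere. A self-contained variant built on the same `__builtin_prefetch` call might look like the sketch below. The function name `prefetch_range_portable()` and the 64-byte `kPrefetchStride` are assumptions for illustration (64 bytes is a common cache-line size on arm64 cores, not a value taken from ppl.cv). It walks the range by byte offset so the pointer never moves past the end of the buffer, and returns the number of hints issued so the behaviour is easy to check.

```cpp
#include <cassert>
#include <cstddef>

// Assumed stride: one hint per 64-byte cache line (hypothetical constant).
constexpr std::size_t kPrefetchStride = 64;

// Hypothetical helper, not part of ppl.cv: issue one read-prefetch hint per
// stride over the bytes [addr, addr + len) and return how many were issued.
inline std::size_t prefetch_range_portable(const void *addr, std::size_t len)
{
    assert(addr != nullptr || len == 0);
    const char *base = static_cast<const char *>(addr);
    std::size_t issued = 0;
    // Index by offset so that base + off always stays inside the buffer.
    for (std::size_t off = 0; off < len; off += kPrefetchStride) {
#if defined(__GNUC__) || defined(__clang__)
        __builtin_prefetch(base + off);  // defaults: rw = 0, locality = 3
#else
        (void)base;  // no prefetch hint on this compiler in this sketch
#endif
        ++issued;
    }
    return issued;
}
```

For example, a 200-byte buffer gets hints at offsets 0, 64, 128 and 192, so `prefetch_range_portable(buf, 200)` returns 4.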