Open zchrissirhcz opened 3 weeks ago
The test image is larger than GitHub's upload limit. For the performance test, we can just generate it from C++ code:
#include <iostream>
#include <string>
#include <opencv2/opencv.hpp>

// Tiles lena.png into a 7680x4320 BGRA canvas and writes it to result.png.
int create_test_7680_4320_png_image()
{
    const std::string image_path = "lena.png";
    cv::Mat image = cv::imread(image_path);
    if (image.empty()) {
        std::cerr << "image file not found" << std::endl;
        return -1;
    }
    int originalWidth = image.cols;
    int originalHeight = image.rows;
    int targetWidth = 7680;
    int targetHeight = 4320;
    // Integer division: only whole tiles are placed.
    int rows = targetHeight / originalHeight;
    int cols = targetWidth / originalWidth;
    cv::Mat result = cv::Mat(targetHeight, targetWidth, CV_8UC4, cv::Scalar(0, 0, 0, 0));
    cv::Mat imageWithAlpha;
    cv::cvtColor(image, imageWithAlpha, cv::COLOR_BGR2BGRA);
    for (int i = 0; i < rows; ++i) {
        for (int j = 0; j < cols; ++j) {
            int x = j * originalWidth;
            int y = i * originalHeight;
            imageWithAlpha.copyTo(result(cv::Rect(x, y, originalWidth, originalHeight)));
        }
    }
    cv::imwrite("result.png", result);
    return 0;
}
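One thing to be aware of in the generator above: rows and cols come from truncating integer division, so if the target size is not a whole multiple of the source size, the leftover strip stays transparent black. That is harmless for a benchmark, but if full coverage is wanted, the tile rectangles can be clipped to the canvas instead. The following is only a sketch: TileRect and cover_with_tiles are hypothetical names, TileRect stands in for cv::Rect so the example has no OpenCV dependency, and the 512x512 tile size mentioned afterwards is an assumption about lena.png, not something stated in this thread.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for cv::Rect, so the sketch builds without OpenCV.
struct TileRect {
    int x;
    int y;
    int width;
    int height;
};

// Returns the rectangles needed to cover a target_w x target_h canvas with
// copies of a tile_w x tile_h tile. Tiles that would overhang the right or
// bottom edge are clipped to the canvas rather than dropped.
std::vector<TileRect> cover_with_tiles(int target_w, int target_h,
                                       int tile_w, int tile_h) {
    assert(target_w > 0 && target_h > 0 && tile_w > 0 && tile_h > 0);
    std::vector<TileRect> rects;
    for (int y = 0; y < target_h; y += tile_h) {
        for (int x = 0; x < target_w; x += tile_w) {
            rects.push_back({x, y,
                             std::min(tile_w, target_w - x),
                             std::min(tile_h, target_h - y)});
        }
    }
    return rects;
}
```

Assuming a 512x512 source, cover_with_tiles(7680, 4320, 512, 512) yields 15 columns and 9 rows, with the last row only 224 pixels tall. With OpenCV, each rectangle r would then be filled by copying imageWithAlpha(cv::Rect(0, 0, r.width, r.height)) into result(cv::Rect(r.x, r.y, r.width, r.height)).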
What exact code do I use
What's the command (the compiler invocation) to build that code?
I use CMake to build:
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
The compiler is AppleClang:
Apple clang version 15.0.0 (clang-1500.1.0.2.5)
Target: arm64-apple-darwin23.2.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin
I'm not very familiar with cmake (and I don't have an Apple M1). Do you know if -DCMAKE_BUILD_TYPE=Release passes -O2 or -O3 to clang?
Also, do you know, after the #include "wuffs-unsupported-snapshot.c" line, if the WUFFS_BASE__CPU_ARCH__ARM_CRC32 and WUFFS_BASE__CPU_ARCH__ARM_NEON macros are defined?
Specifically, if you do something like
#ifdef WUFFS_BASE__CPU_ARCH__ARM_CRC32
#error "asdf1"
#else
#error "asdf2"
#endif
Do you see asdf1 or asdf2? Ditto for #ifdef WUFFS_BASE__CPU_ARCH__ARM_NEON.
-O3 is used. I found it in build/compile_commands.json:
{
"directory": "/Users/zz/work/cppsober/kcv/build",
"command": "/Library/Developer/CommandLineTools/usr/bin/c++ -DGL_SILENCE_DEPRECATION -isystem /opt/homebrew/Cellar/opencv/4.9.0_8/include/opencv4 -isystem /Users/zz/.arcpkg/birch/autotimer/0.1/mac-arm64-static/inc/birch -O3 -DNDEBUG -std=gnu++17 -arch arm64 -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX14.2.sdk -o CMakeFiles/test_wuffs.dir/imwrite.cpp.o -c /Users/zz/work/cppsober/kcv/imwrite.cpp",
"file": "/Users/zz/work/cppsober/kcv/imwrite.cpp",
"output": "CMakeFiles/test_wuffs.dir/imwrite.cpp.o"
},
WUFFS_BASE__CPU_ARCH__ARM_CRC32 and WUFFS_BASE__CPU_ARCH__ARM_NEON are enabled.
// To simplify Wuffs code, "cpu_arch >= arm_xxx" requires xxx but also
// unaligned little-endian load/stores.
#if defined(__ARM_FEATURE_UNALIGNED) && !defined(__native_client__) && \
defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
// Not all gcc versions define __ARM_ACLE, even if they support crc32
// intrinsics. Look for __ARM_FEATURE_CRC32 instead.
#if defined(__ARM_FEATURE_CRC32)
#include <arm_acle.h>
#define WUFFS_BASE__CPU_ARCH__ARM_CRC32
#pragma message "WUFFS_BASE__CPU_ARCH__ARM_CRC32: YES" // newly added
#endif // defined(__ARM_FEATURE_CRC32)
#if defined(__ARM_NEON)
#include <arm_neon.h>
#define WUFFS_BASE__CPU_ARCH__ARM_NEON
#pragma message "WUFFS_BASE__CPU_ARCH__ARM_NEON: YES" // newly added
#endif // defined(__ARM_NEON)
#endif // defined(__ARM_FEATURE_UNALIGNED) etc
The compilation output:
➜ kcv git:(main) ✗ cmake --build build -j8
[ 56%] Built target glfw
[ 76%] Built target imgui
[ 89%] Built target konacv
[ 94%] Built target test
[ 97%] Building CXX object CMakeFiles/test_wuffs.dir/imwrite.cpp.o
In file included from /Users/zz/work/cppsober/kcv/imwrite.cpp:117:
/Users/zz/work/cppsober/kcv/wuffs-unsupported-snapshot.c:120:9: warning: WUFFS_BASE__CPU_ARCH__ARM_CRC32: YES [-W#pragma-messages]
#pragma message "WUFFS_BASE__CPU_ARCH__ARM_CRC32: YES"
^
/Users/zz/work/cppsober/kcv/wuffs-unsupported-snapshot.c:125:9: warning: WUFFS_BASE__CPU_ARCH__ARM_NEON: YES [-W#pragma-messages]
#pragma message "WUFFS_BASE__CPU_ARCH__ARM_NEON: YES"
^
2 warnings generated.
[100%] Linking CXX executable test_wuffs
[100%] Built target test_wuffs
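For anyone who wants to repeat this check without editing the Wuffs snapshot, the compiler-predefined macros that the snippet above tests can also be reported at run time. This is a rough sketch, not part of Wuffs: describe_arm_macros is a hypothetical name, each macro is reported independently (the nesting used by Wuffs is not reproduced), and the __OPTIMIZE__ line relies on GCC and Clang defining that macro when optimization is enabled, which only shows that some -O level is active, not which one.

```cpp
#include <cassert>
#include <string>

// Builds a small report of which predefined macros are visible to this
// translation unit. Each macro is checked on its own.
std::string describe_arm_macros() {
    std::string out;
#if defined(__ARM_FEATURE_UNALIGNED)
    out += "__ARM_FEATURE_UNALIGNED: yes\n";
#else
    out += "__ARM_FEATURE_UNALIGNED: no\n";
#endif
#if defined(__ARM_FEATURE_CRC32)
    out += "__ARM_FEATURE_CRC32: yes\n";
#else
    out += "__ARM_FEATURE_CRC32: no\n";
#endif
#if defined(__ARM_NEON)
    out += "__ARM_NEON: yes\n";
#else
    out += "__ARM_NEON: no\n";
#endif
#if defined(__OPTIMIZE__)
    out += "__OPTIMIZE__: yes\n";
#else
    out += "__OPTIMIZE__: no\n";
#endif
    return out;
}
```

Printing the returned string from the benchmark's main, built with the same flags as imwrite.cpp, should agree with the #pragma message warnings in the build log.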
OK, I don't think there's an obvious fix. Still, I don't have an Apple M1 so it might take me a while to make progress on this.
Can you e-mail the image file (or a link to it) to nigeltao golang org? Thanks.
Sent; please check.
Thanks for sharing your 7680x4320 image. My wuffs bench time-to-decode numbers on x86_64 Intel (i5-10210U Comet Lake), not arm64 Apple (M1):
370ms wuffs latest (clang 14)
334ms wuffs latest (gcc 12)
533ms libpng (Debian 12 Bookworm)
Looks like I'm going to have to find an Apple M1 (or similar)...
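To compare time-to-decode numbers across machines and libraries, it helps if everyone measures the same way. The helper below is a generic sketch and is not how the numbers above were produced: min_elapsed_ms is a hypothetical name, and taking the fastest of several runs is just one common way to reduce noise from caches and the scheduler.

```cpp
#include <cassert>
#include <chrono>
#include <limits>

// Calls work() `runs` times and returns the fastest wall-clock time in
// milliseconds, measured with a monotonic clock.
template <typename F>
double min_elapsed_ms(F&& work, int runs) {
    assert(runs > 0);
    double best = std::numeric_limits<double>::infinity();
    for (int i = 0; i < runs; ++i) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        const double ms =
            std::chrono::duration<double, std::milli>(stop - start).count();
        if (ms < best) {
            best = ms;
        }
    }
    return best;
}
```

The callable would wrap one full decode of the 7680x4320 PNG, for example a lambda that calls cv::imread on result.png, or the equivalent Wuffs decode path.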
Problem
When decoding a big image (height=4320, width=7680, channels=4, data type = uint8_t), wuffs is much slower than OpenCV 4.9.0 on an Apple M1 (Mac mini).
Time cost
7680x4320 image
OpenCV 4.9.0 details
which is built on libpng 1.6.43:
What exact code do I use