Closed tszumski closed 3 weeks ago
Encoder: performance gain ~0.1% Decoder: performance gain ~2.5%
How was that measured? Total performance gain, or just on the DWT?
Encoder: performance gain ~0.1% Decoder: performance gain ~2.5%
How was that measured? Total performance gain, or just on the DWT?
Total performance gain
This breaks at least Windows x64 regression tests. Did you run the regression test suite locally? Make sure to clone https://github.com/uclouvain/openjpeg-data in a "data" subdirectory of your openjpeg repository, and run ctest
This breaks at least Windows x64 regression tests. Did you run the regression test suite locally? Make sure to clone https://github.com/uclouvain/openjpeg-data in a "data" subdirectory of your openjpeg repository, and run ctest
Thanks for reply, I was able to fix it. Root cause is that CI use MSVC 2015 toolset that does not have _mm256_extract_epi32 and _mm256_insert_epi32 intrinsic defined,
@tszumski Can you fix the formatting of the code according to the instructions at end of https://github.com/uclouvain/openjpeg/actions/runs/10718601930/job/29724100308?pr=1552, that is
* Enable WITH_ASTYLE in your cmake configuration to format C++ code
* Use "scripts/astyle.sh file" to fix the now badly indented files
* Consider using scripts/prepare-commit.sh as pre-commit hook to avoid this
in the future (ln -s scripts/prepare-commit.sh .git/hooks/pre-commit) or
run it manually before each commit.
@tszumski Can you fix the formatting of the code according to the instructions at end of https://github.com/uclouvain/openjpeg/actions/runs/10718601930/job/29724100308?pr=1552, that is
* Enable WITH_ASTYLE in your cmake configuration to format C++ code * Use "scripts/astyle.sh file" to fix the now badly indented files * Consider using scripts/prepare-commit.sh as pre-commit hook to avoid this in the future (ln -s scripts/prepare-commit.sh .git/hooks/pre-commit) or run it manually before each commit.
Fixed
@tszumski I don't have hardware to test AVX512F and github CI doesn't seem to have machines with it. Could you paste somewhere (here or a github gist if too large):
-mavx512f
@rouault For consistency with CI scans(windows) i applied flag -DCMAKE_C_FLAGS="/arch:AVX512" And i can confirm that AVX512F is set/defined* (test logs from MSVC 2017 and MSVC 2022 are identical)
Note: *MSVC added support for AVX512 in Visual Studio 2017
$ grep C_FLAGS CMakeCache.txt
CMAKE_C_FLAGS:STRING=/arch:AVX512
CMAKE_C_FLAGS_DEBUG:STRING=/MDd /Zi /Ob0 /Od /RTC1
CMAKE_C_FLAGS_MINSIZEREL:STRING=/MD /O1 /Ob1 /DNDEBUG
CMAKE_C_FLAGS_RELEASE:STRING=/MD /O2 /Ob2 /DNDEBUG
CMAKE_C_FLAGS_RELWITHDEBINFO:STRING=/MD /Zi /O2 /Ob1 /DNDEBUG
CMAKE_RC_FLAGS:STRING=-DWIN32
CMAKE_RC_FLAGS_DEBUG:STRING=-D_DEBUG
CMAKE_RC_FLAGS_MINSIZEREL:STRING=
CMAKE_RC_FLAGS_RELEASE:STRING=
CMAKE_RC_FLAGS_RELWITHDEBINFO:STRING=
//ADVANCED property for variable: CMAKE_C_FLAGS
CMAKE_C_FLAGS-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_C_FLAGS_DEBUG
CMAKE_C_FLAGS_DEBUG-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_C_FLAGS_MINSIZEREL
CMAKE_C_FLAGS_MINSIZEREL-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_C_FLAGS_RELEASE
CMAKE_C_FLAGS_RELEASE-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_C_FLAGS_RELWITHDEBINFO
CMAKE_C_FLAGS_RELWITHDEBINFO-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_RC_FLAGS
CMAKE_RC_FLAGS-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_RC_FLAGS_DEBUG
CMAKE_RC_FLAGS_DEBUG-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_RC_FLAGS_MINSIZEREL
CMAKE_RC_FLAGS_MINSIZEREL-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_RC_FLAGS_RELEASE
CMAKE_RC_FLAGS_RELEASE-ADVANCED:INTERNAL=1
//ADVANCED property for variable: CMAKE_RC_FLAGS_RELWITHDEBINFO
CMAKE_RC_FLAGS_RELWITHDEBINFO-ADVANCED:INTERNAL=1
1>------ Build started: Project: RUN_TESTS, Configuration: Release x64 ------
1>Test project D:/openjpeg/build-tests
1> Start 1: tte0
1> 1/1587 Test #1: tte0 ......................................................................................... Passed 0.88 sec
...
1> Start 1587: testjp2
1>1587/1587 Test #1587: testjp2 ...................................................................................... Passed 0.27 sec
1>
1>99% tests passed, 2 tests failed out of 1587
1>
1>Total Test time (real) = 224.17 sec
1>
1>The following tests FAILED:
1> 1147 - NR-DEC-issue226.j2k-74-decode (Failed)
1> 1148 - NR-DEC-issue226.j2k-74-decode-md5 (Failed)
1>Errors while running CTest
Failed Tests 1147/1148 are also failing on master branch without my changes
Failed Tests 1147/1148 are also failing on master branch without my changes
yes, those are "expected" (we have some a file in CI to ignore them
ok, so everything looks good. Merging
Encoder: performance gain ~0.1% Decoder: performance gain ~2.5%