Closed tudorsuciu closed 5 years ago
Hi, thanks for reporting this issue. The next patch is unfortunately very large but convolve.h seems to be the most likely cause. Would you kindly let us know which instruction is faulting (gdb) so we can narrow it down?
Hello,
Thank you for looking at the issue. I cannot get a proper core file, compilation doesn't work with msse4.1. Intel sde gives this for 98495235a9017b0f5fca7e9963da7f8ed1c2573d (no patches applied): ~/Downloads/sde-external-8.16.0-2018-01-30-lin/sde64 -- ./bin/cpik test3.png out3.pik Compressing with maximum Butteraugli distance 1.000000 Illegal instruction at address = 55e95da9ee9e: 0f 01 f9 48 c1 e2 20 48 09 d0 0f ae e8 48 89 Image name: /home/tudor/pik/bin/cpik Offset in image: 0x31e9e If you believe your application should attempt to execute this illegal instruction (and others that may be present), Then use this knob: -emit-illegal-insts 0 and this error message will be avoided.
If this is not helpful to you, i will try to attach gdb as described in sde documentation to get a proper backtrace(i hope).
Regards,
I could attach gdb to sde but symbols are not recognized. The backtrace is not usefull.
in gdb info function gives:
0x0000000000031190 pik::ButteraugliDistance(pik::Image3
So the invalid instruction happens here: pik::Zone::~Zone() [clone .constprop.225]
I got a 1,6Gb file from sde with all instructions executed. This is the last page: Write (UINT64)0x00007ffe4aca2090 = 0x7ffe4aca3200 INS 0x0000560e4f0d807d BASE nop dword ptr [rax], eax INS 0x0000560e4f0d8080 SSE movaps xmmword ptr [rsp], xmm4 Write (UINT128)0x00007ffe4aca2080 = 00000000_00000000_00000000_00000001 INS 0x0000560e4f0d8084 BASE lea rdi, ptr [rip+0xc6d11] | rdi = 0x560e4f19ed9c INS 0x0000560e4f0d808b BASE call 0x560e4f0d0ea0 | rsp = 0x7ffe4aca2078 Write (UINT64)0x00007ffe4aca2078 = 0x560e4f0d8090 INS 0x0000560e4f0d0ea0 BASE push r15 | rsp = 0x7ffe4aca2070 Write (UINT64)0x00007ffe4aca2070 = 0x50 INS 0x0000560e4f0d0ea2 BASE push r14 | rsp = 0x7ffe4aca2068 Write (UINT64)0x00007ffe4aca2068 = 0x7ffe4aca3670 INS 0x0000560e4f0d0ea4 BASE push r13 | rsp = 0x7ffe4aca2060 Write (UINT64)0x00007ffe4aca2060 = 0x560e4f423ea8 INS 0x0000560e4f0d0ea6 BASE push r12 | rsp = 0x7ffe4aca2058 Write (UINT64)0x00007ffe4aca2058 = 0x7ffe4aca2200 INS 0x0000560e4f0d0ea8 BASE push rbp | rsp = 0x7ffe4aca2050 Write (UINT64)0x00007ffe4aca2050 = 0x560e509a1d40 INS 0x0000560e4f0d0ea9 BASE push rbx | rsp = 0x7ffe4aca2048 Write (UINT64)0x00007ffe4aca2048 = 0x560e509a1880 INS 0x0000560e4f0d0eaa BASE mov rbx, rdi | rbx = 0x560e4f19ed9c INS 0x0000560e4f0d0ead BASE sub rsp, 0x18 | rsp = 0x7ffe4aca2030, rflags = 0x206 Read 0x7f923f7f9080 = (UINT64)0x00007f923f7f9080 INS 0x0000560e4f0d0eb1 BASE mov rax, qword ptr fs:[0x0] | rax = 0x7f923f7f9080 Read 0x560e509a1880 = (UINT64)0x00007f923f7f9078 INS 0x0000560e4f0d0ebd BASE mov rsi, qword ptr [rax-0x8] | rsi = 0x560e509a1880 INS 0x0000560e4f0d0ec4 BASE test rsi, rsi | rflags = 0x202 INS 0x0000560e4f0d0ec7 BASE jz 0x560e4f0d11c0 INS 0x0000560e4f0d0ecd SSE2 lfence INS 0x0000560e4f0d0ed0 BASE rdtsc | rax = 0x37307c1b, rdx = 0x9f7 INS 0x0000560e4f0d0ed2 BASE shl rdx, 0x20 | rdx = 0x9f700000000, rflags = 0x206 INS 0x0000560e4f0d0ed6 BASE or rax, rdx | rax = 0x9f737307c1b, rflags = 0x206 INS 0x0000560e4f0d0ed9 SSE2 lfence Read 0x560e4b19ed4b = (UINT64)0x0000560e509a18e0 INS 0x0000560e4f0d0edc BASE sub rbx, qword ptr [rsi+0x60] | rbx = 0x4000051, rflags = 0x202 INS 0x0000560e4f0d0ee0 BASE mov rbp, 0x1fffffffff | rbp = 0x1fffffffff INS 0x0000560e4f0d0eea BASE and rax, rbp | rax = 0x1737307c1b, rflags = 0x206 INS 0x0000560e4f0d0eed BASE shl rbx, 0x25 | rbx = 0x80000a2000000000, rflags = 0x286 INS 0x0000560e4f0d0ef1 BASE lea rdi, ptr [rbx+rax1] | rdi = 0x80000a3737307c1b Read 0 = (UINT64)0x0000560e509a18c0 INS 0x0000560e4f0d0ef5 BASE mov rax, qword ptr [rsi+0x40] | rax = 0 INS 0x0000560e4f0d0ef9 BASE cmp rax, 0x8 | rflags = 0x293 INS 0x0000560e4f0d0efd BASE jz 0x560e4f0d0f28 INS 0x0000560e4f0d0eff BASE lea rdx, ptr [rax8] | rdx = 0 INS 0x0000560e4f0d0f07 BASE add rax, 0x1 | rax = 0x1, rflags = 0x202 INS 0x0000560e4f0d0f0b BASE mov qword ptr [rsi+rdx1], rdi Write (UINT64)0x0000560e509a1880 = 0x80000a3737307c1b INS 0x0000560e4f0d0f0f BASE mov qword ptr [rsi+0x40], rax Write (UINT64)0x0000560e509a18c0 = 0x1 INS 0x0000560e4f0d0f13 BASE add rsp, 0x18 | rsp = 0x7ffe4aca2048, rflags = 0x206 Read 0x560e509a1880 = (UINT64)0x00007ffe4aca2048 INS 0x0000560e4f0d0f17 BASE pop rbx | rbx = 0x560e509a1880, rsp = 0x7ffe4aca2050 Read 0x560e509a1d40 = (UINT64)0x00007ffe4aca2050 INS 0x0000560e4f0d0f18 BASE pop rbp | rbp = 0x560e509a1d40, rsp = 0x7ffe4aca2058 Read 0x7ffe4aca2200 = (UINT64)0x00007ffe4aca2058 INS 0x0000560e4f0d0f19 BASE pop r12 | r12 = 0x7ffe4aca2200, rsp = 0x7ffe4aca2060 Read 0x560e4f423ea8 = (UINT64)0x00007ffe4aca2060 INS 0x0000560e4f0d0f1b BASE pop r13 | r13 = 0x560e4f423ea8, rsp = 0x7ffe4aca2068 Read 0x7ffe4aca3670 = (UINT64)0x00007ffe4aca2068 INS 0x0000560e4f0d0f1d BASE pop r14 | r14 = 0x7ffe4aca3670, rsp = 0x7ffe4aca2070 Read 0x50 = (UINT64)0x00007ffe4aca2070 INS 0x0000560e4f0d0f1f BASE pop r15 | r15 = 0x50, rsp = 0x7ffe4aca2078 Read 0x560e4f0d8090 = (UINT64)0x00007ffe4aca2078 INS 0x0000560e4f0d0f21 BASE ret | rsp = 0x7ffe4aca2080 INS 0x0000560e4f0d8090 BASE call 0x560e4f0cbe90 | rsp = 0x7ffe4aca2078 Write (UINT64)0x00007ffe4aca2078 = 0x560e4f0d8095 INS 0x0000560e4f0cbe90 BASE push r15 | rsp = 0x7ffe4aca2070 Write (UINT64)0x00007ffe4aca2070 = 0x50 INS 0x0000560e4f0cbe92 BASE push r14 | rsp = 0x7ffe4aca2068 Write (UINT64)0x00007ffe4aca2068 = 0x7ffe4aca3670 INS 0x0000560e4f0cbe94 BASE push r13 | rsp = 0x7ffe4aca2060 Write (UINT64)0x00007ffe4aca2060 = 0x560e4f423ea8 INS 0x0000560e4f0cbe96 BASE push r12 | rsp = 0x7ffe4aca2058 Write (UINT64)0x00007ffe4aca2058 = 0x7ffe4aca2200 INS 0x0000560e4f0cbe98 BASE push rbp | rsp = 0x7ffe4aca2050 Write (UINT64)0x00007ffe4aca2050 = 0x560e509a1d40 INS 0x0000560e4f0cbe99 BASE push rbx | rsp = 0x7ffe4aca2048 Write (UINT64*)0x00007ffe4aca2048 = 0x560e509a1880 INS 0x0000560e4f0cbe9a BASE sub rsp, 0x18 | rsp = 0x7ffe4aca2030, rflags = 0x206 SIG signal=0x4 on thread 0 at address 0x560e4f0cbe9e FATALSIG4
Do this pieces of information help or more is needed?
Thank you for this information, it is very helpful and points us straight to the problem and a solution! The faulting instruction is RDTSCP, used for timing especially in profiler.h. I didn't realize this but Penryn indeed lacks that instruction.
A quick fix for the short term would be to change rdtscp to rdtsc in the inline asm in tsc_timer.h.
BTW we no longer use the 32-bit version (wraps around too quickly) so I can simplify the interface.
I'll also change cpik/dpik to use the OS timer instead. After that, you'll be able to change #if PROFILER_ENABLED
in profiler.h to #if PROFILER_ENABLED && 0
and this problem should be resolved.
With this patch:
diff --git a/tsc_timer.h b/tsc_timer.h
index b7783dd..36a2fee 100644
--- a/tsc_timer.h
+++ b/tsc_timer.h
@@ -125,13 +125,13 @@ inline uint64_t Stop<uint64_t>() {
#elif PIK_ARCH_X64 && PIK_COMPILER_MSVC
PIK_COMPILER_FENCE;
unsigned aux;
- t = __rdtscp(&aux);
+ t = __rdtsc(&aux);
SIMD_NAMESPACE::load_fence();
PIK_COMPILER_FENCE;
#elif PIK_ARCH_X64 && (PIK_COMPILER_CLANG || PIK_COMPILER_GCC)
// Use inline asm because __rdtscp generates code to store TSC_AUX (ecx).
asm volatile(
- "rdtscp\n\t"
+ "rdtsc\n\t"
"shl $32, %%rdx\n\t"
"or %%rdx, %0\n\t"
"lfence"
It compiles and executes correctly (at least on small test images).
With this code:
static inline void native_cpuid(unsigned int *eax, unsigned int *ebx,
unsigned int *ecx, unsigned int *edx)
{
/* ecx is often an input as well as an output. */
asm volatile("cpuid"
: "=a" (*eax),
"=b" (*ebx),
"=c" (*ecx),
"=d" (*edx)
: "0" (*eax), "2" (*ecx));
}
#include <stdio.h>
int main(int argc, char **argv)
{
unsigned eax, ebx, ecx, edx;
eax = 1; /* processor info and feature bits */
native_cpuid(&eax, &ebx, &ecx, &edx);
printf("stepping %d\n", eax & 0xF);
printf("model %d\n", (eax >> 4) & 0xF);
printf("family %d\n", (eax >> 8) & 0xF);
printf("processor type %d\n", (eax >> 12) & 0x3);
printf("extended model %d\n", (eax >> 16) & 0xF);
printf("extended family %d\n", (eax >> 20) & 0xFF);
eax = 0x80000001;
native_cpuid(&eax, &ebx, &ecx, &edx);
printf("rdtscp %d\n", (edx >> 27) & 0x1);
}
I get:
stepping 10
model 7
family 6
processor type 0
extended model 1
extended family 0
rdtscp 0
So rdtscp is not supported for penryn processor if this example code is ok. It means that Intel sde does not emulate/simulate correctly this instruction. It does seem to add the flag:
~/Downloads/sde-external-8.16.0-2018-01-30-lin/sde64 -- ./a.out
stepping 0
model 6
family 6
processor type 0
extended model 6
extended family 0
rdtscp 1
Would it be possible for you to fix the compilation errors that appear with a patched makefile like this?
diff --git a/Makefile b/Makefile
index 53ef001..3c58028 100644
--- a/Makefile
+++ b/Makefile
@@ -1,4 +1,4 @@
-SIMD_FLAGS := -DSIMD_ENABLE=4 -msse4.2 -maes
+SIMD_FLAGS := -DSIMD_ENABLE=4 -msse4.1
override CXXFLAGS += -std=c++11 -Wall -O3 -fPIC -I. -I../ -Ithird_party/brotli/c/include/ -Wno-sign-compare
override LDFLAGS += -lpthread
A solution to compile without sse would be even better for generic debug and to run the code on other architectures (arm64).
I wanted to see what sse4.2 instruction is used, so i deactivated the exit after "Cannot continue because CPU lacks SSE4 support". There is no instruction used in cpik/dpik that needs sse4.2. The only intrinsic that i found in simd library is: mm_cmpgt_epi64 In my case the problem is fixed, i was able (finally) to check/evaluate the quality of pik on big images. I don't know if the issue should be closed, it depends if there is somebody else impacted.
Regards,
Congratulations, happy to hear you got it working! The timers are fixed and will be in the next push. Looks like reducing from SSE4.2 to SSE4.1 would cover about 1.4% more systems according to Steam Hardware survey. That's surprising and possibly worthwhile, I'll add it to the backlog. Thanks for suggesting and investigating this!
Hello again, we've fixed this locally (removed cmpgt_epi64 and aes), so hopefully it would work in Penryn after the next push. Thanks again for suggesting and investigating this!
Hello,
I had a lot of trouble finding a version that works on a penryn processor. In the end this commit: 52f2d45cc8e35e45278da54615bb8b11b5066f16 works with this very small patch:
diff --git a/Makefile b/Makefile index 5e210ad..7cebd30 100644 --- a/Makefile +++ b/Makefile -override CXXFLAGS += -std=c++11 -Wall -O3 -fPIC -DSIMD_ENABLE=4 -msse4.2 -maes -I. -I../ -Ithird_party/brotli/c/include/ -Wno-sign-compare +override CXXFLAGS += -std=c++11 -Wall -O3 -fPIC -DSIMD_ENABLE=4 -msse4.1 -I. -I../ -Ithird_party/brotli/c/include/ -Wno-sign-compare
Between the work of dominikhlbg in #8 an this commit, it always compiles(sometimes with small patches) and executes correctly with Intel® Software Development Emulator, like this: ~/Downloads/sde-external-8.16.0-2018-01-30-lin/sde64 -- ./bin/cpik test.png test.pik
After this commit i always get invalid instruction. Could you look what is wrong with next commit? Is a newer gcc needed?
Regards,