Open SanderVocke opened 5 years ago
"linux-x86-64,linux-x86-64-sse41,linux-x86-64-sse41-avx".
One issue here is that the code searches in the order specified (left to right), stopping at the first subtarget that is determined to be safe to use at runtime; this ordering should (in theory) only give you base x86-64, never using avx. What you want is
"linux-x86-64-sse41-avx,linux-x86-64-sse41,linux-x86-64".
...that said, that makes this even weirder. Can you tell where the AVX instruction is being used? Or which one is being used? Can you replicate it with a trivial example (which you could post)? Can you replicate it if you specify just "linux-x86-64"?
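The left-to-right search described above can be illustrated with a small model. This is only a sketch under simplifying assumptions, not Halide's actual runtime code: each target string is reduced to the set of feature tokens after the `linux-x86-64` prefix, and the helper names `parse_features` and `select_first_supported` are hypothetical.

```python
# Simplified model of "first match wins" subtarget selection.
# NOT Halide's implementation: parse_features and select_first_supported
# are hypothetical helpers used only to illustrate the ordering issue.

def parse_features(target):
    """Return the feature tokens that follow the 'os-arch-bits' prefix."""
    parts = target.split("-")
    return frozenset(parts[3:])  # 'linux-x86-64-sse41-avx' -> {'sse41', 'avx'}

def select_first_supported(targets, cpu_features):
    """Return the first target whose required features the CPU provides."""
    for target in targets:
        if parse_features(target) <= cpu_features:
            return target
    return None

# The order from the report, and the order suggested in the reply.
targets_as_reported = "linux-x86-64,linux-x86-64-sse41,linux-x86-64-sse41-avx".split(",")
targets_reordered = "linux-x86-64-sse41-avx,linux-x86-64-sse41,linux-x86-64".split(",")

avx_cpu = frozenset({"sse41", "avx"})   # assumed CPU with AVX enabled
no_avx_cpu = frozenset({"sse41"})       # assumed CPU with AVX disabled

print(select_first_supported(targets_as_reported, avx_cpu))
print(select_first_supported(targets_reordered, avx_cpu))
print(select_first_supported(targets_reordered, no_avx_cpu))
```

In this model the reported order always selects the base target, because the base target requires no extra features and is therefore always the first match. The reordered list selects the most capable target the CPU supports.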
Thanks Steven, I'm looking into trying to make a minimal example of this and getting some extra information.
One thing I have noticed is that not a single vmovsd instruction ends up in the application binary when I remove the AVX target from the list. So the AVX instruction that was triggered is definitely one generated by Halide, though I get that doesn't help much.
I will look at making a reproducible piece of code available.
The instruction that triggers the trap is at offset 2c:
0000000000000000 <gaussian_blur_halide_gen_impl1>:
0: 41 57 push %r15
2: 41 56 push %r14
4: 53 push %rbx
5: 48 83 ec 20 sub $0x20,%rsp
9: 49 89 f6 mov %rsi,%r14
c: 48 89 fb mov %rdi,%rbx
f: 48 8b 05 00 00 00 00 mov 0x0(%rip),%rax # 16 <gaussian_blur_halide_gen_impl1+0x16>
16: 48 85 c0 test %rax,%rax
19: 74 11 je 2c <gaussian_blur_halide_gen_impl1+0x2c>
1b: 48 89 df mov %rbx,%rdi
1e: 4c 89 f6 mov %r14,%rsi
21: 48 83 c4 20 add $0x20,%rsp
25: 5b pop %rbx
26: 41 5e pop %r14
28: 41 5f pop %r15
2a: ff e0 jmpq *%rax
2c: c5 fb 11 44 24 08 vmovsd %xmm0,0x8(%rsp)
32: c5 fb 11 4c 24 10 vmovsd %xmm1,0x10(%rsp)
38: c5 fb 11 54 24 18 vmovsd %xmm2,0x18(%rsp)
3e: bf 10 00 00 08 mov $0x8000010,%edi
43: e8 00 00 00 00 callq 48 <gaussian_blur_halide_gen_impl1+0x48>
48: 85 c0 test %eax,%eax
4a: 75 17 jne 63 <gaussian_blur_halide_gen_impl1+0x63>
4c: 4c 8b 3d 00 00 00 00 mov 0x0(%rip),%r15 # 53 <gaussian_blur_halide_gen_impl1+0x53>
53: bf 00 00 00 08 mov $0x8000000,%edi
58: e8 00 00 00 00 callq 5d <gaussian_blur_halide_gen_impl1+0x5d>
5d: 85 c0 test %eax,%eax
5f: 75 17 jne 78 <gaussian_blur_halide_gen_impl1+0x78>
61: eb 1c jmp 7f <gaussian_blur_halide_gen_impl1+0x7f>
63: 4c 8b 3d 00 00 00 00 mov 0x0(%rip),%r15 # 6a <gaussian_blur_halide_gen_impl1+0x6a>
6a: bf 00 00 00 08 mov $0x8000000,%edi
6f: e8 00 00 00 00 callq 74 <gaussian_blur_halide_gen_impl1+0x74>
74: 85 c0 test %eax,%eax
Looks to me like this is the part where the filter implementation gets selected.
This bit of the assembly looks very different when I reverse the target order, with no vmovsd instructions at that particular place.
Reading the source, looks like the wrapper code uses the last target in the list, making the assumption that it's the most general one. We could change that code to get it to use the most generic target, or (probably more helpful) we could error out if we detect that any target in the list has an earlier target which is more general, to nudge people into listing them in the desired order.
So Sander: putting the targets in the intended order (most advanced to least advanced) will probably fix the problem.
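The proposed check (reject a list in which some target is preceded by a more general one) could look roughly like this. It is a sketch of the idea only, using the same simplified feature-set model of target strings as an assumption; `required_features` and `check_target_order` are hypothetical names, not part of Halide.

```python
# Sketch of the proposed ordering check, NOT Halide code.
# required_features and check_target_order are hypothetical helpers.

def required_features(target):
    """Return the feature tokens that follow the 'os-arch-bits' prefix."""
    return frozenset(target.split("-")[3:])

def check_target_order(targets):
    """Raise ValueError if an earlier target is at least as general as a later one.

    'More general' is modelled as requiring a subset of the later
    target's features, which would make the later target unreachable
    in a first-match search.
    """
    seen = []
    for target in targets:
        for earlier in seen:
            if required_features(earlier) <= required_features(target):
                raise ValueError(
                    f"target '{earlier}' is listed before '{target}' "
                    "but is more general; list the most specific targets first"
                )
        seen.append(target)

good_order = ["linux-x86-64-sse41-avx", "linux-x86-64-sse41", "linux-x86-64"]
bad_order = ["linux-x86-64", "linux-x86-64-sse41", "linux-x86-64-sse41-avx"]

check_target_order(good_order)  # passes silently
try:
    check_target_order(bad_order)
except ValueError as err:
    print("rejected:", err)
```

Failing at build time rather than silently picking a target would have surfaced the ordering mistake before the binary ever ran on a machine without AVX.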
Sorry for the radio silence. Putting targets in the intended order indeed solved our issue. Thanks for the help!
Feel free to close this issue if you don't plan on making any changes.
I vote to leave it open, as I think that generating a compile error in this case is likely a good solution to prevent this in the future. (I'm not likely to get around to it soon, so a PR to do this would be welcome.)
Hi,
I have built an application using a Halide-generated filter. Halide 1:2018.02.15-1 was used. The filter was generated for the following targets:
"linux-x86-64,linux-x86-64-sse41,linux-x86-64-sse41-avx"
The application runs without any issue on a VirtualBox Ubuntu VM when AVX and AVX2 are enabled. However, when I disable AVX and AVX2 on the VM and run the application again, I receive the following message:
[ 555.468199] traps: my-application[1648] trap invalid opcode ip:d0360c sp:7efd85ff7250 error:0 in my-application[400000+10c2000]
I looked for d0360c in the binary with
objdump -d my-application | grep -i
and it resulted in:
d0360c: c5 fb 11 44 24 08 vmovsd %xmm0,0x8(%rsp)
I checked the CPU flags after disabling AVX and AVX2 with
cat /proc/cpuinfo | grep flags
and the result is:
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single kaiser fsgsbase invpcid rdseed
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single kaiser fsgsbase invpcid rdseed
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single kaiser fsgsbase invpcid rdseed
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single kaiser fsgsbase invpcid rdseed
I also checked the "cpuid" command on the VM (which I believe the Halide runtime uses for determining target capabilities). That also reports AVX as not supported.
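The manual flag check above can also be scripted. The sketch below parses text in the `/proc/cpuinfo` format and reports whether every logical CPU advertises a given flag. It only reflects what the kernel reports and says nothing about how the Halide runtime itself detects features; the helper names `cpu_flags` and `all_cpus_have` are hypothetical, and the sample text is an abbreviated subset of the flags listed above.

```python
# Check CPU feature flags from /proc/cpuinfo-style text.
# cpu_flags and all_cpus_have are hypothetical helper names.

def cpu_flags(cpuinfo_text):
    """Return one set of flags per 'flags' line (one line per logical CPU)."""
    per_cpu = []
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            per_cpu.append(set(value.split()))
    return per_cpu

def all_cpus_have(cpuinfo_text, flag):
    """True only if at least one CPU is listed and every CPU has the flag."""
    per_cpu = cpu_flags(cpuinfo_text)
    return bool(per_cpu) and all(flag in flags for flags in per_cpu)

# Abbreviated sample: two logical CPUs with a subset of the reported flags.
sample = (
    "processor : 0\n"
    "flags : fpu sse sse2 ssse3 sse4_1 sse4_2 popcnt aes xsave\n"
    "processor : 1\n"
    "flags : fpu sse sse2 ssse3 sse4_1 sse4_2 popcnt aes xsave\n"
)

print("avx:", all_cpus_have(sample, "avx"))
print("sse4_1:", all_cpus_have(sample, "sse4_1"))
```

On a real Linux machine, the text can be read with `open("/proc/cpuinfo").read()` and passed to `all_cpus_have` in place of `sample`.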
It seems that Halide does not choose the appropriate target to execute the code when AVX and AVX2 are disabled.
Am I missing something obvious here?