torch / torch7

http://torch.ch

THVector_(add),(mul) -> (adds),(muls) for VSX. #990

Closed: gchanan closed this 7 years ago

gchanan commented 7 years ago

This rename was previously completed for the other architectures.
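For context on the rename: TH generates per-type functions from one generic name by token pasting, which is why the logs in this thread show symbols such as THFloatVector_muls_VSX and THDoubleVector_muls_VSX. Below is a minimal, simplified sketch of that scheme with scalar versions of the two renamed operations. The SKETCH_* macros and Sketch* function names are hypothetical stand-ins, not TH's own, and the (y, x, c, n) argument order with the semantics y[i] = x[i] + c and y[i] = x[i] * c is an assumption.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for TH's THVector_(NAME) token pasting:
   SKETCH_VECTOR_(Float, muls) expands to SketchFloatVector_muls. */
#define SKETCH_VECTOR_(Real, NAME) Sketch##Real##Vector_##NAME

/* Defines scalar versions of the renamed operations for one element type,
   assuming adds: y[i] = x[i] + c and muls: y[i] = x[i] * c. */
#define SKETCH_DEFINE_SCALAR_OPS(real, Real)                                \
  static void SKETCH_VECTOR_(Real, adds)(real *y, const real *x,            \
                                         const real c, const ptrdiff_t n) { \
    for (ptrdiff_t i = 0; i < n; i++) y[i] = x[i] + c;                      \
  }                                                                          \
  static void SKETCH_VECTOR_(Real, muls)(real *y, const real *x,            \
                                         const real c, const ptrdiff_t n) { \
    for (ptrdiff_t i = 0; i < n; i++) y[i] = x[i] * c;                      \
  }

/* Instantiate for float and double, mirroring THFloatVector_* and
   THDoubleVector_*. */
SKETCH_DEFINE_SCALAR_OPS(float, Float)
SKETCH_DEFINE_SCALAR_OPS(double, Double)
```

For example, SketchFloatVector_muls(y, x, 2.0f, 3) on x = {1, 2, 3} fills y with {2, 4, 6}. Generating both types from one macro keeps a rename like this one to a single place per architecture file.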

gchanan commented 7 years ago

This is to fix https://github.com/pytorch/pytorch/issues/922.

Disclaimer: my MiniCloud instance seems to have crashed before the build completed, but the build did get past the VSX errors. I'm now trying again with a larger instance type...

gchanan commented 7 years ago

Update: I was able to build torch7 successfully on ppc64le with this change.

gut commented 7 years ago

@gchanan: How did you build it on ppc64le? I tried on Ubuntu 16.10 with the newest gcc-6, but I still hit this issue:

~/torch-distro/pkg/torch/lib/TH/vector]$ /usr/bin/gcc-6  -DHAVE_MALLOC_USABLE_SIZE=1 -DHAVE_MMAP=1 -DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DTH_EXPORTS -DUSE_GCC_ATOMICS=1 -D_FILE_OFFSET_BITS=64 -I/home/gut/torch-distro/pkg/torch/build/lib/TH  -Werror=implicit-function-declaration -Werror=format -fopenmp -std=gnu99 -fopenmp -DUSE_GCC_GET_CPUID -DTH_HAVE_THREAD -O3 -DNDEBUG -fPIC   -o CMakeFiles/TH.dir/THVector.c.o   -c /home/gut/torch-distro/pkg/torch/lib/TH/THVector.c
In file included from /home/gut/torch-distro/pkg/torch/lib/TH/THVector.c:10:0:
/home/gut/torch-distro/pkg/torch/lib/TH/vector/VSX.c: In function ‘THFloatVector_muls_VSX’:
/home/gut/torch-distro/pkg/torch/lib/TH/vector/VSX.c:983:1: error: unrecognizable insn:
 }
 ^
(insn 26 25 28 4 (set (reg:V4SF 392)
        (vec_select:V4SF (vec_select:V4SF (mem:V4SF (reg/f:DI 393 [ _15 ]) [0  S16 A8])
                (parallel [
                        (const_int 3 [0x3])
                        (const_int 2 [0x2])
                        (const_int 1 [0x1])
                        (const_int 0 [0])
                    ]))
            (parallel:V4SF [
                    (const_int 2 [0x2])
                    (const_int 3 [0x3])
                    (const_int 0 [0])
                    (const_int 1 [0x1])
                ]))) /home/gut/torch-distro/pkg/torch/lib/TH/vector/VSX.c:900 -1
     (expr_list:REG_DEAD (reg/f:DI 393 [ _15 ])
        (expr_list:REG_EQUAL (vec_select:V4SF (vec_select:V4SF (mem:V4SF (reg:DI 342 [ ivtmp.3194 ]) [0  S16 A8])
                    (parallel [
                            (const_int 3 [0x3])
                            (const_int 2 [0x2])
                            (const_int 1 [0x1])
                            (const_int 0 [0])
                        ]))
                (parallel:V4SF [
                        (const_int 2 [0x2])
                        (const_int 3 [0x3])
                        (const_int 0 [0])
                        (const_int 1 [0x1])
                    ]))
            (nil))))
/home/gut/torch-distro/pkg/torch/lib/TH/vector/VSX.c:983:1: internal compiler error: in extract_insn, at recog.c:2287
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-6/README.Bugs> for instructions.

I upgraded torch myself in torch-distro, and VSX.c was last modified by your change 00fd283988a004ba721af08345bb88992f704053.

soumith commented 7 years ago

@gut What you're seeing looks like an actual compiler bug in gcc 6.x. We compiled with gcc 4.8.3 and it built fine.

gut commented 7 years ago

Checked! gcc-4.8.5-4ubuntu4 works. Thanks.
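The resolution here came down to which gcc was doing the build: gcc-6 hit an internal compiler error in VSX.c, while gcc 4.8.3 and gcc-4.8.5-4ubuntu4 compiled it. A minimal sketch for confirming what a given compiler invocation actually is, using only the predefined macros __GNUC__, __clang__, __VERSION__ and __VSX__ (GCC defines __VSX__ when VSX code generation is enabled). The sketch_* helper names are hypothetical, and flagging gcc 6 reflects only the reports in this thread, not a verified list of affected compiler versions.

```c
#include <stdio.h>

/* GCC major version this translation unit was compiled with, or 0 when the
   compiler is not GCC. clang also defines __GNUC__, so it is excluded. */
static int sketch_gcc_major(void) {
#if defined(__GNUC__) && !defined(__clang__)
  return __GNUC__;
#else
  return 0;
#endif
}

/* 1 when VSX code generation is enabled for this compile, 0 otherwise. */
static int sketch_vsx_enabled(void) {
#if defined(__VSX__)
  return 1;
#else
  return 0;
#endif
}

/* Nonzero when VSX code is being built with gcc 6.x, the combination that
   hit the internal compiler error in this thread (per these reports only). */
static int sketch_vsx_build_is_suspect(void) {
  return sketch_vsx_enabled() && sketch_gcc_major() == 6;
}

/* Prints the compiler version string and the checks above. */
static void sketch_report(FILE *out) {
#if defined(__VERSION__)
  fprintf(out, "compiler version: %s\n", __VERSION__);
#endif
  fprintf(out, "gcc major: %d, VSX: %s, suspect: %s\n", sketch_gcc_major(),
          sketch_vsx_enabled() ? "yes" : "no",
          sketch_vsx_build_is_suspect() ? "yes" : "no");
}
```

Compiling a file that calls sketch_report with the same compiler and flags as the failing build (for example /usr/bin/gcc-6 with the options from the log above) would show whether VSX code is being built with the compiler that failed here.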

tomsercu commented 7 years ago

@gut Will you update the POWER distro? Also, with the compiled version, do you see the segfaults described in pytorch/pytorch#922?

gut commented 7 years ago

@tomsercu I just did, on https://github.com/PPC64/torch-distro. I'll check for the segfault and comment on that issue if I hit it.

tomsercu commented 7 years ago

@gut Did you confirm that torch.test(), nn.test(), etc. pass? What kind of system are you on? I'm still seeing segfaults with the current PPC64/torch-distro (torch/torch7@82fb7ea); in fact, they happen at the same point as before this PR was merged. I'm on a RHEL 7.2 POWER8 machine with the default /usr/bin/gcc, version 4.8.5. Do you see something similar?

dccpc278[/speech7/multimodal/torch/20170404-ppc64le]$ gdb --args bash $(which th) -e "torch.test()"
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-80.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "ppc64le-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/bin/bash...Reading symbols from /usr/bin/bash...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Missing separate debuginfos, use: debuginfo-install bash-4.2.46-19.el7.ppc64le
(gdb) run
Starting program: /usr/bin/bash /speech7/multimodal/torch/20170404-ppc64le/install/bin/th -e torch.test\(\)
process 48000 is executing new program: /speech7/multimodal/torch/20170404-ppc64le/install/bin/luajit
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/power8/libthread_db.so.1".
Detaching after fork from child process 48007.
Detaching after fork from child process 48014.
Detaching after fork from child process 48015.
Running 163 tests
  1/163 ceil ............................................................ [PASS]
  2/163 tan ............................................................. [PASS]
  3/163 baddbmm ......................................................... [WAIT]
Program received signal SIGSEGV, Segmentation fault.
0x0000100000bb42b4 in THDoubleVector_muls_VSX () from /speech7/multimodal/torch/20170404-ppc64le/install/lib/libTH.so.0

gut commented 7 years ago

No, they segfault, as I posted on issue #922. The torch-distro update only focused on getting the package to build. We're investigating it on #922; please take a closer look there.
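The crash above lands inside THDoubleVector_muls_VSX. One generic way to narrow down a crash like this outside Lua is to call the kernel directly and compare it with a scalar reference across many lengths and start offsets, since SIMD kernels usually have a vector body plus separate handling for leftover elements. A minimal sketch, assuming muls means y[i] = x[i] * c with a (y, x, c, n) signature; sketch_muls_under_test is a hypothetical portable placeholder, not the real VSX kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Scalar reference, assuming muls means y[i] = x[i] * c. */
static void ref_muls(double *y, const double *x, const double c,
                     const ptrdiff_t n) {
  for (ptrdiff_t i = 0; i < n; i++) y[i] = x[i] * c;
}

/* Hypothetical placeholder for the kernel under test: a 4-wide unrolled
   body followed by a scalar tail, the usual shape of a SIMD kernel. */
static void sketch_muls_under_test(double *y, const double *x, const double c,
                                   const ptrdiff_t n) {
  ptrdiff_t i = 0;
  for (; i + 4 <= n; i += 4) {
    y[i] = x[i] * c;
    y[i + 1] = x[i + 1] * c;
    y[i + 2] = x[i + 2] * c;
    y[i + 3] = x[i + 3] * c;
  }
  for (; i < n; i++) y[i] = x[i] * c; /* leftover elements */
}

typedef void (*muls_fn)(double *, const double *, const double,
                        const ptrdiff_t);

/* Compares fn with ref_muls for every length 0..max_n and every start
   offset of 0..3 elements for both x and y, so the kernel sees different
   lengths modulo its vector width and different starting alignments.
   Returns the number of mismatching elements. */
static long check_muls(muls_fn fn, ptrdiff_t max_n) {
  enum { MAX_OFFSET = 4 };
  const size_t len = (size_t)max_n + MAX_OFFSET;
  double *x = malloc(len * sizeof *x);
  double *y = malloc(len * sizeof *y);
  double *want = malloc(len * sizeof *want);
  long mismatches = 0;
  assert(x != NULL && y != NULL && want != NULL);
  for (size_t i = 0; i < len; i++) x[i] = (double)(i % 7) - 3.0;

  for (ptrdiff_t n = 0; n <= max_n; n++) {
    for (int xo = 0; xo < MAX_OFFSET; xo++) {
      for (int yo = 0; yo < MAX_OFFSET; yo++) {
        ref_muls(want, x + xo, 1.5, n);
        fn(y + yo, x + xo, 1.5, n);
        for (ptrdiff_t i = 0; i < n; i++)
          if (y[yo + i] != want[i]) mismatches++;
      }
    }
  }
  free(x);
  free(y);
  free(want);
  return mismatches;
}
```

On the POWER8 machine, one could pass the real kernel in place of sketch_muls_under_test (if it is callable from C with this signature) and run the harness under gdb; a crash or a nonzero count at a particular length and offset would narrow the search. This is only a debugging aid and does not by itself identify the cause being investigated on #922.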