redpony / cdec

Decoder, aligner, and model optimizer for statistical machine translation and other structured prediction models based on (mostly) context-free formalisms
http://cdec-decoder.org/
Apache License 2.0

Segmentation Fault: Compiling bilexical dictionary #66

Open andidol opened 9 years ago

andidol commented 9 years ago

During the step of compiling the bilexical dictionary (cdec/python/cdec/sa/compile.py line 124) I get a Segmentation Fault after some time.

INFO:cdec.sa.compile:Compiling source suffix array
INFO:cdec.sa.compile:Compiling source suffix array took ... seconds
INFO:cdec.sa.compile:Compiling target data array
INFO:cdec.sa.compile:Compiling target data array took ... seconds
INFO:cdec.sa.compile:Compiling Precomputing frequent phrases
INFO:cdec.sa.compile:Compiling Compiling precomputations took ... seconds
INFO:cdec.sa.compile:Compiling alignment
INFO:cdec.sa.compile:Compiling alignment took ... seconds
INFO:cdec.sa.compile:Compiling bilexical dictionary
Segmentation fault (core dumped)

ProblemType: Crash
Architecture: amd64
Date: Wed Jan 7 00:15:23 2015
DistroRelease: Ubuntu 14.04
ExecutablePath: /usr/bin/python2.7
ExecutableTimestamp: 1395532628
ProcCmdline: python -m cdec.sa.compile -a /home/ubuntu/demo-en-de/corpus.en-de.gdfa -b /home/ubuntu/demo-en-de/corpus.en-de --online -o /home/ubuntu/demo-en-de/sa
ProcCwd: /home/ubuntu/cdec/python
ProcEnviron:
 TERM=xterm-256color
 SHELL=/bin/bash
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 XDG_RUNTIME_DIR=
ProcMaps:
 00400000-006bd000 r-xp 00000000 ca:01 903 /usr/bin/python2.7
 008bc000-008bd000 r--p 002bc000 ca:01 903 /usr/bin/python2.7
 008bd000-00932000 rw-p 002bd000 ca:01 903 /usr/bin/python2.7
 00932000-00944000 rw-p 00000000 00:00 0
 0213d000-891889000 rw-p 00000000 00:00 0 [heap]
 7f79f0316000-7f7a8c6ee000 rw-p 00000000 00:00 0
 7f7ac7901000-7f7b590ea000 rw-p 00000000 00:00 0
 7f7bc4bde000-7f7c768f8000 rw-p 00000000 00:00 0
 7f7c768f8000-7f7d1c612000 rw-p 00000000 00:00 0
 7f7d1cfd4000-7f7d1d1d4000 rw-p 00000000 00:00 0
 7f7d1f1b5000-7f7d1f1f5000 rw-p 00000000 00:00 0
 7f7d1fd75000-7f7d24c16000 rw-p 00000000 00:00 0
 7f7d255d2000-7f7d25a12000 rw-p 00000000 00:00 0
 7f7d25a12000-7f7d28672000 rw-p 00000000 00:00 0
 7f7d28694000-7f7d2bc14000 rw-p 00000000 00:00 0
 7f7d2bc3f000-7f7d2db3f000 rw-p 00000000 00:00 0
 7f7d2db3f000-7f7dd3859000 rw-p 00000000 00:00 0
 7f7dd3882000-7f7dd53c2000 rw-p 00000000 00:00 0
 7f7dd53e1000-7f7dd93e1000 rw-p 00000000 00:00 0
 7f7dd93e2000-7f7ddf3e3000 rw-p 00000000 00:00 0
 7f7ddf3e3000-7f7de4284000 rw-p 00000000 00:00 0
 7f7de4284000-7f7de4d84000 rw-p 00000000 00:00 0
 7f7de4d94000-7f7de5594000 rw-p 00000000 00:00 0
 7f7de55b4000-7f7de9fb4000 rw-p 00000000 00:00 0
 7f7de9fd0000-7f7deae10000 rw-p 00000000 00:00 0
 7f7deae23000-7f7deb063000 rw-p 00000000 00:00 0
 7f7deb094000-7f7deb194000 rw-p 00000000 00:00 0
 7f7deb1c2000-7f7deb282000 rw-p 00000000 00:00 0
 7f7deb283000-7f7deb544000 rw-p 00000000 00:00 0
 7f7deb544000-7f7deb546000 r-xp 00000000 ca:01 5421 /usr/lib/python2.7/lib-dynload/resource.x86_64-linux-gnu.so
 7f7deb546000-7f7deb745000 ---p 00002000 ca:01 5421 /usr/lib/python2.7/lib-dynload/resource.x86_64-linux-gnu.so
 7f7deb745000-7f7deb746000 r--p 00001000 ca:01 5421 /usr/lib/python2.7/lib-dynload/resource.x86_64-linux-gnu.so
 7f7deb746000-7f7deb747000 rw-p 00002000 ca:01 5421 /usr/lib/python2.7/lib-dynload/resource.x86_64-linux-gnu.so
 7f7deb747000-7f7deb787000 rw-p 00000000 00:00 0
 7f7deb787000-7f7deb861000 r-xp 00000000 ca:01 789057 /home/ubuntu/cdec/python/cdec/sa/_sa.so
 7f7deb861000-7f7deba60000 ---p 000da000 ca:01 789057 /home/ubuntu/cdec/python/cdec/sa/_sa.so
 7f7deba60000-7f7deba61000 r--p 000d9000 ca:01 789057 /home/ubuntu/cdec/python/cdec/sa/_sa.so
 7f7deba61000-7f7deba70000 rw-p 000da000 ca:01 789057 /home/ubuntu/cdec/python/cdec/sa/_sa.so
 7f7deba70000-7f7deba73000 rw-p 00000000 00:00 0
 7f7deba73000-7f7deba89000 r-xp 00000000 ca:01 396027 /lib/x86_64-linux-gnu/libgcc_s.so.1
 7f7deba89000-7f7debc88000 ---p 00016000 ca:01 396027 /lib/x86_64-linux-gnu/libgcc_s.so.1
 7f7debc88000-7f7debc89000 rw-p 00015000 ca:01 396027 /lib/x86_64-linux-gnu/libgcc_s.so.1
 7f7debc89000-7f7debd6f000 r-xp 00000000 ca:01 7890 /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.19
 7f7debd6f000-7f7debf6e000 ---p 000e6000 ca:01 7890 /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.19
 7f7debf6e000-7f7debf76000 r--p 000e5000 ca:01 7890 /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.19
 7f7debf76000-7f7debf78000 rw-p 000ed000 ca:01 7890 /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.19
 7f7debf78000-7f7debf8d000 rw-p 00000000 00:00 0
 7f7debf8d000-7f7debf90000 r-xp 00000000 ca:01 26436 /usr/lib/x86_64-linux-gnu/libboost_system.so.1.54.0
 7f7debf90000-7f7dec18f000 ---p 00003000 ca:01 26436 /usr/lib/x86_64-linux-gnu/libboost_system.so.1.54.0
 7f7dec18f000-7f7dec190000 r--p 00002000 ca:01 26436 /usr/lib/x86_64-linux-gnu/libboost_system.so.1.54.0
 7f7dec190000-7f7dec191000 rw-p 00003000 ca:01 26436 /usr/lib/x86_64-linux-gnu/libboost_system.so.1.54.0
 7f7dec191000-7f7dec1f8000 r-xp 00000000 ca:01 26447 /usr/lib/x86_64-linux-gnu/libboost_serialization.so.1.54.0
 7f7dec1f8000-7f7dec3f7000 ---p 00067000 ca:01 26447 /usr/lib/x86_64-linux-gnu/libboost_serialization.so.1.54.0
 7f7dec3f7000-7f7dec3fb000 r--p 00066000 ca:01 26447 /usr/lib/x86_64-linux-gnu/libboost_serialization.so.1.54.0
 7f7dec3fb000-7f7dec3fc000 rw-p 0006a000 ca:01 26447 /usr/lib/x86_64-linux-gnu/libboost_serialization.so.1.54.0
 7f7dec3fc000-7f7dec465000 r-xp 00000000 ca:01 26524 /usr/lib/x86_64-linux-gnu/libboost_program_options.so.1.54.0
 7f7dec465000-7f7dec665000 ---p 00069000 ca:01 26524 /usr/lib/x86_64-linux-gnu/libboost_program_options.so.1.54.0
 7f7dec665000-7f7dec669000 r--p 00069000 ca:01 26524 /usr/lib/x86_64-linux-gnu/libboost_program_options.so.1.54.0
 7f7dec669000-7f7dec66a000 rw-p 0006d000 ca:01 26524 /usr/lib/x86_64-linux-gnu/libboost_program_options.so.1.54.0
 7f7dec66a000-7f7deca68000 r-xp 00000000 ca:01 789053 /home/ubuntu/cdec/python/cdec/_cdec.so
 7f7deca68000-7f7decc67000 ---p 003fe000 ca:01 789053 /home/ubuntu/cdec/python/cdec/_cdec.so
 7f7decc67000-7f7decc79000 r--p 003fd000 ca:01 789053 /home/ubuntu/cdec/python/cdec/_cdec.so
 7f7decc79000-7f7decc85000 rw-p 0040f000 ca:01 789053 /home/ubuntu/cdec/python/cdec/_cdec.so
 7f7decc85000-7f7deccaa000 rw-p 00000000 00:00 0
 7f7deccaa000-7f7dece33000 r--p 00000000 ca:01 1579 /usr/lib/locale/locale-archive
 7f7dece33000-7f7decf38000 r-xp 00000000 ca:01 397452 /lib/x86_64-linux-gnu/libm-2.19.so
 7f7decf38000-7f7ded137000 ---p 00105000 ca:01 397452 /lib/x86_64-linux-gnu/libm-2.19.so
 7f7ded137000-7f7ded138000 r--p 00104000 ca:01 397452 /lib/x86_64-linux-gnu/libm-2.19.so
 7f7ded138000-7f7ded139000 rw-p 00105000 ca:01 397452 /lib/x86_64-linux-gnu/libm-2.19.so
 7f7ded139000-7f7ded151000 r-xp 00000000 ca:01 396086 /lib/x86_64-linux-gnu/libz.so.1.2.8
 7f7ded151000-7f7ded350000 ---p 00018000 ca:01 396086 /lib/x86_64-linux-gnu/libz.so.1.2.8
 7f7ded350000-7f7ded351000 r--p 00017000 ca:01 396086 /lib/x86_64-linux-gnu/libz.so.1.2.8
 7f7ded351000-7f7ded352000 rw-p 00018000 ca:01 396086 /lib/x86_64-linux-gnu/libz.so.1.2.8
 7f7ded352000-7f7ded354000 r-xp 00000000 ca:01 397459 /lib/x86_64-linux-gnu/libutil-2.19.so
 7f7ded354000-7f7ded553000 ---p 00002000 ca:01 397459 /lib/x86_64-linux-gnu/libutil-2.19.so
 7f7ded553000-7f7ded554000 r--p 00001000 ca:01 397459 /lib/x86_64-linux-gnu/libutil-2.19.so
 7f7ded554000-7f7ded555000 rw-p 00002000 ca:01 397459 /lib/x86_64-linux-gnu/libutil-2.19.so
 7f7ded555000-7f7ded558000 r-xp 00000000 ca:01 397462 /lib/x86_64-linux-gnu/libdl-2.19.so
 7f7ded558000-7f7ded757000 ---p 00003000 ca:01 397462 /lib/x86_64-linux-gnu/libdl-2.19.so
 7f7ded757000-7f7ded758000 r--p 00002000 ca:01 397462 /lib/x86_64-linux-gnu/libdl-2.19.so
 7f7ded758000-7f7ded759000 rw-p 00003000 ca:01 397462 /lib/x86_64-linux-gnu/libdl-2.19.so
 7f7ded759000-7f7ded914000 r-xp 00000000 ca:01 397468 /lib/x86_64-linux-gnu/libc-2.19.so
 7f7ded914000-7f7dedb14000 ---p 001bb000 ca:01 397468 /lib/x86_64-linux-gnu/libc-2.19.so
 7f7dedb14000-7f7dedb18000 r--p 001bb000 ca:01 397468 /lib/x86_64-linux-gnu/libc-2.19.so
 7f7dedb18000-7f7dedb1a000 rw-p 001bf000 ca:01 397468 /lib/x86_64-linux-gnu/libc-2.19.so
 7f7dedb1a000-7f7dedb1f000 rw-p 00000000 00:00 0
 7f7dedb1f000-7f7dedb38000 r-xp 00000000 ca:01 397457 /lib/x86_64-linux-gnu/libpthread-2.19.so
 7f7dedb38000-7f7dedd37000 ---p 00019000 ca:01 397457 /lib/x86_64-linux-gnu/libpthread-2.19.so
 7f7dedd37000-7f7dedd38000 r--p 00018000 ca:01 397457 /lib/x86_64-linux-gnu/libpthread-2.19.so
 7f7dedd38000-7f7dedd39000 rw-p 00019000 ca:01 397457 /lib/x86_64-linux-gnu/libpthread-2.19.so
 7f7dedd39000-7f7dedd3d000 rw-p 00000000 00:00 0
 7f7dedd3d000-7f7dedd60000 r-xp 00000000 ca:01 397465 /lib/x86_64-linux-gnu/ld-2.19.so
 7f7dedd9d000-7f7dede5d000 rw-p 00000000 00:00 0
 7f7dede8e000-7f7dedf53000 rw-p 00000000 00:00 0
 7f7dedf5d000-7f7dedf5f000 rw-p 00000000 00:00 0
 7f7dedf5f000-7f7dedf60000 r--p 00022000 ca:01 397465 /lib/x86_64-linux-gnu/ld-2.19.so
 7f7dedf60000-7f7dedf61000 rw-p 00023000 ca:01 397465 /lib/x86_64-linux-gnu/ld-2.19.so
 7f7dedf61000-7f7dedf62000 rw-p 00000000 00:00 0
 7fffe9f17000-7fffea716000 rw-p 00000000 00:00 0 [stack]
 7fffea7fe000-7fffea800000 r-xp 00000000 00:00 0 [vdso]
 ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
ProcStatus:
 Name: python
 State: S (sleeping)
 Tgid: 3285
 Ngid: 0
 Pid: 3285
 PPid: 1
 TracerPid: 0
 Uid: 1000 1000 1000 1000
 Gid: 1000 1000 1000 1000
 FDSize: 256
 Groups: 4 20 24 25 27 29 30 44 46 102 1000
 VmPeak: 103498220 kB
 VmSize: 49857712 kB
 VmLck: 0 kB
 VmPin: 0 kB
 VmHWM: 103457032 kB
 VmRSS: 49816600 kB
 VmData: 49805700 kB
 VmStk: 8192 kB
 VmExe: 2804 kB
 VmLib: 9992 kB
 VmPTE: 97428 kB
 VmSwap: 0 kB
 Threads: 1
 SigQ: 0/1967906
 SigPnd: 0000000000000000
 ShdPnd: 0000000000000000
 SigBlk: 0000000000000000
 SigIgn: 0000000001001001
 SigCgt: 0000000180000002
 CapInh: 0000000000000000
 CapPrm: 0000000000000000
 CapEff: 0000000000000000
 CapBnd: 0000001fffffffff
 Seccomp: 0
 Cpus_allowed: ffffffff,ffffffff,ffffffff,ffffffff
 Cpus_allowed_list: 0-127
 Mems_allowed: 00000000,00000003
 Mems_allowed_list: 0-1
 voluntary_ctxt_switches: 10870
 nonvoluntary_ctxt_switches: 129749
Signal: 11
Uname: Linux 3.13.0-36-generic x86_64
UnreportableReason: Cannot determine path of python module cdec.sa.compile
UserGroups: adm audio cdrom dialout dip floppy netdev plugdev sudo video
_LogindSession: /user/1000.user/6.session
CoreDump: base64

redpony commented 9 years ago

Usually these errors are due to non-UTF-8 characters in the input files, or sometimes to out-of-bounds alignment points. Could you share your parallel corpus?
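For anyone who wants to rule out the first cause before sharing a large corpus, here is a minimal sketch of an encoding check. It is not part of cdec; the function name `invalid_utf8_lines` is made up for illustration, and it only assumes the input is read as raw bytes, one line at a time.

```python
def invalid_utf8_lines(lines):
    """Yield (line_number, byte_offset) for each line that is not valid UTF-8.

    `lines` is any iterable of byte strings, for example a file opened
    in binary mode. Line numbers are 1-based; the offset is the position
    of the first offending byte within that line.
    """
    for lineno, raw in enumerate(lines, start=1):
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError as err:
            # err.start is the index of the first byte that failed to decode
            yield lineno, err.start
```

A possible way to use it is to pass the open bitext, for example `invalid_utf8_lines(open(path, "rb"))`, and print whatever it yields. An empty result only says the bytes decode as UTF-8; it does not rule out other problems in the data.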

andidol commented 9 years ago

All input files are in UTF-8. Is there a way to skip or ignore these out-of-bounds alignment points? Yes, of course I can share the corpus. It is quite large, though; what would be the best way for you?
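One possible workaround, sketched here as a preprocessing step rather than as a cdec option, is to drop alignment points that fall outside the sentence pair before running the compile step. The sketch assumes the bitext has one `source ||| target` pair per line and the alignment file has whitespace-separated `i-j` points with 0-based token indices; the function name `drop_out_of_bounds` is hypothetical.

```python
def drop_out_of_bounds(bitext_line, alignment_line):
    """Return the alignment line with out-of-range points removed.

    Assumes 'source ||| target' bitext lines and 'i-j' alignment points,
    where i indexes source tokens and j indexes target tokens (0-based).
    """
    fields = bitext_line.split("|||")
    n_src = len(fields[0].split())
    n_tgt = len(fields[1].split())
    kept = []
    for point in alignment_line.split():
        i, j = (int(x) for x in point.split("-"))
        # keep only points that refer to existing tokens on both sides
        if 0 <= i < n_src and 0 <= j < n_tgt:
            kept.append(point)
    return " ".join(kept)
```

Applied line by line to the bitext and alignment files in parallel, this would produce a cleaned alignment file to pass to `-a`. Comparing the cleaned file with the original also shows which sentence pairs had bad points, which may be more useful for debugging than silently skipping them.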