nextcloud / documentserver_community

Document server for onlyoffice
https://apps.nextcloud.com/apps/documentserver_community

Invalid opcode causes coredumps in x2t binary/libdoctrenderer.so #204

Open strugee opened 3 years ago

strugee commented 3 years ago

My dmesg is full of lines like this:

[514745.545354] traps: x2t[29912] trap invalid opcode ip:7f52cc7fd849 sp:7ffc20f15d98 error:0 in libdoctrenderer.so[7f52cbe52000+f9b000]

Additionally systemd-coredump has been collecting a ton of coredumps like this:

           PID: 46377 (x2t)
           UID: 33 (www-data)
           GID: 33 (www-data)
        Signal: 4 (ILL)
     Timestamp: Wed 2021-03-10 13:50:39 PST (1min 58s ago)
  Command Line: ./x2t /tmp/oc_tmp_b2WgnQ-.xml
    Executable: /srv/http/nextcloud/apps/documentserver_community/3rdparty/onlyoffice/documentserver/server/FileConverter/bin/x2t
 Control Group: /system.slice/cron.service
          Unit: cron.service
         Slice: system.slice
       Boot ID: cc4ef387fd4a4c9080873c428ffd956c
    Machine ID: b1b415b31b4249ab96aaa496f982617d
      Hostname: steevie
       Storage: /var/lib/systemd/coredump/core.x2t.33.cc4ef387fd4a4c9080873c428ffd956c.46377.1615413039000000.lz4
       Message: Process 46377 (x2t) of user 33 dumped core.

                Stack trace of thread 46377:
                #0  0x00007fd985dff849 n/a (libdoctrenderer.so)
                #1  0x00007fd98561b072 n/a (libdoctrenderer.so)
                #2  0x00007fd98560ffb1 n/a (libdoctrenderer.so)
                #3  0x00007fd9855ddebe n/a (libdoctrenderer.so)
                #4  0x00007fd9855d4643 _ZN14NSDoctRenderer13CDoctrenderer7ExecuteERKSbIwSt11char_traitsIwESaIwEERS4_ (libdoctrenderer.so)
                #5  0x0000000000748550 n/a (x2t)
                #6  0x000000000076011b n/a (x2t)
                #7  0x0000000000767895 n/a (x2t)
                #8  0x000000000076af1d n/a (x2t)
                #9  0x000000000054034f n/a (x2t)
                #10 0x00007fd984ed809b __libc_start_main (libc.so.6)
                #11 0x0000000000745f93 n/a (x2t)

                Stack trace of thread 46414:
                #0  0x00007fd985208896 futex_abstimed_wait_cancelable (libpthread.so.0)
                #1  0x00007fd985208988 __new_sem_wait_slow (libpthread.so.0)
                #2  0x00007fd985dfe480 n/a (libdoctrenderer.so)
                #3  0x00007fd985df3579 n/a (libdoctrenderer.so)
                #4  0x00007fd985df4a7b n/a (libdoctrenderer.so)
                #5  0x00007fd985dff5bf n/a (libdoctrenderer.so)
                #6  0x00007fd9851fffa3 start_thread (libpthread.so.0)
                #7  0x00007fd984fad4cf __clone (libc.so.6)

There appear to be three coredumps generated every 5 minutes, which coincides with the frequency I run regular Nextcloud cronjobs on. My preview generation cronjob (which I recently turned on Office document support for) runs every 1 minute, but just in case it's related, here's the relevant section in config.php:

  'enabledPreviewProviders' => 
  array (
    0 => 'OC\\Preview\\PNG',
    1 => 'OC\\Preview\\JPEG',
    2 => 'OC\\Preview\\GIF',
    3 => 'OC\\Preview\\HEIC',
    4 => 'OC\\Preview\\BMP',
    5 => 'OC\\Preview\\XBitmap',
    6 => 'OC\\Preview\\MP3',
    7 => 'OC\\Preview\\TXT',
    8 => 'OC\\Preview\\MarkDown',
    9 => 'OC\\Preview\\Movie',
    10 => 'OC\\Preview\\MKV',
    11 => 'OC\\Preview\\MP4',
    12 => 'OC\\Preview\\AVI',
    13 => 'OC\\Preview\\PDF',
    14 => 'OC\\Preview\\Illustrator',
    15 => 'OC\\Preview\\Photoshop',
    16 => 'OC\\Preview\\TIFF',
    17 => 'OC\\Preview\\SVG',
    18 => 'OC\\Preview\\MSOffice2003',
    19 => 'OC\\Preview\\MSOffice2007',
    20 => 'OC\\Preview\\MSOffice',
    21 => 'OC\\Preview\\MSOfficeDoc',
    22 => 'OC\\Preview\\OpenDocument',
    23 => 'OC\\Preview\\StarOffice',
    24 => 'OC\\Preview\\Font',
  ),

Here's the output of lscpu:

Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
Address sizes:       46 bits physical, 48 bits virtual
CPU(s):              2
On-line CPU(s) list: 0,1
Thread(s) per core:  1
Core(s) per socket:  2
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               62
Model name:          Intel(R) Pentium(R) CPU 1403 v2 @ 2.60GHz
Stepping:            4
CPU MHz:             2599.818
CPU max MHz:         2600.0000
CPU min MHz:         1200.0000
BogoMIPS:            5199.63
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            6144K
NUMA node0 CPU(s):   0,1
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm cpuid_fault pti tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm arat pln pts
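
One common cause of an invalid-opcode trap is a binary built for instruction-set extensions the CPU does not have. The thread does not confirm that this is what happens here, so the following is only a minimal sketch for checking the Flags line above against a list of extensions. The `CANDIDATES` list is an assumption chosen for illustration, not something known about how x2t or libdoctrenderer.so were compiled.

```python
# Flags line copied from the lscpu output in this report.
LSCPU_FLAGS = (
    "fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 "
    "clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp "
    "lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc "
    "cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 "
    "cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes "
    "xsave avx f16c rdrand lahf_lm cpuid_fault pti tpr_shadow vnmi flexpriority "
    "ept vpid fsgsbase smep erms xsaveopt dtherm arat pln pts"
)

# Hypothetical list of extensions a prebuilt binary might rely on.
CANDIDATES = ["sse4_2", "popcnt", "avx", "avx2", "bmi1", "bmi2", "fma", "movbe"]


def missing_extensions(flags_line, candidates):
    """Return the candidate extensions that are absent from the flags line."""
    present = set(flags_line.split())
    return [name for name in candidates if name not in present]


if __name__ == "__main__":
    print(missing_extensions(LSCPU_FLAGS, CANDIDATES))
```

On a live machine the same function can be fed the `flags` line from /proc/cpuinfo. A missing extension only shows that such an instruction could not run on this CPU; disassembling the faulting address is still needed to tell which instruction actually trapped.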

I'm on Nextcloud 20.0.8, documentserver_community 0.1.9, Debian buster. uname -a reports Linux steevie 4.19.0-14-amd64 #1 SMP Debian 4.19.171-2 (2021-01-30) x86_64 GNU/Linux.

tmijieux commented 3 years ago

I am currently facing the same problem. I ran objdump -h /var/www/html/custom_apps/documentserver_community/3rdparty/onlyoffice/documentserver/server/FileConverter/bin/libdoctrenderer.so and found that the address f9b000 is in the gcc_except_table section, so maybe this C++ library hit an exception and there is a bug in the exception-handling code that gcc generated for the binary. I will try to recompile the library if possible (I have not looked into it yet) and see whether the bug still happens.
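
For anyone repeating this analysis: in the kernel's "traps:" lines the bracketed pair after the library name is the mapping's start address and its size, so the position of the faulting instruction inside the mapping is ip minus the start address. The sketch below is a hedged helper (the function and variable names are made up for illustration) that extracts that offset from the two kernel lines quoted in this thread.

```python
import re

# Matches the kernel's "trap invalid opcode ... in <object>[<start>+<size>]" lines.
# The closing bracket is optional because log lines are sometimes pasted truncated.
TRAP_RE = re.compile(
    r"trap invalid opcode ip:(?P<ip>[0-9a-f]+) sp:[0-9a-f]+ error:\d+ "
    r"in (?P<obj>\S+?)\[(?P<start>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]?"
)

# Kernel lines taken from the two reports in this thread.
REPORT_LINES = [
    "[514745.545354] traps: x2t[29912] trap invalid opcode ip:7f52cc7fd849 "
    "sp:7ffc20f15d98 error:0 in libdoctrenderer.so[7f52cbe52000+f9b000",
    "May 21 19:42:01 intranet kernel: traps: x2t[31720] trap invalid opcode "
    "ip:6c0cc9584849 sp:778e45efc808 error:0 in libdoctrenderer.so[6c0cc8bd9000+f9b000]",
]


def fault_offset(line):
    """Return (object name, ip - mapping start), or None if the line does not match."""
    match = TRAP_RE.search(line)
    if match is None:
        return None
    ip = int(match["ip"], 16)
    start = int(match["start"], 16)
    size = int(match["size"], 16)
    if not start <= ip < start + size:
        return None  # ip is outside the reported mapping
    return match["obj"], ip - start


if __name__ == "__main__":
    for line in REPORT_LINES:
        obj, offset = fault_offset(line)
        print(obj, hex(offset))
```

Both lines give the same offset, 0x9ab849, which suggests the two machines trap on the same instruction. Depending on how the library is mapped, that offset may still need adjusting by the segment's file offset before it is compared with the section table printed by objdump -h.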

tmijieux commented 3 years ago

Eventually I replaced libdoctrenderer.so with the one found in the latest Document Server image (6.2.0.123 as of today), and that exception (illegal instruction) went away. I did not seriously check API/ABI compatibility, though. In the end I did not manage to get synchronisation back to Nextcloud working even with that fix, since other errors showed up.

FGIKCM commented 3 years ago

Still in NC 20.0.10:

May 21 19:42:01 intranet kernel: traps: x2t[31720] trap invalid opcode ip:6c0cc9584849 sp:778e45efc808 error:0 in libdoctrenderer.so[6c0cc8bd9000+f9b000]
May 21 19:42:01 intranet kernel: grsec: From <MyIP>: Illegal instruction occurred at 00006c0cc9584849 in /var/www/nextcloud/apps/documentserver_community/3rdparty/onlyoffice/documentserver/server/FileConverter/bin/x2t[x2t:31720] uid/euid:33/33 gid/egid:33/33, parent /usr/bin/php7.3[php:31663] uid/euid:33/33 gid/egid:33/33

I'm not sure I would replace the file with the latest one; I'm afraid it could cause side effects elsewhere.

And I agree, it looks like this can be triggered by launching the cron job with

sudo -u www-data /usr/bin/php -f /var/www/nextcloud/cron.php

Githopp192 commented 1 year ago

NC: 24.0.7 on RHEL 8.7

Core dumps on almost every access to an ONLYOFFICE document:

systemd-coredump Process 592086 (x2t) of user 48 dumped core.
CODE_FILE ../src/coredump/coredump.c
CODE_FUNC submit_coredump
CODE_LINE 822
COREDUMP_CGROUP /system.slice/php-fpm.service
COREDUMP_CMDLINE ./x2t /tmp/oc_tmp_9lxP1Q-.xml
COREDUMP_COMM x2t
COREDUMP_CWD /var/www/html/nextcloud/apps/documentserver_community/3rdparty/onlyoffice/documentserver/server/FileConverter/bin
COREDUMP_ENVIRON PWD=/var/www/html/nextcloud/apps/documentserver_community/3rdparty/onlyoffice/documentserver/server/FileConverter/bin SHLVL=1 _=./x2t
COREDUMP_EXE /var/www/html/nextcloud/apps/documentserver_community/3rdparty/onlyoffice/documentserver/server/FileConverter/bin/x2t
COREDUMP_GID 48
COREDUMP_HOSTNAME xxx.xxx.xxx
COREDUMP_OPEN_FDS
    0:pipe:[4956654] pos: 0 flags: 00 mnt_id: 14
    1:pipe:[4956655] pos: 0 flags: 01 mnt_id: 14
    2:pipe:[4956656] pos: 0 flags: 01 mnt_id: 14
    4:/var/lib/sss/mc/passwd pos: 0 flags: 02100000 mnt_id: 379
    7:socket:[4956652] pos: 0 flags: 02 mnt_id: 10
    8:socket:[4956653] pos: 0 flags: 02 mnt_id: 10
    10:socket:[4956588] pos: 0 flags: 02 mnt_id: 10
    12:socket:[30482] pos: 0 flags: 02 mnt_id: 10
    13:xxxxx
COREDUMP_PROC_CGROUP
    12:pids:/system.slice/php-fpm.service
    11:hugetlb:/
    10:blkio:/
    9:freezer:/
    8:memory:/system.slice/php-fpm.service
    7:cpuset:/
    6:devices:/system.slice/php-fpm.service
    5:rdma:/
    4:cpu,cpuacct:/
    3:perf_event:/
    2:net_cls,net_prio:/
    1:name=systemd:/system.slice/php-fpm.service
COREDUMP_PROC_LIMITS
    Limit                 Soft Limit   Hard Limit   Units
    Max cpu time          unlimited    unlimited    seconds
    Max file size         unlimited    unlimited    bytes
    Max data size         4294967296   4294967296   bytes
    Max stack size        8388608      unlimited    bytes
    Max core file size    0            unlimited    bytes
    Max resident set      unlimited    unlimited    bytes
    Max processes         38970        38970        processes
    Max open files        1024         262144       files
    Max locked memory     65536        65536        bytes
    Max address space     unlimited    unlimited    bytes
    Max file locks        unlimited    unlimited    locks
    Max pending signals   38970        38970        signals
    Max msgqueue size     819200       819200       bytes
    Max nice priority     0            0
    Max realtime priority 0            0
    Max realtime timeout  unlimited    unlimited    us
COREDUMP_PROC_MAPS [no data]
COREDUMP_PROC_MOUNTINFO [no data]
COREDUMP_PROC_STATUS
    Name: x2t
    Umask: 0022
    State: S (sleeping)
    Tgid: 592086
    Ngid: 0
    Pid: 592086
    PPid: 592062
    TracerPid: 0
    Uid: 48 48 48 48
    Gid: 48 48 48 48
    FDSize: 64
    Groups: 48 986
    NStgid: 592086
    NSpid: 592086
    NSpgid: 1076
    NSsid: 1076
    VmPeak: 189848 kB
    VmSize: 186828 kB
    VmLck: 0 kB
    VmPin: 0 kB
    VmHWM: 42164 kB
    VmRSS: 38876 kB
    RssAnon: 15448 kB
    RssFile: 23428 kB
    RssShmem: 0 kB
    VmData: 15292 kB
    VmStk: 132 kB
    VmExe: 41916 kB
    VmLib: 43532 kB
    VmPTE: 308 kB
    VmSwap: 0 kB
    HugetlbPages: 0 kB
    CoreDumping: 1
    THP_enabled: 1
    Threads: 1
    SigQ: 0/38970
    SigPnd: 0000000000000000
    ShdPnd: 0000000000000000
    SigBlk: 0000000000000000
    SigIgn: 0000000000001000
    SigCgt: 0000000180000000
    CapInh: 0000000000000000
    CapPrm: 0000000000000000
    CapEff: 0000000000000000
    CapBnd: 000001ffffffffff
    CapAmb: 0000000000000000
    NoNewPrivs: 0
    Seccomp: 0
    Speculation_Store_Bypass: vulnerable
    Cpus_allowed: ff
    Cpus_allowed_list: 0-7
    Mems_allowed: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
    Mems_allowed_list: 0
    voluntary_ctxt_switches: 3
    nonvoluntary_ctxt_switches: 4
COREDUMP_RLIMIT 0
COREDUMP_ROOT /
COREDUMP_SIGNAL 11
COREDUMP_SIGNAL_NAME SIGSEGV
COREDUMP_SLICE system.slice
COREDUMP_TIMESTAMP 1669807603000000
COREDUMP_UID 48
COREDUMP_UNIT php-fpm.service
MESSAGE_ID fc2e22bc6ee647b6b90729ab34a250b1
PRIORITY 2
SYSLOG_IDENTIFIER systemd-coredump
_BOOT_ID 89de6b3a10f145be8e3c34676bd42c48
_CAP_EFFECTIVE 1fff7fcffff
_CMDLINE /usr/lib/systemd/systemd-coredump
_COMM systemd-coredum
_EXE /usr/lib/systemd/systemd-coredump
_GID 0
_HOSTNAME xxx.xxx.xxx
_MACHINE_ID 43adbc81541c40b89dacdaed716213d2
_PID 592088
_SOURCE_REALTIME_TIMESTAMP 1669807603727743
_TRANSPORT journal
_UID 0
__CURSOR s=0c6773b7e0374f0f80e90c1e26940f77;i=3dac0;b=89de6b3a10f145be8e3c34676bd42c48;m=4468f99306;t=5eeae61103f28;x=1a255862a01e76c3
__MONOTONIC_TIMESTAMP 293818962694
__REALTIME_TIMESTAMP 1669807603728168
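
Note that this entry records signal 11 (SIGSEGV), while the original report shows signal 4 (ILL), so it may help to tally coredumps by signal before assuming they share a cause. The following is a rough sketch, assuming the journal has been exported with journalctl -o json (one JSON object per line). The function name and the sample lines are illustrative only; the samples are cut down to the fields the sketch reads.

```python
import json
from collections import Counter


def tally_coredumps(json_lines):
    """Count coredump journal entries by (command name, signal)."""
    counts = Counter()
    for line in json_lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        if "COREDUMP_SIGNAL" not in entry:
            continue  # not a coredump entry
        # Prefer the signal name when present; otherwise fall back to the number.
        signal = entry.get("COREDUMP_SIGNAL_NAME", entry["COREDUMP_SIGNAL"])
        counts[(entry.get("COREDUMP_COMM", "?"), signal)] += 1
    return counts


# Illustrative sample lines, reduced to the fields used above.
SAMPLE_LINES = [
    '{"COREDUMP_COMM": "x2t", "COREDUMP_SIGNAL": "11", "COREDUMP_SIGNAL_NAME": "SIGSEGV"}',
    '{"COREDUMP_COMM": "x2t", "COREDUMP_SIGNAL": "4"}',
    '{"MESSAGE": "unrelated journal entry"}',
]

if __name__ == "__main__":
    for (comm, signal), count in sorted(tally_coredumps(SAMPLE_LINES).items()):
        print(comm, signal, count)
```

To run it against real data, the output of journalctl -o json MESSAGE_ID=fc2e22bc6ee647b6b90729ab34a250b1 (the message ID shown in the entry above) can be passed in line by line.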

github-actions[bot] commented 1 month ago

This issue has been marked 'stale' due to lack of recent activity. If there is no further activity, the issue will be closed in another 60 days. Thank you for your contribution!

Taking over management of this repository means inheriting old, unaddressed, and probably obsolete issues, which is why it was decided to mark such issues as stale.