Closed santiagopoli closed 6 years ago
Do you remember since what version you started to get crashes?
What version of expat is ejabberd compiled against?
We are using expat 2.1.0. 2.1.0-6+deb8u3 to be exact
dpkg -s expat
Package: expat
Status: install ok installed
Priority: optional
Section: text
Installed-Size: 42
Maintainer: Laszlo Boszormenyi (GCS) <gcs@debian.org>
Architecture: amd64
Version: 2.1.0-6+deb8u3
Depends: libc6 (>= 2.14), libexpat1 (>= 2.0.1)
Description: XML parsing C library - example application
This package contains xmlwf, an example application of expat, the C
library for parsing XML. The arguments to xmlwf are one or more
files which are each to be checked for XML well-formedness.
Homepage: http://expat.sourceforge.net
dpkg -s libexpat1
Package: libexpat1
Status: install ok installed
Priority: optional
Section: libs
Installed-Size: 347
Maintainer: Laszlo Boszormenyi (GCS) <gcs@debian.org>
Architecture: amd64
Multi-Arch: same
Source: expat
Version: 2.1.0-6+deb8u3
Depends: libc6 (>= 2.14)
Pre-Depends: multiarch-support
Conflicts: wink (<= 1.5.1060-4)
Description: XML parsing C library - runtime library
This package contains the runtime, shared library of expat, the C
library for parsing XML. Expat is a stream-oriented parser in
which an application registers handlers for things the parser
might find in the XML document (like start tags).
Homepage: http://expat.sourceforge.net
I have just found something. If I run
dpkg -s lib64expat1
dpkg-query: package 'lib64expat1' is not installed and no information is available
Use dpkg --info (= dpkg-deb --info) to examine archive files,
and dpkg --contents (= dpkg-deb --contents) to list their contents.
Could the problem be related to the fact that we are not installing the amd64 version of expat? (We are using a 64-bit system.)
That libexpat1 package is already the 64-bit version.
My bad, it says Architecture: amd64 right in the middle :(
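For anyone checking the same thing, here is a minimal sketch that reads the Architecture and Version fields out of `dpkg -s` output. The helper name `parse_dpkg_status` is hypothetical, and the sample text is a shortened copy of the libexpat1 output quoted above, not a live query.

```python
def parse_dpkg_status(text):
    """Parse 'Key: value' lines from dpkg -s output into a dict.

    Hypothetical helper for illustration only. Indented continuation
    lines (such as the long description) are skipped.
    """
    fields = {}
    for line in text.splitlines():
        if line.startswith((" ", "\t")) or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


# Shortened copy of the libexpat1 output from this thread.
sample = """Package: libexpat1
Status: install ok installed
Architecture: amd64
Multi-Arch: same
Source: expat
Version: 2.1.0-6+deb8u3
Homepage: http://expat.sourceforge.net
"""

info = parse_dpkg_status(sample)
print(info["Architecture"], info["Version"])
```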
@santiagopoli Do you remember since what version you started to get crashes?
No, but I got this same crash using ejabberd 16.6. It didn't happen to us with ejabberd 15 (but we were having a lot of other crashes at that time, due to our own code, so it may have happened then as well).
BTW, we're using ejabberd as an Elixir dependency.
Could this be related to https://bugs.erlang.org/browse/ERL-304 ?
Yes, this is probably the same bug.
santiagopoli, did you resolve this crash? We recently hit a very similar crash as well.
Guys, we're reviewing our C code, please be patient (we have plenty of core dumps like this one).
No, we haven't solved it yet, but we think it only happens within a cluster. We tried to reproduce this bug with a single, larger node and didn't get this error (but that could be pure luck). Having 4 interconnected nodes produces this crash 2+ times a day. Note that we often have 100k users connected at the same time.
Have you guys figured out the root cause? We've temporarily resolved this issue by rolling back from fast_xml to p1_xml. Hopefully this info is of some use to others.
@shanjianping yes, that's important info, thanks
I'm experiencing the same symptoms with this issue. If you don't mind me asking:
The root cause is now covered by the test here: https://github.com/processone/fast_xml/commit/6cfd311c6fa6ed94de7b9bdb255751c26e66094b Without the patch you would get a segfault at line 406. Simply put, if you send more data after the server generates an xml-too-big error, it segfaults, because the server attempts to reuse freed structures. This also happens in a non-cluster environment, and it's easily reproducible: revert the patch and run the test.
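The failure mode described above (data arriving after the size-limit error has already torn down the parser state) can be modelled in a few lines. This is only a sketch of the general guard pattern, not the fast_xml patch itself: the class `LimitedStream`, its `feed` method, and the returned strings are all hypothetical names, and Python's `xml.parsers.expat` stands in for the C parser.

```python
import xml.parsers.expat


class LimitedStream:
    """Hypothetical model of a stream parser with a byte limit."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.seen = 0
        self.parser = xml.parsers.expat.ParserCreate()

    def feed(self, data):
        # Guard: once the stream has failed, never touch the parser again.
        # Without a check like this, late data would reach released state,
        # which is the use-after-free pattern described in this thread.
        if self.parser is None:
            return "error: stream already closed"
        self.seen += len(data)
        if self.seen > self.max_size:
            # Release the parser and remember that it is gone.
            self.parser = None
            return "error: xml-too-big"
        self.parser.Parse(data, False)
        return "ok"
```

Usage, under the same assumptions: feeding `b"<a>"` to `LimitedStream(16)` is accepted, a following 32-byte chunk is reported as too big, and anything sent after that is rejected without reaching the parser.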
The following trick fixed the problem on my fresh ejabberd install on Ubuntu 16.04.3
Hi,
I'm having constant crashes (every 3-6 hours on every node) in ejabberd. I'm currently using ejabberd 16.8 but the issue happened with older versions as well. Architecture-wise, I have four ejabberd nodes in a cluster. I'm also using redis as the session manager.
Fortunately, the crashes leave a core dump which I have analysed without luck.
Here is the output, step by step (I've highlighted each command, followed by its output):
gdb xavier/rel/xavier/erts-8.1.1/bin/beam.smp -core crashes/core.5_scheduler.14 -d otp_src_19.1/erts/emulator
(gdb) print ptr
(gdb) print gval
(gdb) source otp_src_19.1/erts/etc/unix/etp-commands.in
(gdb) etp-process-info p
(gdb) etp-stacktrace p
(gdb) etp-stackdump p
(gdb) etpf-stackdump p
TL;DR: I assume the problem is happening in ejabberd_router_multicast (which may have to do with the intra-node communication), but I don't know how to proceed. The ejabberd_c2s entry with {'$gen_event',{xmlstreamerror,#HeapBinary<0x15,0x204c4d58,0x6920617a,0x6962206f>}} seems important.
Thanks in advance!
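The HeapBinary in the TL;DR can be partly decoded by hand. The sketch below assumes the first value (0x15) is the byte length and that the remaining values are little-endian 32-bit words, as expected on amd64. The candidate message "XML stanza is too big" is a guess that happens to match both the length and the visible fragments; it is not something the dump states directly.

```python
import struct

# Data words copied from the #HeapBinary<...> term in the stack dump.
words = [0x204C4D58, 0x6920617A, 0x6962206F]

# Assumption: little-endian 32-bit words (amd64).
fragments = [struct.pack("<I", w).decode("ascii") for w in words]
print(fragments)

# Assumption: 0x15 is the binary's size in bytes.
candidate = "XML stanza is too big"
print(len(candidate) == 0x15)
print(all(fragment in candidate for fragment in fragments))
```

If that reading is right, the process received an xmlstreamerror for an oversized stanza shortly before the crash.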