kiwix / kiwix-tools

Command line Kiwix tools: kiwix-serve, kiwix-manage, ...
https://download.kiwix.org/release/kiwix-tools/
GNU General Public License v3.0
435 stars 85 forks source link

kiwix-serve segmentation fault #341

Closed tim-moody closed 4 years ago

tim-moody commented 4 years ago

This happens intermittently on Raspberry Pi 4 with 2G RAM running 2019-09-26 raspbian lite and kiwix-serve 3.0.1-6 using the following systemd unit file:


[Unit]
Description=Provides the Kiwix ZIM File Server
After=syslog.target network.target local-fs.target

[Service]
Type=forking
ExecStart=/opt/iiab/kiwix/bin/kiwix-serve --daemon --port 3000 --nolibrarybutton --library /library/zims/library.xml --urlRootLocation=/kiwix/
Restart=on-abort
RestartSec=5s

[Install]
WantedBy=multi-user.target
tim-moody commented 4 years ago

The log shows:


-- A start job for unit kiwix-serve.service has begun execution.
--
-- The job identifier is 2362.
Oct 08 21:16:11 box.lan systemd[1]: kiwix-serve.service: Control process exited, code=killed, status=11/SEGV
-- Subject: Unit process exited
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- An ExecStart= process belonging to unit kiwix-serve.service has exited.
--
-- The process' exit code is 'killed' and its exit status is 11.
Oct 08 21:16:11 box.lan systemd[1]: kiwix-serve.service: Failed with result 'signal'.
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- The unit kiwix-serve.service has entered the 'failed' state with result 'signal'.
Oct 08 21:16:11 box.lan systemd[1]: Failed to start Provides the Kiwix ZIM File Server.
-- Subject: A start job for unit kiwix-serve.service has failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- A start job for unit kiwix-serve.service has finished with a failure.
--
-- The job identifier is 2362 and the job result is failed.
Oct 08 21:16:16 box.lan systemd[1]: kiwix-serve.service: Service RestartSec=5s expired, scheduling restart.
Oct 08 21:16:16 box.lan systemd[1]: kiwix-serve.service: Scheduled restart job, restart counter is at 1.
-- Subject: Automatic restarting of a unit has been scheduled
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- Automatic restarting of the unit kiwix-serve.service has been scheduled, as the result for
-- the configured Restart= setting for the unit.
Oct 08 21:16:16 box.lan systemd[1]: Stopped Provides the Kiwix ZIM File Server.
-- Subject: A stop job for unit kiwix-serve.service has finished
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- A stop job for unit kiwix-serve.service has finished.
--
-- The job identifier is 2420 and the job result is done.
Oct 08 21:16:16 box.lan systemd[1]: Starting Provides the Kiwix ZIM File Server...
-- Subject: A start job for unit kiwix-serve.service has begun execution
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- A start job for unit kiwix-serve.service has begun execution.
--
-- The job identifier is 2420.
Oct 08 21:16:16 box.lan systemd[1]: Started Provides the Kiwix ZIM File Server.
-- Subject: A start job for unit kiwix-serve.service has finished successfully
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- A start job for unit kiwix-serve.service has finished successfully.
--
-- The job identifier is 2420.
root@raspberrypi:/opt/iiab/iiab# systemctl status kiwix-serve
● kiwix-serve.service - Provides the Kiwix ZIM File Server
   Loaded: loaded (/etc/systemd/system/kiwix-serve.service; enabled; vendor preset: enabled)
   Active: active  since Tue 2019-10-08 21:16:16 BST; 1min 21s ago
  Process: 32088 ExecStart=/opt/iiab/kiwix/bin/kiwix-serve --daemon --port 3000 --nolibrarybutton --library /library/zims/library.xml --urlRootLocation=/kiwix/ (code=exited, status=0/SUCCESS
 Main PID: 32089 (kiwix-serve)
    Tasks: 5 (limit: 4035)
   Memory: 1.6M
   CGroup: /system.slice/kiwix-serve.service
           └─32089 /opt/iiab/kiwix/bin/kiwix-serve --daemon --port 3000 --nolibrarybutton --library /library/zims/library.xml --urlRootLocation=/kiwix/
kelson42 commented 4 years ago

@tim-moody Thank you for your bug report, but it will hard to do anything here without more information (the log you provide is not coming from kiwix-serve).

What would be helpful is:

holta commented 4 years ago
  • the version of kiwix-serve
  • secure that the bug still occur with version 3.0.1-6

FWIW I can confirm this happened to 2 different people running kiwix-serve 3.0.1-6 on Raspberry Pi over the past 10 days.

tim-moody commented 4 years ago

@kelson42 the version of kiwix-serve was in the first sentence along with the rest of the environment.

tim-moody commented 4 years ago

@kelson42

root@box:~# /opt/iiab/kiwix/bin/kiwix-serve --verbose --port 3000 --nolibrarybutton --library /library/zims/library.xml --urlRootLocation=/kiwix/ Segmentation fault

Where do I see any output? there is nothing in the log

I ran a second time and it succeeded.

kelson42 commented 4 years ago

@tim-moody So it crash sporadically at start? If you have no log, then this is at he really beginning. Could you upload the core dump please? How looks the library.xml.

tim-moody commented 4 years ago

@kelson42 library.xml has a single test.zim that has been used with iiab for a long time. Where would the core dump file be?

tim-moody commented 4 years ago

OK looks like core dumps are turned off on rpi by default

tim-moody commented 4 years ago

kiwix-serve-dump.zip

kelson42 commented 4 years ago

@tim-moody thx very much, this ticket will be investigated in priority in January when @mgautierfr will be back

holta commented 4 years ago

@kelson42 & @mgautierfr: thanks much for investigating when you can, as this seg-fault bug in 3.0.1-6 is more serious than I earlier realized, apparently blocking many Internet-in-a-Box installs at the moment.

In any case a 3rd machine is now affected...in this case a Raspberry Pi 4 with 1GB RAM, running Raspbian Lite 2019-09-26...which failed in the same way when IIAB was being installed using http://d.iiab.io :

"Unable to start service kiwix-serve: Job for kiwix-serve.service failed because a fatal signal was delivered to the control process."

FYI re-running IIAB's kiwix-tools installation worked on the 3rd try, similar to @tim-moody's experience.

QUESTION: if IIAB reverts to a prior version of kiwix-tools, would you recommend any particular version as especially safe overall?

kelson42 commented 4 years ago

@holta You can try to rollback to an earlier version, but I can not give you any particular advice here. We run this version as well and don't have the same kind of problem like you have. The fact that this is sporadic makes it even more dubious. So I wonder if this is related to the ARM/32bits architecture or a specific (set of) ZIM files. Difficult to say for me.

holta commented 4 years ago

I wonder if this is related to the ARM/32bits architecture

Possible. At least 2 of the 3 failures occured on ARMv8 Raspberry Pi 4 fwiw.

or a specific (set of) ZIM files. Difficult to say for me.

Unlikely. All 3 installs that failed contained only 1 ZIM file — as we've been using for many years:

https://github.com/iiab/iiab/blob/master/roles/kiwix/files/test.zim

Is there any reason this 52.1 KB ZIM file from many years ago (test.zim obtained from Kiwix) would no longer be adequate?

holta commented 4 years ago

A 4th machine was just affected (this time an original Raspberry Pi 3).

So it turns out this issue is affecting many, many people trying to install Internet-in-a-Box.

This latest failure to install has its install logs here FWIW, for the coming week: http://paste.debian.net/1105648/

tim-moody commented 4 years ago

For me this fault happens once in every 3 to 4 starts valgrind reports

==5594== Syscall param set_robust_list(head) points to uninitialised byte(s)
==5594==    at 0x4EA690: ??? (in /opt/iiab/kiwix/bin/kiwix-serve)
==5594==  Address 0x4000078 is not stack'd, malloc'd or (recently) free'd
==5594==
==5594== Conditional jump or move depends on uninitialised value(s)
==5594==    at 0x50066C: ??? (in /opt/iiab/kiwix/bin/kiwix-serve)
==5594==
==5594== Conditional jump or move depends on uninitialised value(s)
==5594==    at 0x59DEA0: ??? (in /opt/iiab/kiwix/bin/kiwix-serve)
==5594==
==5594== Thread 2:
==5594== Conditional jump or move depends on uninitialised value(s)
==5594==    at 0x4E5D70: ??? (in /opt/iiab/kiwix/bin/kiwix-serve)
==5594==
==5594== Thread 1:
==5594== Conditional jump or move depends on uninitialised value(s)
==5594==    at 0x5B4770: ??? (in /opt/iiab/kiwix/bin/kiwix-serve)
==5594==
==5594== Conditional jump or move depends on uninitialised value(s)
==5594==    at 0x5B4788: ??? (in /opt/iiab/kiwix/bin/kiwix-serve)
==5594==
==5594== Conditional jump or move depends on uninitialised value(s)
==5594==    at 0xFFFF0FCC: ???
==5594==
==5594== Conditional jump or move depends on uninitialised value(s)
==5594==    at 0xFFFF0FD4: ???
==5594==
==5594== Conditional jump or move depends on uninitialised value(s)
==5594==    at 0x5B47A8: ??? (in /opt/iiab/kiwix/bin/kiwix-serve)
==5594==
==5594== Conditional jump or move depends on uninitialised value(s)
==5594==    at 0x5B47B4: ??? (in /opt/iiab/kiwix/bin/kiwix-serve)
==5594==
==5594== Conditional jump or move depends on uninitialised value(s)
==5594==    at 0x5B4818: ??? (in /opt/iiab/kiwix/bin/kiwix-serve)
==5594==
==5594== Conditional jump or move depends on uninitialised value(s)
==5594==    at 0x5B4848: ??? (in /opt/iiab/kiwix/bin/kiwix-serve)
==5594==
==5594== Conditional jump or move depends on uninitialised value(s)
==5594==    at 0x5B4868: ??? (in /opt/iiab/kiwix/bin/kiwix-serve)
==5594==
==5594== Conditional jump or move depends on uninitialised value(s)
==5594==    at 0x5B4874: ??? (in /opt/iiab/kiwix/bin/kiwix-serve)
==5594==
holta commented 4 years ago

For me this fault happens once in every 3 to 4 starts

Odd because this now seems to block almost every Internet-in-a-Box install on Raspberry Pi. (A 5th such failure occurred just an hour ago.)

Thank you to @jvonau & @tim-moody who are both trying hard to investigate this. One promising theory: the issue is greatly exacerbated by Raspbian's latest OS of "2019-09-26" [that was actually published around 2019-10-01] and/or by several apt updates to Raspbian since then.

(I do not understand it, but the severity of this bug is much worse this week [affecting almost every IIAB install onto Raspberry Pi] as compared to last week, when that did not seem to be the case!)

kelson42 commented 4 years ago

@rgaudin Have you been able to see similar behaviour with Kiwix Hotpost?

holta commented 4 years ago

Thanks again to @tim-moody and @jvonau who are running many tests that (seem) to reconfirm this is arising from the new Raspbian 2019-09-26 or a recent apt update package/libray/dependency since then.

Some of the ideas they've worked though summarized @ https://github.com/iiab/iiab/issues/1993#issuecomment-540785615

holta commented 4 years ago

FYI @tim-moody confirmed the problem occurs with Raspbian 2019-09-26 (that was actually published October ~1st).

Whereas he CANNOT reproduce the problem when using the prior version of Raspbian (from 2019-07-10, which explains why we've been only been seeing this bug for ~9 days now...)

PS Tim asks if libxml2 or libss2 might possibly be involved? Even if Raspbian's 2 patches below (since the 2019-09-26 OS's release) have NOT solved the problem unfortunately...i.e. applying these 2 patches below don't solve this issue FYI :/

libxml2/stable 2.9.4+dfsg1-7+b3 armhf [upgradable from: 2.9.4+dfsg1-7+b1] libss2/stable 1.44.5-1+deb10u2 armhf [upgradable from: 1.44.5-1+deb10u1]

holta commented 4 years ago

Underlying cause might be this upstream Raspbian issue: raspberrypi/linux#3271 (thx @tim-moody)

tim-moody commented 4 years ago

./kiwix-manage /library/zims/library.xml show

also gives seg fault with same library.xml already attached

kelson42 commented 4 years ago

@tim-moody This seems to confirm that this is a global compilation or runtime environment problem.

tim-moody commented 4 years ago

tried same command with newly built library.xml from wikipedia_en_mathematics_mini_2019-08.zim

get a seg fault about half the time

tim-moody commented 4 years ago

@kelson42, possibly. I reproduced it on 2019-09-26 both before and after updates, but not on 2019-07-10-raspbian-buster-lite.

holta commented 4 years ago

The fix is in — running rpi-update to get pre-release kernel 4.19.79-v7l+ (instead of 4.19.75-v7l+ released yesterday and again today!) somehow does the trick: https://github.com/raspberrypi/linux/issues/3271#issuecomment-541250165

But it might take a few days to be published/mainlined by the Raspberry Pi Foundation, so that everyone will be able to fix things by running...

apt update && apt -y dist-upgrade && reboot
kelson42 commented 4 years ago

@holta Does someone has check the fix with kiwix serve?

holta commented 4 years ago

@holta Does someone has check the fix with kiwix serve?

Yes. As mentioned in link I included above.

kelson42 commented 4 years ago

@holta So, we just need to wait a few days to get the official image of Raspbian updated and get that ticket closed?

tim-moody commented 4 years ago

I agree. I ran rpi-update and then /opt/iiab/kiwix/bin/kiwix-manage /library/zims/library.xml show 20 times with no crash. previously crashed 1 in 3 times.

kelson42 commented 4 years ago

@tim-moody Good, you have written that even back in June is was not perfectly robust at start. If this is still the case after this last fix, please open a separate ticket so we get a chance to understand the problem.