mysociety / alaveteli

Provide a Freedom of Information request system for your jurisdiction
https://alaveteli.org
Other
387 stars 195 forks source link

Xapian indexing stuck due to defunct wvText #5824

Open garethrees opened 4 years ago

garethrees commented 4 years ago

update-xapian-index has been stuck since midnight (9 hours).

Looks like its a problem due to [wvText] <defunct>:

gareth@owl: ~
$ ps aux | grep "^foi\s"
foi      10259  0.0  0.3 785136 200792 ?       Sl   00:08   0:14 /opt/rbenv/versions/2.6.6/bin/ruby bin/rails runner AlaveteliMailPoller.poll_for_incoming_loop
foi      11906 42.0  0.7 1269716 481660 ?      Sl   08:34  20:41 ruby /usr/share/passenger/helper-scripts/rack-loader.rb
foi      13345 28.5  0.7 1264440 484484 ?      Sl   08:35  14:02 ruby /usr/share/passenger/helper-scripts/rack-loader.rb
foi      13779 13.9  0.6 1198684 433432 ?      Sl   08:35   6:49 ruby /usr/share/passenger/helper-scripts/rack-loader.rb
foi      14218  5.0  0.5 1186520 356408 ?      Sl   08:35   2:27 ruby /usr/share/passenger/helper-scripts/rack-loader.rb
foi      14793  1.6  0.5 1120012 346952 ?      Sl   08:35   0:49 ruby /usr/share/passenger/helper-scripts/rack-loader.rb
foi      15406 10.3  0.7 1065920 495840 ?      Sl   Jul21 428:33 /opt/rbenv/versions/2.6.6/bin/ruby bin/rails runner TrackMailer.alert_tracks_loop
foi      15627  0.0  0.2 678292 167604 ?       Sl   Jul21   0:21 /opt/rbenv/versions/2.6.6/bin/ruby bin/rails runner NotificationMailer.send_notifications_loop
foi      16029  0.0  0.0   4276   724 ?        Ss   Jul23   0:00 /bin/sh -c /usr/bin/env XAPIAN_MAX_CHANGESETS=10 run-with-lockfile -n /data/vhost/www.whatdotheyknow.com/change-xapian-database.lock "/data/vhost/www.whatdotheyknow.com/alaveteli/script/update-xapian-index verbose=true" >> /data/vhost/www.whatdotheyknow.com/alaveteli/log/update-xapian-index.log || echo "stalled?"
foi      16073  0.0  0.0   4164   684 ?        S    Jul23   0:00 run-with-lockfile -n /data/vhost/www.whatdotheyknow.com/change-xapian-database.lock /data/vhost/www.whatdotheyknow.com/alaveteli/script/update-xapian-index verbose=true
foi      16142  0.0  0.0   4276   744 ?        S    Jul23   0:00 sh -c /data/vhost/www.whatdotheyknow.com/alaveteli/script/update-xapian-index verbose=true
foi      16156  0.0  0.0  33612  2900 ?        S    Jul23   0:00 /bin/bash /data/vhost/www.whatdotheyknow.com/alaveteli/script/update-xapian-index verbose=true
foi      16200 99.7  0.2 667872 140420 ?       Rl   Jul23 767:15 /data/vhost/www.whatdotheyknow.com/alaveteli-2020-07-21T11-25-34/vendor/bundle/ruby/2.6.0/bin/rake --silent xapian:update_index verbose=true
foi      16944  0.0  0.0      0     0 ?        Z    Jul23   0:00 [wvText] <defunct>
foi      18099  0.4  0.3 1061060 225432 ?      Sl   08:35   0:12 ruby /usr/share/passenger/helper-scripts/rack-loader.rb
gareth@owl: ~
$ sudo tail /var/www/www.whatdotheyknow.com/alaveteli/log/update-xapian-index.log.1
ActsAsXapian.update_index update InfoRequestEvent 7332705 2020-07-23 20:25:09 +0100
ActsAsXapian.update_index update InfoRequestEvent 7332706 2020-07-23 20:30:05 +0100
ActsAsXapian.update_index update InfoRequestEvent 7332707 2020-07-23 20:30:05 +0100
ActsAsXapian.update_index update InfoRequestEvent 7148079 2020-07-23 20:30:06 +0100
ActsAsXapian.update_index update InfoRequestEvent 7152682 2020-07-23 20:30:06 +0100
ActsAsXapian.update_index update InfoRequestEvent 7152715 2020-07-23 20:30:06 +0100
ActsAsXapian.update_index update InfoRequestEvent 7332708 2020-07-23 20:30:06 +0100
ActsAsXapian.update_index update User 70204 2020-07-23 20:35:07 +0100
ActsAsXapian.update_index update InfoRequestEvent 7332709 2020-07-23 20:35:07 +0100
ActsAsXapian.update_index update InfoRequestEvent 7332184 2020-07-23 20:35:07 +0100
gareth@owl: ~
$ sudo strace -s 1024 -p 16029
strace: Process 16029 attached
wait4(-1, ^Cstrace: Process 16029 detached
 <detached ...>
gareth@owl: ~
$ sudo strace -s 1024 -p 16944
strace: attach: ptrace(PTRACE_SEIZE, 16944): Operation not permitted

The last event in the log is a response containing some word docs:

e = InfoRequestEvent.find(7332184)
e.event_type
# => "response"
e.incoming_message.foi_attachments.map(&:content_type)
# => ["text/plain", "application/vnd.openxmlformats-officedocument.wordprocessingml.document", "application/vnd.ms-word"]
e.incoming_message.foi_attachments.pluck(:content_type)
# => ["text/plain", "application/vnd.openxmlformats-officedocument.wordprocessingml.document", "application/vnd.ms-word"]

At the moment, both HTML extractions of the attachments are just blank:

Screenshot 2020-07-24 at 09 30 13
garethrees commented 4 years ago

I expect what's happened is that wvText has been killed due to memory or timeout constraints but we haven't registered that in the parent process (see https://askubuntu.com/a/201308).

garethrees commented 4 years ago

For now I've just killed the Xapian processes.

I've also marked the event for indexing, so we'll see if it happens again or whether it's just a one off.

e = InfoRequestEvent.find(7332184)
e.xapian_mark_needs_index

Will reopen if it gets stuck again.

garethrees commented 4 years ago

This has happened again:

$ ps aux | grep "^foi\s" | grep -E "(xapian|defunct)"
foi      12202  0.0  0.0   4276   768 ?        Ss   16:25   0:00 /bin/sh -c /usr/bin/env XAPIAN_MAX_CHANGESETS=10 run-with-lockfile -n /data/vhost/www.whatdotheyknow.com/change-xapian-database.lock "/data/vhost/www.whatdotheyknow.com/alaveteli/script/update-xapian-index verbose=true" >> /data/vhost/www.whatdotheyknow.com/alaveteli/log/update-xapian-index.log || echo "stalled?"
foi      12233  0.0  0.0   4164   680 ?        S    16:25   0:00 run-with-lockfile -n /data/vhost/www.whatdotheyknow.com/change-xapian-database.lock /data/vhost/www.whatdotheyknow.com/alaveteli/script/update-xapian-index verbose=true
foi      12289  0.0  0.0   4276   712 ?        S    16:25   0:00 sh -c /data/vhost/www.whatdotheyknow.com/alaveteli/script/update-xapian-index verbose=true
foi      12327  0.0  0.0  33612  3056 ?        S    16:25   0:00 /bin/bash /data/vhost/www.whatdotheyknow.com/alaveteli/script/update-xapian-index verbose=true
foi      12362 99.7  0.2 668228 140932 ?       Rl   16:25  74:18 /data/vhost/www.whatdotheyknow.com/alaveteli-2020-07-27T11-52-07/vendor/bundle/ruby/2.6.0/bin/rake --silent xapian:update_index verbose=true
foi      13050  0.0  0.0      0     0 ?        Z    16:25   0:00 [wvText] <defunct>
garethrees commented 4 years ago

Happened again (~08:00 – 10:00); killed.