systemd / python-systemd

Python wrappers for systemd functionality
GNU Lesser General Public License v2.1
465 stars 66 forks source link

systemd.journal.seek_tail() doesn't seem to #98

Closed iam-TJ closed 2 years ago

iam-TJ commented 3 years ago

Whilst building a simple script to monitor a Distributed Replicated Block Device (DRBD) connection so we can trigger actions based on the link dropping and reconnecting we believe we've found a bug. We use a select.poll() to monitor the journal waiting for a specific static kernel log message from the drbd module. poll() returns but when we call systemd.journal.get_next() it returns an empty dict object. Added basic debug print into systemd/journal.py::get_next()

        r = super(Reader, self)._next(skip)
        print('Reader._next() returned ', r)
        if r:
            entry = super(Reader, self)._get_all()
            if entry:
                entry['__REALTIME_TIMESTAMP'] = self._get_realtime()
                entry['__MONOTONIC_TIMESTAMP'] = self._get_monotonic()
                entry['__CURSOR'] = self._get_cursor()
                return self._convert_entry(entry)
        return dict()

and we see:

(3, 1)]
Reader._next() returned  False
{}

We're using the example code from https://www.freedesktop.org/software/systemd/python-systemd/journal.html#example-polling-for-journal-events

Our code is:

#!/usr/bin/env python3
import sys
import select
from systemd import journal
j = journal.Reader()
j.seek_tail()
j.add_match('MESSAGE=' + sys.argv[1])
p = select.poll()
p.register(j, j.get_events())
print(p.poll(None))
print(j.get_next())

Called with:

# /etc/drbd.d/python3-monitor-connection.py "peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )"
iam-TJ commented 3 years ago

Further investigation seems to show that journal.seek_tail() isn't seeking to the end of the journal at all. If we remove the journal.add_match(MESSAGE='...') but have _TRANSPORT='kernel', on repeated invocations we see the same messages from 5 days ago being reported.

iam-TJ commented 3 years ago

Another user has used our script to independently confirm this same problem.

iam-TJ commented 3 years ago

Appears this may be a long-standing issue in libsystemd.

https://github.com/systemd/systemd/issues/17662

And mentioned in this example gist code:

https://gist.github.com/portante/ff7fb429c6f973aab377f7bb77b0ffdb#file-example_journal_reader-c-L22

...
  // For some reason, fetching previous then next avoids getting an initial
  // huge chunk of records.  If one calls sd_journal_seek_tail() and then
  // calls sd_journal_next() we receive the huge chunk.
  r = sd_journal_previous(j);
  if (r < 0) {
    fprintf(stderr, "Failed to get previous journal entry: %s\n", strerror(-r));
    return 1;
  }

  r = sd_journal_next(j);
...
keszybz commented 2 years ago

Yeah, python-systemd is just a wrapper around the libsystemd code. It does do some extra stuff, e.g. conversions, but it doesn't change fundamental things like how seeking and entry retrieval works. So if there's something to fix here, it needs to be handled on the libsystemd side. Let's track this in the other bug.