sensson / puppet-powerdns

A Puppet module to install and configure the PowerDNS 4.x authorative server and recursor
12 stars 45 forks source link

pdns_record keeps looping #169

Open maxadamo opened 5 months ago

maxadamo commented 5 months ago

My backend is sqlite3.

I have this entry to create a zone:

powerdns_zone { 'dumb.zone':; }

And I tried all of these, with the same result: it keeps adding the record again and again. although the record is present and it's working.

powerdns_record { 'dumb-record':
  target_zone => 'dumb.zone',
  rname       => 'dumb-record',
  rtype       => 'A',
  rcontent    => '192.168.1.10',
}
powerdns_record { 'dumb-record.dumb.zone':
  rname    => 'dumb-record',
  rtype    => 'A',
  rcontent => '192.168.1.10',
}
powerdns_record { 'dumb-record.dumb.zone':
  rcontent => '192.168.1.10',
}

Setting manage_record to false works, but I don't know if this is the intended:

powerdns_zone { 'dumb.zone':
  manage_records => false,
}

am I doing something wrong?

ju5t commented 5 months ago

@trefzer do you know why this can happen?

xyjonas commented 4 months ago

We observed something similar. When listing the Zone, pdnsutil printed some extra output like Apr 24 15:08:33 gsqlite3: connection to '/var/lib/powerdns/powerdns.sqlite3' successful in every run. The date and time were different every time. We fixed it by lowering/deleting the loglevel setting in /etc/pdns/pdns.conf, which was set to 6 before.

Longsight commented 4 months ago

I've had similar issues with manage_records => true and have pushed a fix for it at #171. What I was seeing on dumping the output of should_content was that content.push("$ORIGIN .\n"), presumably so written to ensure a \n at the end of the content, was then being sorted on the next line, resulting in:

$ORIGIN .

<begin records>

which never matched the output of pdnsutil, which instead produced output like:

$ORIGIN .
<begin records>

After applying this change locally, I'm no longer seeing any zones re-imported unless their records actually differ.

Longsight commented 4 months ago

Okay now I can't trigger the bug with the old code, so something else has changed.

Longsight commented 4 months ago

In your case it's more likely that there is some log output or something else extraneous in the pdnsutil output; while I don't think my change is actually incorrect, I don't think the extraneous \n is causing any problems. I suspect I fixed a few things at once and \n was never actually causing a re-import.

ju5t commented 4 months ago

I'm curious if this is related to a PowerDNS change or if it's an actual bug in the code that has never worked in the past.

We're not using it ourselves so it's difficult for me to verify this.

Longsight commented 4 months ago

Okay, so, step back and some clarity on the more generalised issue:

We're currently running PowerDNS 4.8.3 with the PostgreSQL backend.

Examining the PowerDNS database, it appears that PowerDNS never stores records internally with a trailing ., for record types that expect one (PTR, CNAME etc). When queried as normal, these records are returned with the trailing ., but they are stored without it, whether they're imported with it or not.

powerdns=# select * from records where content like '%bck%';
    id    | domain_id |                         name                         | type  |           content            | ttl | prio | disabled | ordername | auth 
----------+-----------+------------------------------------------------------+-------+------------------------------+-----+------+----------+-----------+------
 21551016 |         2 | bck01.lab.example.com                                | CNAME | vl7.bck01.lab.example.com    | 300 |    0 | f        |           | t
 21551826 |        13 | 8.0.30.172.in-addr.arpa                              | PTR   | bck01.lab.example.com        | 300 |    0 | f        |           | t
(2 rows)

When fetching zones with pdnsutil list-zone, PowerDNS is inconsistent about whether the trailing . is added or not. For example:

root@dns01:~# pdnsutil list-zone lab.example.com | grep bck
bck01.lab.example.com        300     IN      CNAME   vl7.bck01.lab.example.com.
vl7.bck01.lab.example.com    300     IN      A       172.30.0.8

The CNAME record has a trailing ., whereas:

root@dns01:~# pdnsutil list-zone 0.30.172.in-addr.arpa | grep bck
8.0.30.172.in-addr.arpa 300     IN      PTR     bck01.lab.example.com

The PTR record does not.

The same confusion affects SOA records:

root@dns21:~# dig -t SOA lab.example.com +short
dns.lab.example.com. admin.example.com. 22973 10800 3600 604800 3600
root@dns20:~# pdnsutil list-zone lab.example.com | grep SOA
lab.example.com      3600    IN      SOA     dns.lab.example.com admin.example.com 22973 10800 3600 604800 3600

Though the record is correctly queried with trailing . characters on the mname and rname values, they're missing from the database entry and the pdnsutil list-zone output.

As a separate issue, PowerDNS records that are imported with uppercase characters in the name field are automatically downcased before being saved into the database, and thus always returned as lowercase by pdnsutil list-zone, even if they were uppercase in the actual powerdns_record resource.

Since pdnsutil list-zone is the means by which the provider determines whether it needs to update the resource, this means that powerdns_zone and powerdns_record resources need to be written with respect to these quirks to ensure that should_content behaves correctly. Thus:

Any resource that fails these rules will still be accepted by PowerDNS and will still work just fine, but any affected zones will be marked as requiring correction by Puppet on every run, because the pdnsutil list-zone output will never match the resource parameters.

Since PowerDNS' behaviour is inconsistent across different record types, it doesn't necessarily feel correct to enforce these rules in the resources themselves, but failing to be aware of them means that people are going to end up with lots of endlessly repeated resources that change nothing but the zone serial.


The other thing that can happen, as mentioned earlier, is that setting loglevel above 5 in PowerDNS causes database connection messages to be written to STDERR on pdnsutil list-zone.

Unfortunately, the command execution context defined by the commands setter in provider.rb forces :combine => true, which combines STDOUT and STDERR in the command output and includes these database log lines in the output of pdnsutil(pdnsutil_options, 'list-zone', resource[:name]). Fixing this at the provider level would probably require overriding Puppet::Provider::CommandDefiner, and I don't know off the top of my head how easy (or possible) that's going to be, so it's vital when using this module that PowerDNS loglevel be kept at 5 or below.

I don't know if the behaviour of pdnsutil list-zones is different when using other backends (SQLite etc), but this is the behaviour I've observed with PostgreSQL.

Longsight commented 4 months ago

https://doc.powerdns.com/md/types/ says:

Warning: Host names and the MNAME of a SOA records are NEVER terminated with a '.' in PowerDNS storage! If a trailing '.' is present it will inevitably cause problems, problems that may be hard to debug. Use pdnsutil check-zone to validate your zone data.

From the PowerDNS source:

static int listZone(const DNSName &zone) {
  UeberBackend B;
  DomainInfo di;

  if (! B.getDomainInfo(zone, di)) {
    cerr << "Zone '" << zone << "' not found!" << endl;
    return EXIT_FAILURE;
  }
  di.backend->list(zone, di.id);
  DNSResourceRecord rr;
  cout<<"$ORIGIN ."<<endl;
  cout.sync_with_stdio(false);

  while(di.backend->get(rr)) {
    if(rr.qtype.getCode()) {
      if ( (rr.qtype.getCode() == QType::NS || rr.qtype.getCode() == QType::SRV || rr.qtype.getCode() == QType::MX || rr.qtype.getCode() == QType::CNAME) && !rr.content.empty() && rr.content[rr.content.size()-1] != '.')
    rr.content.append(1, '.');

      cout<<rr.qname<<"\t"<<rr.ttl<<"\tIN\t"<<rr.qtype.toString()<<"\t"<<rr.content<<"\n";
    }
  }
  cout.flush();
  return EXIT_SUCCESS;
}

Which suggests that the record types that require a trailing . in the rcontent parameter are NS, SRV, MX and CNAME. Quite why pdnsutil list-zone chooses to apply this logic is left as an exercise to the reader. This means that it may be preferable to force the absence of a trailing . in the rcontent parameter for most record types, and then add it to the record when testing against pdnsutil list-zones for these four record types, which would enforce consistency with PowerDNS behaviour in this case.

ju5t commented 4 months ago

This means that it may be preferable to force the absence of a trailing . in the rcontent parameter for most record types, and then add it to the record when testing against pdnsutil list-zones for these four record types, which would enforce consistency with PowerDNS behaviour in this case.

@Longsight first of all, thanks for the thorough explanation! I agree this would be the best way forward for now.

I'm not sure yet if this will solve the issue @maxadamo is facing as he had an issue with A-records, but it's definitely something that needs fixing. Considering the amount of time you already spent on this I'm a little hesitant to ask, but is it something you could PR you think?

Ps. I can confirm the behaviour of pdnsutil is exactly the same with a MySQL database.

Longsight commented 4 months ago

I'll refresh my memory of the type/provider API and have a look at this tonight, yeah.

Longsight commented 4 months ago

I'm not sure yet if this will solve the issue @maxadamo is facing as he had an issue with A-records

Without further investigation I can't say for sure, but if @maxadamo has loglevel set to 6 or more then they will 100% be experiencing the issue described. The most sensible thing would be to cull log lines from the command output in def content in provider/powerdns_zone_private/pdnsutil.rb; I'll include that in the PR.

maxadamo commented 4 months ago

I'm not sure yet if this will solve the issue @maxadamo is facing as he had an issue with A-records

Without further investigation I can't say for sure, but if @maxadamo has loglevel set to 6 or more then they will 100% be experiencing the issue described.

@Longsight I'm using the default loglevel. I'm not understanding what is the default, but the log files are pretty much empty. Only few lines are being written to logs upon service restart.

Longsight commented 4 months ago

@maxadamo what do you get when you run pdnsutil list-zone dumb.zone? Anything unexpected in the output?

Longsight commented 4 months ago

Essentially this is a question of comparing the output of that command with the expected output that the provider builds from the resources - if there are unexpected lines in the output, then it'll never match and always refresh the resource.

maxadamo commented 4 months ago

@Longsight there you go. Why do I see bindbackend?

# pdnsutil list-zone dumb.zone
Apr 26 17:24:29 [bindbackend] Done parsing domains, 0 rejected, 0 new, 0 removed
$ORIGIN .
dumb-record.dumb.zone   3600    IN  A   192.168.1.10
dumb.zone   3600    IN  SOA a.powerdns.server hostmaster 4 10800 3600 604800 3600

but, why do I see bindbackend, if I declared sqlite only?

  class { 'powerdns':
    recursor_version => $pdns_major_version,
    recursor         => true,
    backend          => 'sqlite';
  }
maxadamo commented 4 months ago

for me it was solved removing the package pdns-backend-bind At some point this package got installed, and I don't know why. Perhaps I initially declared the class without specifying sqlite, but it seems to work, and it's not being reinstalled by the module. For the sake of sharing, pdns_record now works and this is the output of the tool:

# pdnsutil list-zone dumb.zone
$ORIGIN .
dumb-record.dumb.zone   3600    IN  A   192.168.1.10
dumb.zone   3600    IN  SOA a.powerdns.server hostmaster 4 10800 3600 604800 3600
maxadamo commented 4 months ago

would it be an option to uninstall backends that have not been declared in the module?

Longsight commented 4 months ago

It's automatically purged if the chosen backend is postgresql or ldap, it should probably be purged for all other backends as well since it's pulled in as a recommended package by pdns-server on Debian.

The default backend in the module is mysql so if you don't specify one at all it won't default to bind.

stale[bot] commented 5 days ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. We get that this might not be what you had expected but we're trying to limit stale as issues as much as possible. Thank you for your contributions.