pan-net / terraform-provider-powerdns

Terraform PowerDNS provider
https://www.terraform.io/docs/providers/powerdns/
Mozilla Public License 2.0

Incremental 500 internal server errors #75

Closed. dkowis closed this issue 3 years ago.

dkowis commented 3 years ago

On a PowerDNS 4.4 install that I just did, when creating a fresh domain, I seem to get incremental 500s. Terraform applies one of the records and then fails with 500s for the rest; when I run it again, it gets one record further.

The zones get created fine, but the record changes do not.

Terraform Version

Terraform v0.14.3
+ provider registry.terraform.io/pan-net/powerdns v1.4.0

Affected Resource(s)

powerdns_record

Terraform Configuration Files

Here's the entire Terraform repo as it stands, with the domain entries I wanted to apply. I haven't gotten very far with it yet: sample.zip

Debug Output

https://gist.github.com/dkowis/30b1a1b41936523ac65d201f9061adf3

Expected Behavior

I should have been able to run a single terraform apply, and everything would have been awesome.

Actual Behavior

I had to apply the changes incrementally. The server is a vanilla PowerDNS install using a sqlite3 backend. I have literally only configured the web API and the sqlite3 backend.

asciicast

When deleting everything in the state, I get errors, but the records actually do get deleted, leaving me with an invalid state: asciicast

Steps to Reproduce

All covered in the asciicast.

mbag commented 3 years ago

Hi, thanks for reporting this.

In the log I see the error reason is Not Found, i.e. a 404, but in the screencast I see Internal Server Error. Can you check the PowerDNS logs to see the reason for these errors?

Also, which PowerDNS server version are you using?

dkowis commented 3 years ago

I don't know why I didn't check the logs; I was probably too busy focusing on the Terraform itself :( Derp.

PowerDNS Logs: https://gist.github.com/dkowis/96754a3f5a1cbcbaf3f6aaceb1a71a5e

It looks like the database is locked, so PowerDNS can't delete the record. That's surprising behavior to me. Perhaps the API is being hit too quickly? Is this a limitation of sqlite3? I don't remember reading anything about it in the SQL backend docs when I picked this backend, but maybe it's there.

I installed PowerDNS from the upstream repo.powerdns.com repository:

root@powerdns:/var/log# dpkg -l | grep pdns
ii  pdns-backend-bind                    4.4.0-1pdns.focal                     amd64        BIND backend for PowerDNS
ii  pdns-backend-sqlite3                 4.4.0-1pdns.focal                     amd64        sqlite 3 backend for PowerDNS
ii  pdns-server                          4.4.0-1pdns.focal                     amd64        extremely powerful and versatile nameserver
mbag commented 3 years ago

> Looks like the database is locked, and so the thing can't delete it. That's a surprising behavior to me. Perhaps the API is being hit too quickly? Is this a limitation of sqlite3?

:thinking: I'm using a MariaDB container as the database backend for testing (https://github.com/pan-net/terraform-provider-powerdns/blob/master/docker-compose.yml#L13). You can try using another database instead of sqlite3 and see if the problem persists.

Alternatively, you can use the -parallelism=1 option to limit the number of concurrent operations. This might also help determine whether the DB is the problem.

dkowis commented 3 years ago

Sure enough, -parallelism=1 solved it completely. So it seems the API doesn't serialize writes, and the sqlite DB is annoyed by that.

Once I did that, everything worked great. It may be worth mentioning in the README, but it certainly seems to be a symptom of using sqlite. For my homelab, not doing updates in parallel is perfectly acceptable, and sqlite uses less disk space, but I don't get replication, so I'll work that one out.

It's definitely not a bug, or rather, the solution to the "bug" is to use -parallelism=1. Thanks for the attention!

MattiDeGrauwe commented 3 years ago

It seems we encounter this issue on MySQL as well (when using for loops/counts over multiple powerdns resources). It also seems to fail only on updates/modifications/deletes; the initial creates work just fine.

Could you perhaps try to replicate this, @mbag?

Thanks in advance

mbag commented 3 years ago

@MattiDeGrauwe if it's exactly the same issue as originally reported, I don't think there is a need to replicate it. You are hitting the API too often, and the database lock doesn't get released in time. The same thing would happen if you used curl in a loop and made too many API requests. To lower the number of concurrent Terraform operations, use the -parallelism flag.

MattiDeGrauwe commented 3 years ago

Hello @mbag, thanks for your quick reply. I do agree that there is no need to replicate, but the big difference in this situation is that we are using MySQL and not SQLite3.

Adding the -parallelism flag to our pipeline is not an option, since it would slow the pipeline down too much.

prologic commented 3 years ago

Isn't this really a bug in PowerDNS itself rather than in this Terraform provider?