ophintor opened 1 month ago
You could try adding the following extraArgs, in case it's a timeout or rate-limit issue:
--interval=3m
--request-timeout=60s
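For reference, a sketch of how those flags could be passed via the helm chart's values file, assuming the chart exposes an `extraArgs` list that is appended to the container's CLI arguments (chart layouts differ between versions, so check your chart's values schema):

```yaml
extraArgs:
  - --interval=3m         # how often external-dns reconciles
  - --request-timeout=60s # timeout for requests to the DNS provider
```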
Thanks for the suggestion. Unfortunately, none of those seems to work. I have also tried various combinations of the following flags:
aws-batch-change-size
aws-batch-change-size-values
aws-batch-change-interval
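As a sketch, the batching attempt described below (200 records per batch, 10s between batches) would look roughly like this in the chart values, again assuming an `extraArgs` list that maps directly to CLI flags:

```yaml
extraArgs:
  - --aws-batch-change-size=200    # max records per ChangeResourceRecordSets call
  - --aws-batch-change-interval=10s # pause between consecutive batches
```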
Each zone has around 250 records, so in total it should be adding about 1250 records across the 5 zones. I have tried with 10s intervals and 200 records at a time, but no luck so far...
I've been looking at the code (keep in mind I don't speak Go...) and I can see a few things that I'm not sure if I understand:
aws.go, line 602:

```go
// submitChanges takes a zone and a collection of Changes and sends them as a single transaction.
func (p *AWSProvider) submitChanges(ctx context.Context, changes Route53Changes, zones map[string]*profiledZone) error {
	// return early if there is nothing to change
	if len(changes) == 0 {
		log.Info("All records are already up to date")
		return nil
	}
```
After removing all the records in Z0005 and re-installing the chart, I can see from the pod logs that `len(changes) == 0`, which is not right because there are plenty of changes to be applied to that zone. I can see the 'All records are already up to date' message in the logs.
When I look back at line 585 (`func (p *AWSProvider) ApplyChanges(ctx context.Context, changes *plan.Changes) error {`), I can see that the list with the combined changes is created and sent to the function submitChanges above. However, the list of changes seems to be empty, which shouldn't be the case.
At this point, I'm not sure where this function is called from (maybe from aws_sd.go?) and/or what's in the context. I suspect that the code does not handle multiple zone IDs that all share the same name well, but I can't figure out if that's the case by looking at the code.
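To illustrate the suspicion above, here is a small self-contained Go sketch (hypothetical code, not taken from external-dns): if zones were ever grouped into a map keyed by their name rather than their zone ID, five hosted zones sharing one name would collapse into a single entry, and changes destined for the other four would silently vanish, which matches the symptom of an empty change set for Z0005.

```go
package main

import "fmt"

// zone is a stand-in for a Route 53 hosted zone (ID + name).
type zone struct {
	ID   string
	Name string
}

// groupZones is a hypothetical sketch: it contrasts keying zones by
// name (lossy when several hosted zones share a name) with keying
// them by zone ID (always unique).
func groupZones(zones []zone) (byName, byID map[string]zone) {
	byName = map[string]zone{}
	byID = map[string]zone{}
	for _, z := range zones {
		byName[z.Name] = z // collides: all five zones share one name
		byID[z.ID] = z     // unique per zone
	}
	return byName, byID
}

func sampleZones() []zone {
	return []zone{
		{"Z0001", "thisdomain.local."},
		{"Z0002", "thisdomain.local."},
		{"Z0003", "thisdomain.local."},
		{"Z0004", "thisdomain.local."},
		{"Z0005", "thisdomain.local."},
	}
}

func main() {
	byName, byID := groupZones(sampleZones())
	fmt.Println(len(byName), len(byID)) // 1 5
}
```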
I have tried all possible combinations of values in the chart, but the only way I can get this to work is by doing the zones one by one manually, which is far from ideal. I'm looking at separating the deployments so as to have one per zone (pretty sure that would solve my issue), but I don't think I can do that without modifying the chart myself.
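A sketch of what such a per-zone split could look like, assuming one helm release per hosted zone and external-dns's `--zone-id-filter` flag to pin each deployment to a single zone ID (file name and values layout are illustrative, not from the actual chart):

```yaml
# values-z0005.yaml -- hypothetical per-zone values file, one release per zone
txtOwnerId: Z0005            # distinct ownership TXT record per deployment
txtPrefix: external-dns
extraArgs:
  - --zone-id-filter=Z0005   # restrict this deployment to one zone ID
```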
Any help would be appreciated.
@ophintor Can you test whether this issue also happens with external-dns version 0.13.6?
We actually moved from 0.13.6 to 0.14.1, and then to 0.14.2 because of the issue, in the hope that a newer version would fix it.
At the moment I have found a workaround that involves creating one deployment per zone and that works for us, but as it is I cannot make it work.
Thanks!
@ophintor Thank you for sharing that information. I'll be working on something related to that function for a different, unrelated issue, and I'll check if I can take a look at this one as well. Can you post an obfuscated example of the names of the 5 hosted zones? Something like:

HZ1: mydomain.com
HZ2: internal.mydomain.com
HZ3: us-east-1.internal.mydomain.com
HZ4: external.mydomain.com
HZ5: api.mydomain.com

This would help us understand and reproduce the issue better.
Hello, the name of the zone is the same for all 5, so it would be something like:

HZ1: thisdomain.local
HZ2: thisdomain.local
HZ3: thisdomain.local
HZ4: thisdomain.local
HZ5: thisdomain.local
Many thanks.
What happened: We're using external-dns (v0.14.1) within EKS. We have an environment with 5 different AWS Route 53 zones configured in it. We recently noticed that some records were not being updated.
What you expected to happen: I expected all records in all zones to be updated.
How to reproduce it (as minimally and precisely as possible):
txtOwnerId: Z0001
txtPrefix: external-dns
extraArgs:
Workaround:
Environment:
- External-DNS version (use `external-dns --version`): v0.14.1 (helm chart)