grafana / loki

Like Prometheus, but for logs.
https://grafana.com/loki
GNU Affero General Public License v3.0

Unable to resolve .local loki address #3994

Open · andsens opened this issue 3 years ago

andsens commented 3 years ago

I am running a local Loki 2.1.0 install on my dev box, and logcli fails to resolve the Loki endpoint address with the following error:

$ GODEBUG=netdns=cgo+10 LOKI_ADDR="https://loki-aim.local" logcli query '{program="postgresql"}'
2021-07-13 18:28:40.766950 I | proto: duplicate proto type registered: ingester.Series
https://loki-aim.local/loki/api/v1/query_range?direction=BACKWARD&end=1626193720767614401&limit=30&query=%7Bprogram%3D%22postgresql%22%7D&start=1626190120767614401
go package net: built with netgo build tag; using Go's DNS resolver
go package net: hostLookupOrder(loki-aim.local) = files,dns
Query failed: Get "https://loki-aim.local/loki/api/v1/query_range?direction=BACKWARD&end=1626193720767614401&limit=30&query=%7Bprogram%3D%22postgresql%22%7D&start=1626190120767614401": dial tcp: lookup loki-aim.local on 127.0.0.1:53: read udp 127.0.0.1:60814->127.0.0.1:53: i/o timeout

I'm fairly certain the issue is that I am using an mDNS address (loki-aim.local) for my Loki endpoint while logcli is compiled with the netgo build tag, which prevents Go's internal resolver from falling back to the cgo resolver.
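To make the failure mode concrete, here is a deliberately simplified model of the one rule that matters for this issue. The function prefersCgo and its boolean parameter are hypothetical and are not Go's actual internals: the real decision (the hostLookupOrder logic visible in the debug output above) also weighs the OS, environment variables, and the contents of resolv.conf and nsswitch.conf.

```go
package main

import (
	"fmt"
	"strings"
)

// prefersCgo is a hypothetical, simplified model of a single rule in Go's
// resolver selection: if the cgo resolver is compiled in and the name ends
// in ".local", the lookup is handed to libc, which can typically do mDNS
// (e.g. via Avahi/nss-mdns). All other selection conditions are ignored.
func prefersCgo(name string, cgoResolverAvailable bool) bool {
	if !cgoResolverAvailable {
		// Built with -tags netgo (or without cgo): only the pure Go
		// resolver exists, and it does not speak mDNS.
		return false
	}
	name = strings.TrimSuffix(name, ".") // treat "host.local." like "host.local"
	return strings.HasSuffix(strings.ToLower(name), ".local")
}

func main() {
	for _, avail := range []bool{true, false} {
		fmt.Printf("cgo resolver available=%v: loki-aim.local goes to cgo=%v\n",
			avail, prefersCgo("loki-aim.local", avail))
	}
}
```

Under this model, the netgo build corresponds to cgoResolverAvailable=false, so a .local name can never reach libc no matter what GODEBUG says.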

The Go net package documentation (section "Name Resolution") explains this:

By default the pure Go resolver is used, because a blocked DNS request consumes only a goroutine, while a blocked C call consumes an operating system thread. When cgo is available, the cgo-based resolver is used instead under a variety of conditions: on systems that do not let programs make direct DNS requests (OS X), when the LOCALDOMAIN environment variable is present (even if empty), when the RES_OPTIONS or HOSTALIASES environment variable is non-empty, when the ASR_CONFIG environment variable is non-empty (OpenBSD only), when /etc/resolv.conf or /etc/nsswitch.conf specify the use of features that the Go resolver does not implement, and when the name being looked up ends in .local or is an mDNS name.

The resolver decision can be overridden by setting the netdns value of the GODEBUG environment variable (see package runtime) to go or cgo, as in:

export GODEBUG=netdns=go    # force pure Go resolver
export GODEBUG=netdns=cgo   # force cgo resolver

The decision can also be forced while building the Go source tree by setting the netgo or netcgo build tag.

Using mDNS in dev is super useful if one doesn't want to set up all kinds of DNS resolution hacks. I am unable to work around the problem with e.g. LOKI_ADDR="https://$(avahi-resolve -n loki-aim.local | cut -f2)" because my reverse proxy in front of Loki requires TLS with SNI, and connecting to an IP address means the TLS handshake no longer carries the loki-aim.local hostname.
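For context on the SNI constraint: a Go HTTP client can dial a pre-resolved IP while keeping the URL host (and therefore SNI) intact, similar in spirit to curl's --resolve option. This is a hypothetical client-side sketch, not something logcli is known to support; overrideAddr, newClient, and the 192.168.1.10 address are made up, and it assumes the .local name was resolved out of band, e.g. with avahi-resolve.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"net/http"
	"time"
)

// overrideAddr swaps the host in a "host:port" dial address for a
// pre-resolved IP when one is known; other addresses pass through unchanged.
func overrideAddr(addr string, overrides map[string]string) string {
	host, port, err := net.SplitHostPort(addr)
	if err != nil {
		return addr
	}
	if ip, ok := overrides[host]; ok {
		return net.JoinHostPort(ip, port) // handles IPv6 bracketing
	}
	return addr
}

// newClient returns an HTTP client whose TCP connections go to the
// overridden IPs. http.Transport takes the TLS ServerName (SNI) from the
// request URL, not from the dialed address, so the proxy still sees the
// original hostname during the handshake.
func newClient(overrides map[string]string) *http.Client {
	dialer := &net.Dialer{Timeout: 5 * time.Second}
	transport := &http.Transport{
		DialContext: func(ctx context.Context, network, addr string) (net.Conn, error) {
			return dialer.DialContext(ctx, network, overrideAddr(addr, overrides))
		},
	}
	return &http.Client{Transport: transport, Timeout: 30 * time.Second}
}

func main() {
	// Hypothetical mapping, e.g. filled in from avahi-resolve output.
	overrides := map[string]string{"loki-aim.local": "192.168.1.10"}
	fmt.Println(overrideAddr("loki-aim.local:443", overrides))
	_ = newClient(overrides)
}
```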

As far as I can see, removing -tags netgo here https://github.com/grafana/loki/blob/4a8f62ba00bcf6d338833eb506f91789c9ed584e/Makefile#L60-L69 should do the trick.

There's also this check: https://github.com/grafana/loki/blob/4a8f62ba00bcf6d338833eb506f91789c9ed584e/Makefile#L73-L80

Running git blame on these lines doesn't reveal much; the check has been there since the beginning. It seems to be intended to make sure the pure Go resolver is included, but instead it only makes sure the cgo one is excluded. As far as I understand, there is no problem with both being present, so changing the check to verify that the pure Go resolver is included would seem to be the way to go.

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had any activity in the past 30 days. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

andsens commented 3 years ago

/remove-label stale

andsens commented 3 years ago

Alright, so I compiled my own version for testing purposes. Beyond what I outlined above, CGO_ENABLED=0 also needs to be removed. Here's the Makefile diff:

diff --git a/Makefile b/Makefile
index 0cddb1721..4e7a3cc14 100644
--- a/Makefile
+++ b/Makefile
@@ -60,13 +60,13 @@ APP_GO_FILES := $(shell find . $(DONT_FIND) -name .y.go -prune -o -name .pb.go -
 # Build flags
 VPREFIX := github.com/grafana/loki/pkg/build
 GO_LDFLAGS   := -X $(VPREFIX).Branch=$(GIT_BRANCH) -X $(VPREFIX).Version=$(IMAGE_TAG) -X $(VPREFIX).Revision=$(GIT_REVISION) -X $(VPREFIX).BuildUser=$(shell whoami)@$(shell hostname) -X $(VPREFIX).BuildDate=$(shell date -u +"%Y-%m-%dT%H:%M:%SZ")
-GO_FLAGS     := -ldflags "-extldflags \"-static\" -s -w $(GO_LDFLAGS)" -tags netgo $(MOD_FLAG)
-DYN_GO_FLAGS := -ldflags "-s -w $(GO_LDFLAGS)" -tags netgo $(MOD_FLAG)
+GO_FLAGS     := -ldflags "-extldflags \"-static\" -s -w $(GO_LDFLAGS)" $(MOD_FLAG)
+DYN_GO_FLAGS := -ldflags "-s -w $(GO_LDFLAGS)" $(MOD_FLAG)
 # Per some websites I've seen to add `-gcflags "all=-N -l"`, the gcflags seem poorly if at all documented
 # the best I could dig up is -N disables optimizations and -l disables inlining which should make debugging match source better.
 # Also remove the -s and -w flags present in the normal build which strip the symbol table and the DWARF symbol table.
-DEBUG_GO_FLAGS     := -gcflags "all=-N -l" -ldflags "-extldflags \"-static\" $(GO_LDFLAGS)" -tags netgo $(MOD_FLAG)
-DYN_DEBUG_GO_FLAGS := -gcflags "all=-N -l" -ldflags "$(GO_LDFLAGS)" -tags netgo $(MOD_FLAG)
+DEBUG_GO_FLAGS     := -gcflags "all=-N -l" -ldflags "-extldflags \"-static\" $(GO_LDFLAGS)" $(MOD_FLAG)
+DYN_DEBUG_GO_FLAGS := -gcflags "all=-N -l" -ldflags "$(GO_LDFLAGS)" $(MOD_FLAG)
 # Docker mount flag, ignored on native docker host. see (https://docs.docker.com/docker-for-mac/osxfs-caching/#delegated)
 MOUNT_FLAGS := :delegated

@@ -157,8 +157,7 @@ logcli-image:
        $(SUDO) docker build -t $(IMAGE_PREFIX)/logcli:$(IMAGE_TAG) -f cmd/logcli/Dockerfile .

 cmd/logcli/logcli: $(APP_GO_FILES) cmd/logcli/main.go
-       CGO_ENABLED=0 go build $(GO_FLAGS) -o $@ ./$(@D)
-       $(NETGO_CHECK)
+       go build $(GO_FLAGS) -o $@ ./$(@D)

 ########
 # Loki #

The results are as expected, resolving mDNS addresses now works.


stale[bot] commented 2 years ago

Hi! This issue has been automatically marked as stale because it has not had any activity in the past 30 days.

We use a stalebot among other tools to help manage the state of issues in this project. A stalebot can be very useful in closing issues in a number of cases; the most common is closing issues or PRs where the original reporter has not responded.

Stalebots are also emotionless and cruel and can close issues which are still very relevant.

If this issue is important to you, please add a comment to keep it open. More importantly, please add a thumbs-up to the original issue entry.

We regularly sort for closed issues which have a stale label sorted by thumbs up.

We may also:

We are doing our best to respond, organize, and prioritize all issues but it can be a challenging task, our sincere apologies if you find yourself at the mercy of the stalebot.

andsens commented 2 years ago

/remove-lifecycle stale

DylanGuedes commented 2 years ago

Hmm, if CGO_ENABLED=0 has to be removed, that means logcli would be built with cgo enabled. Isn't that potentially a breaking change for Loki users?

tristanmorgan commented 1 year ago

With the changes in Go 1.20.x, I've been getting the best results on macOS with CGO_ENABLED=0, letting the new code call libc for resolution.

see golang/go@a3559f33