sipcapture / heplify

Portable and Lightweight HEP Capture Agent for HOMER
https://sipcapture.org
GNU Affero General Public License v3.0

Docker in k8s #160

Closed danjenkins closed 4 years ago

danjenkins commented 4 years ago

The container built from the repo's image (on Docker Hub) complains that DNS entries do not exist, even though they are available in the cluster.

2020/03/09 13:42:52.328303 hep.go:37: ERR dial udp: lookup heplify-server on 172.20.0.2:53: no such host

Critical: cannot establish a connection
2020/03/09 13:54:33.203911 sniffer.go:123: INFO ostype: linux, osarch: amd64

Critical: cannot establish a connection

2020/03/09 13:54:33.222102 hep.go:37: ERR dial udp: lookup heplify-server.staging.svc.cluster.local: no such host

I've brought up the k8s dnsutils pod (kubectl apply -f https://k8s.io/examples/admin/dns/dnsutils.yaml) and that proves that the DNS entries are available, but heplify still complains. I can only surmise that, because of how Go applications are compiled, it's not using the cluster DNS in this case.

Not really sure where to go from here in debugging this - any help would be greatly appreciated.

startup log output is

2020/03/09 13:54:33.203327 sniffer.go:114: INFO config.Config{Iface:(*config.InterfacesConfig)(0xc000152c00), Logging:(*logp.Logging)(0xc0000aa0f0), Mode:"SIPRTCP", Dedup:false, Filter:"", Discard:"", DiscardMethod:"", Zip:false, HepServer:"heplify-server.staging.svc.cluster.local:9060", HepNodePW:"", HepNodeID:0x7d2, HepNodeName:"", Network:"udp", Protobuf:false, Reassembly:false, Version:false}
2020/03/09 13:54:33.203442 sniffer.go:115: INFO &config.InterfacesConfig{Device:"any", Type:"pcap", ReadFile:"", WriteFile:"", RotationTime:60, PortRange:"5060-5090", WithVlan:false, WithErspan:false, Snaplen:8192, BufferSizeMb:32, ReadSpeed:false, OneAtATime:false, Loop:1}
2020/03/09 13:54:33.203706 sniffer.go:116: INFO bpf: (tcp or sctp) and greater 42 and portrange 5060-5090 or (udp and greater 128 and portrange 5060-5090 or ip[6:2] & 0x1fff != 0 or ip6[6]=44) or (ip and ip[6] & 0x2 = 0 and ip[6:2] & 0x1fff = 0 and udp and udp[8] & 0xc0 = 0x80 and udp[9] >= 0xc8 && udp[9] <= 0xcc)
2020/03/09 13:54:33.203911 sniffer.go:123: INFO ostype: linux, osarch: amd64

negbie commented 4 years ago

Hi @danjenkins, I had something like this in the past on a customer cluster with a different application written in Go and this solved it for me: https://github.com/golang/go/pull/29594 https://github.com/golang/go/pull/29661

But it's quite possible that your issue is related to how that Docker image is built (alpine+static).

danjenkins commented 4 years ago

Thanks @negbie! I'll go and see if I can change the resolv.conf to add the use-vc and single-request settings... do you think that should just be a default in the image?

danjenkins commented 4 years ago

Oh, it can't be a default in the image because of how it's all inherited...

negbie commented 4 years ago

@danjenkins exactly we need to find a better way.

negbie commented 4 years ago

@danjenkins something else: does the behaviour change when you use TCP with the heplify -nt flag, like -nt tcp or -nt tls? Make sure to configure HEPTCPAddr or HEPTLSAddr in the heplify-server container. I would suggest using TLS anyway.

danjenkins commented 4 years ago

that won't change the host resolution though will it? I guess its using a different part of go etc.... I didnt want to add the overhead of TLS because its all contained within a k8s cluster.

negbie commented 4 years ago

@danjenkins it shouldn't but Go's netstack has a lot of black magic behind the scenes so who knows ;)

danjenkins commented 4 years ago

So i added

template:
    spec:
      dnsConfig:
        options:
        - name: use-vc  # tells the local DNS resolver to use TCP instead of UDP; UDP is flaky in containers
        - name: single-request-reopen
        - name: single-request

to the manifest for the deployment and that's had no effect.

Going to try doing a postrun change of resolv.conf

negbie commented 4 years ago

@danjenkins too bad! OK, let me loop in @lmangani since he controls the sipcapture repo on dockerhub and the automatic builders. I would suggest building an Alpine static image, which is tiny, and a bigger one with the standard Go image as builder.

danjenkins commented 4 years ago

Just tried

          lifecycle:
            postStart:
              exec:
                command:
                - /bin/sh
                - -c
                - "/bin/echo 'options single-request-reopen' >> /etc/resolv.conf" 

and that also appears to fail. Annoyingly, because the run command fails, I can't exec into the container.

danjenkins commented 4 years ago

Just trying out sending via tcp instead, I doubt that'll have an effect but worth a go

danjenkins commented 4 years ago

yup - 2020/03/10 10:50:55.181243 hep.go:37: ERR dial tcp: lookup heplify-server.staging.svc.cluster.local: no such host

negbie commented 4 years ago

@danjenkins ok I think we need more image options for the user to choose from.

danjenkins commented 4 years ago

yeah, at this point i'd be fine with an ubuntu full blown base if it worked :D

lmangani commented 4 years ago

@negbie afaik all the images are Alpine based nowadays already. For heplify we use the included Dockerfile but I'm happy to create and push any OS variant to facilitate this testing

lmangani commented 4 years ago

@danjenkins we can make that happen - let's elect the next OS container to build and I'll take care of it

danjenkins commented 4 years ago

@lmangani is it easy to build one with buster or stretch https://hub.docker.com/layers/golang/library/golang/buster/images/sha256-944405641f9fb0f322be1dfc4685b916df2de3df54525cf80822f8a0529f636f?context=explore and just push it to dockerhub with a test tag, then i can test it fast

negbie commented 4 years ago

Ubuntu images are quite small too.

danjenkins commented 4 years ago

yeah @negbie if we go with buster or stretch then it means its just a change of tag from the golang repo so theoretically no real change to your dockerfile etc etc

negbie commented 4 years ago

@danjenkins makes sense let's go the frictionless way.

danjenkins commented 4 years ago

So I just made a Debian buster version (of course it requires more changes because you're not using Debian)

But now i have

2020/03/10 12:58:08.837177 hep.go:37: ERR dial tcp: lookup heplify-server.staging.svc.cluster.local: device or resource busy

danjenkins commented 4 years ago

Seems like go deals with the full .local dns differently.... trying the non full k8s url

https://github.com/segmentio/kafka-go/issues/285
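For context on short versus full names, a stub resolver expands a name using the `search` list and `ndots` value from resolv.conf. The sketch below is illustrative only: `candidates` is a hypothetical helper, and the search list and `ndots:5` are assumptions based on typical Kubernetes pod defaults, with the `staging` namespace inferred from the hostname in the logs. A name with fewer dots than `ndots` is tried against the search list first and as-is last; a name with a trailing dot is tried only as-is.

```go
package main

import (
	"fmt"
	"strings"
)

// candidates (hypothetical helper) lists the fully qualified names a stub
// resolver would try for name, in order, given a search list and ndots.
func candidates(name string, search []string, ndots int) []string {
	// A trailing dot marks the name as absolute: no search list expansion.
	if strings.HasSuffix(name, ".") {
		return []string{name}
	}

	var out []string
	asIs := name + "."
	enoughDots := strings.Count(name, ".") >= ndots

	// Names with at least ndots dots are tried as-is before the search list.
	if enoughDots {
		out = append(out, asIs)
	}
	for _, s := range search {
		out = append(out, name+"."+s+".")
	}
	// Names with fewer dots are tried as-is only after the search list.
	if !enoughDots {
		out = append(out, asIs)
	}
	return out
}

func main() {
	// Assumed values: typical pod defaults for a namespace called "staging".
	search := []string{"staging.svc.cluster.local", "svc.cluster.local", "cluster.local"}
	names := []string{"heplify-server", "heplify-server.staging.svc.cluster.local"}
	for _, n := range names {
		fmt.Println(n, "->", candidates(n, search, 5))
	}
}
```

Under these assumptions the full service name has only four dots, so it is also expanded through the search list before being tried as-is. Appending a trailing dot to the hostname skips the expansion.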

danjenkins commented 4 years ago

If i go back to shortened dns it still errors using buster.

2020/03/10 13:02:50.697378 hep.go:37: ERR dial tcp: lookup heplify-server on 172.20.0.2:53: no such host

negbie commented 4 years ago

Thanks for helping out @danjenkins. Did you try to remove the build flags here?

RUN CGO_ENABLED=1 GOOS=linux go build -a --ldflags '-linkmode external -extldflags "-static -s -w"' -o heplify .

and just use something like

RUN CGO_ENABLED=1 GOOS=linux go build -o heplify .

danjenkins commented 4 years ago

That was going to be my next question... I'm not knowledgeable with Go - let me give that a go

danjenkins commented 4 years ago

Error from the container now

standard_init_linux.go:190: exec user process caused "no such file or directory"

danjenkins commented 4 years ago

Docker file I used was

FROM golang:buster as builder
RUN apt-get update
RUN apt-get install apt-utils musl-dev gcc libpcap-dev ca-certificates git build-essential -y
COPY . /heplify
WORKDIR /heplify
RUN CGO_ENABLED=1 GOOS=linux go build -o heplify .

FROM scratch
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/ca-certificates.crt
COPY --from=builder /heplify/heplify .
CMD ["./heplify", "-h"]

negbie commented 4 years ago

@danjenkins I'm not sure whether @lmangani uses this as the build script, but this Dockerfile looks wrong to me. What about this:

FROM golang:buster as builder
RUN apt-get update
RUN apt-get install apt-utils gcc libpcap-dev ca-certificates git build-essential -y
RUN go get -d -v -u github.com/negbie/heplify
WORKDIR /root/go/src/github.com/negbie/heplify/
RUN CGO_ENABLED=1 GOOS=linux go build -o heplify .

FROM scratch
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/ca-certificates.crt
COPY --from=builder root/go/src/github.com/negbie/heplify/heplify .
CMD ["./heplify", "-h"]

negbie commented 4 years ago

Can't test it currently so take it with care.

danjenkins commented 4 years ago

@negbie being very lazy here... it errored

Step 6/10 : RUN CGO_ENABLED=1 GOOS=linux go build -o heplify .
 ---> Running in e35a5ef1828b
can't load package: package .: no Go files in /root/go/src/github.com/negbie/heplify
The command '/bin/sh -c CGO_ENABLED=1 GOOS=linux go build -o heplify .' returned a non-zero code: 1

danjenkins commented 4 years ago

ah! missing a /

danjenkins commented 4 years ago

oh no that wasn't it...

lmangani commented 4 years ago

This builds

FROM golang:buster as builder
RUN apt-get update
RUN apt-get install apt-utils gcc libpcap-dev ca-certificates git build-essential -y
RUN go get -d -v -u github.com/negbie/heplify
WORKDIR /go/src/github.com/negbie/heplify/
RUN CGO_ENABLED=1 GOOS=linux go build -o heplify .

FROM scratch
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/ca-certificates.crt
COPY --from=builder /go/src/github.com/negbie/heplify/heplify .
CMD ["./heplify", "-h"]

Untested and pushed for your leisure: sipcapture/heplify:buster

danjenkins commented 4 years ago

thanks @lmangani !!

danjenkins commented 4 years ago

@lmangani that gives me the same error?

standard_init_linux.go:190: exec user process caused "no such file or directory"
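A common cause of this error in a `FROM scratch` image is a dynamically linked binary: the kernel cannot find the ELF interpreter (the dynamic loader) the binary names, because scratch contains no loader or shared libraries. The sketch below is a hedged way to check a built binary using Go's standard `debug/elf` package. `inspect` is a hypothetical helper and `./heplify` is an assumed default path.

```go
package main

import (
	"debug/elf"
	"fmt"
	"io"
	"os"
	"strings"
)

// inspect (hypothetical helper) reports the ELF interpreter requested by the
// file at path (empty for a statically linked binary) and the shared
// libraries it asks the loader for.
func inspect(path string) (string, []string, error) {
	f, err := elf.Open(path)
	if err != nil {
		return "", nil, err
	}
	defer f.Close()

	interp := ""
	for _, p := range f.Progs {
		if p.Type != elf.PT_INTERP {
			continue
		}
		// The PT_INTERP segment holds a NUL-terminated loader path.
		raw, err := io.ReadAll(p.Open())
		if err != nil {
			return "", nil, err
		}
		interp = strings.TrimRight(string(raw), "\x00")
	}

	libs, err := f.ImportedLibraries()
	if err != nil {
		return "", nil, err
	}
	return interp, libs, nil
}

func main() {
	path := "./heplify" // assumed default; pass another path as the first argument
	if len(os.Args) > 1 {
		path = os.Args[1]
	}
	interp, libs, err := inspect(path)
	if err != nil {
		fmt.Fprintln(os.Stderr, "inspect:", err)
		os.Exit(1)
	}
	fmt.Printf("interpreter: %q\nshared libraries: %v\n", interp, libs)
}
```

A non-empty interpreter or library list means the binary needs files that a scratch image does not provide, so it would need either a static build or a base image that ships the loader and libraries such as libpcap.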

negbie commented 4 years ago

@danjenkins Currently hopping from meeting to meeting so I will come back to you this evening when I'm at home and can verify stuff I paste here ;)

danjenkins commented 4 years ago

No problem :)

lmangani commented 4 years ago

@danjenkins how can i replicate this? (nevermind, I can, testing a fix)

lmangani commented 4 years ago

@danjenkins try again please

FROM golang:buster as builder
RUN apt-get update
RUN apt-get install apt-utils gcc libpcap-dev ca-certificates git build-essential -y
RUN go get -d -v -u github.com/negbie/heplify
WORKDIR /go/src/github.com/negbie/heplify/
RUN CGO_ENABLED=1 GOOS=linux go build -a --ldflags '-linkmode external -extldflags "-static -s -w"' -o heplify .
RUN chmod +x /go/src/github.com/negbie/heplify/heplify

FROM scratch
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/ca-certificates.crt
COPY --from=builder /go/src/github.com/negbie/heplify/heplify /heplify
CMD ["/heplify", "-h"]

danjenkins commented 4 years ago

thanks! trying it out now

danjenkins commented 4 years ago

@lmangani :(

Now I have less useful logs....

Critical: cannot establish a connection

negbie commented 4 years ago

@danjenkins in @lmangani's build script I still see

RUN CGO_ENABLED=1 GOOS=linux go build -a --ldflags '-linkmode external -extldflags "-static -s -w"' -o heplify .

maybe this should be changed first.

danjenkins commented 4 years ago

Sorry I'm not following what you're suggesting @negbie - been looking at this too long :D

lmangani commented 4 years ago

@negbie without those flags the error above appears; with the full options it works (apparently). @danjenkins what config are you passing it?

danjenkins commented 4 years ago

        - name: heplify
          image: danjenkins/heplify:latest
          command:
            - "./heplify"
            - "-nt"
            - "tcp"
            - "-hs"
            - "heplify-server:9060"

lmangani commented 4 years ago

By default, heplify-server listens on 9060/UDP unless you configured the HEPTCPAddr setting, have you?

negbie commented 4 years ago

@lmangani wrote: "@negbie without those flags the error above appears; with the full options it works (apparently). @danjenkins what config are you passing it?"

@lmangani since some dns issues with Go apps on Kubernetes are due to how they are compiled I want to make sure that no further flags are provided so instead of

RUN CGO_ENABLED=1 GOOS=linux go build -a --ldflags '-linkmode external -extldflags "-static -s -w"' -o heplify .

use just

RUN CGO_ENABLED=1 GOOS=linux go build -o heplify .

danjenkins commented 4 years ago

moving back to udp didnt solve the issue.... how do i get the rest of the logging back? :S

danjenkins commented 4 years ago

@negbie I tried with your RUN command and I got standard_init_linux.go:190: exec user process caused "no such file or directory" error from the container.

I'm kinda stuck now and this is holding up a deployment - this is for a kamailio k8s deployment so I'm thinking about just enabling kamailio to send the data instead... but really didn't want to do that - any ideas?

negbie commented 4 years ago

Hi @danjenkins, I'm sure we can fix this but I need at least 30 minutes of spare time to look into it. Will try to find it today.