openshift / origin

Conformance test suite for OpenShift
http://www.openshift.org
Apache License 2.0

Network metrics missing after redeploying application [OKD 3.11 + CRI-O] #25000

Closed uselessidbr closed 3 years ago

uselessidbr commented 4 years ago

While using OKD 3.11 with the CRI-O container runtime, network data goes missing from the metrics from time to time. Once a pod is redeployed, it no longer shows network data.

The workaround is to restart the origin-node service on all nodes.
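The workaround could be scripted roughly as follows. This is only a sketch: the function name `restart_origin_node`, the `DRY_RUN` switch, root SSH access, and node names that resolve over SSH are all assumptions that do not come from the thread.

```shell
#!/usr/bin/env bash
# Sketch only: restart origin-node on each node passed as an argument.
# Assumptions (not from the thread): root SSH access, resolvable node names.
# DRY_RUN defaults to 1, so nothing is restarted unless DRY_RUN=0 is set.
restart_origin_node() {
  local node
  for node in "$@"; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "would run: ssh root@${node} systemctl restart origin-node"
    else
      ssh "root@${node}" systemctl restart origin-node
    fi
  done
}

# Example: feed it the node names known to the cluster.
#   restart_origin_node $(oc get nodes -o jsonpath='{.items[*].metadata.name}')
```

Restarting nodes one at a time, rather than in parallel, keeps the disruption limited to a single kubelet at any moment.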

It's supposed to be fixed at the commit:

image

I'm using a newer release:

image

It seems related to this issue: https://github.com/openshift/origin/issues/23492

Version


# oc version
oc v3.11.0+62803d0-1
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://xyz:8443
openshift v3.11.0+7876dd5-361
kubernetes v1.11.0+d4cacc0
Steps To Reproduce
  1. Deploy a cluster using openshift-ansible with CRIO as container runtime;
  2. Watch network's metrics disappear after redeploying a pod.
Current Result

Before redeploy:

image image

After redeploy:

image image

Expected Result

Network metrics being reported correctly after redeploy.
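One way to check this without the web console is to count the network samples that the kubelet's cAdvisor endpoint reports for a pod. The sketch below assumes the metrics use a `pod_name` label (which may differ between Kubernetes versions) and that the logged-in user may read the node proxy endpoint. The helper name `count_network_series` is hypothetical.

```shell
#!/usr/bin/env bash
# count_network_series POD [LABEL]
# Reads Prometheus text-format metrics on stdin and prints how many
# container_network_* samples carry LABEL="POD".
# LABEL defaults to pod_name (an assumption for this Kubernetes version).
count_network_series() {
  local pod="$1" label="${2:-pod_name}"
  grep '^container_network_' | grep -c -F "${label}=\"${pod}\"" || true
}

# Example (NODE and POD are placeholders):
#   oc get --raw "/api/v1/nodes/${NODE}/proxy/metrics/cadvisor" \
#     | count_network_series "${POD}"
```

A result of 0 for a running pod would match the symptom described above, while a healthy pod should report several samples per interface.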


uselessidbr commented 4 years ago

It seems to be the same problem reported in #23492.

Reamer commented 4 years ago

@uselessidbr I have built okd rpms myself and installed these rpms in my environment. The fix should be in release-3.11-branch.

uselessidbr commented 4 years ago

> @uselessidbr I have built okd rpms myself and installed these rpms in my environment. The fix should be in release-3.11-branch.

I'm not sure how to update a cluster installed via openshift-ansible.

Which RPMs should be updated?

It seems to me that the cadvisor is integrated with the kubelet:

# netstat -tunap | grep 10250
tcp6       0      0 :::10250                :::*                    LISTEN      469/hyperkube

Which is part of origin-node:

● origin-node.service - OpenShift Node
   Loaded: loaded (/etc/systemd/system/origin-node.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2020-05-22 17:17:51 -03; 2 days ago
     Docs: https://github.com/openshift/origin
 Main PID: 469 (hyperkube)
   Memory: 126.3M
   CGroup: /system.slice/origin-node.service
           └─469 /usr/bin/hyperkube kubelet --v=2 --address=0.0.0.0 --allow-privileged=true --anonymous-auth=true --authentication-tok...

Which I guess is provided by:

# yum list installed origin-node
Installed Packages
origin-node.x86_64                                 3.11.0-1.el7.git.0.62803d0

Is that correct?
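The package that actually ships the binary can be confirmed by asking RPM which package owns the executable of the running process. A minimal sketch, assuming a Linux host with `/proc` and `rpm` available (the helper name `owning_package` is hypothetical):

```shell
#!/usr/bin/env bash
# owning_package PID: print the RPM that owns the executable of a running process.
owning_package() {
  local pid="$1" exe
  # Resolve the real path of the binary behind the PID.
  exe=$(readlink -f "/proc/${pid}/exe") || return 1
  # Ask the RPM database which installed package provides that file.
  rpm -qf "${exe}"
}

# Example with the PID from the systemctl output above:
#   owning_package 469
```

Whatever package this prints is the one that has to be replaced to pick up a newer hyperkube build.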

Reamer commented 4 years ago

I'm using hack/build-rpms.sh to build new okd rpms and createrepo to set up my own RPM repository. After that, a simple Apache server can deliver your rpms. Of course, you should add your new repository under /etc/yum.repos.d/, e.g. as my_custom_okd_rpms.repo.

To upgrade your cluster you can use the playbook playbooks/common/openshift-cluster/upgrades/v3_11/upgrade.yml
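A repository definition for the self-built RPMs might look like the sketch below. The repo id `custom-okd`, the example URL, the `.repo` file name, and `gpgcheck=0` are illustrative assumptions. Disabling signature checks is only reasonable for packages you built yourself.

```shell
#!/usr/bin/env bash
# write_repo_file FILE BASEURL: write a minimal yum repository definition.
# The repo id "custom-okd" and gpgcheck=0 are illustrative assumptions.
write_repo_file() {
  local file="$1" baseurl="$2"
  cat > "${file}" <<EOF
[custom-okd]
name=Custom OKD 3.11 RPMs
baseurl=${baseurl}
enabled=1
gpgcheck=0
EOF
}

# Example (host name and inventory path are placeholders):
#   write_repo_file /etc/yum.repos.d/my_custom_okd_rpms.repo http://repo.example.com/okd/
#   ansible-playbook -i <inventory> playbooks/common/openshift-cluster/upgrades/v3_11/upgrade.yml
```

The file has to be present on every node before the upgrade playbook runs, since each node resolves the packages on its own.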

Reamer commented 4 years ago

I'm using these few commands to build okd 3.11 rpms inside a centos:7 docker container.

# Install build dependencies
yum upgrade -y
yum install -y epel-release
yum install -y git which golang golang-race make gcc zip mercurial krb5-devel bsdtar bc rsync bind-utils file jq tito createrepo openssl gpgme gpgme-devel libassuan libassuan-devel
# Prepare Go env
mkdir -p "$HOME/go"
export GOPATH="$HOME/go"
export PATH="$PATH:$GOPATH/bin"
export OS_OUTPUT_GOPATH=1
export OS_ONLY_BUILD_PLATFORMS=linux/amd64
mkdir -p "$GOPATH/src/github.com/openshift"
cd "$GOPATH/src/github.com/openshift"
# Check out OpenShift
git clone https://github.com/openshift/origin
cd origin
git checkout release-3.11
# Build rpms
./hack/build-rpms.sh

uselessidbr commented 4 years ago

> I'm using hack/build-rpms.sh to build new okd rpms and createrepo to set up my own RPM repository. After that, a simple Apache server can deliver your rpms. Of course, you should add your new repository under /etc/yum.repos.d/, e.g. as my_custom_okd_rpms.repo.
>
> To upgrade your cluster you can use the playbook playbooks/common/openshift-cluster/upgrades/v3_11/upgrade.yml

Thanks man! I will try to use the playbook to upgrade the cluster and see what happens.

As some of the services are containerized, I'm just not sure which component delivers cadvisor and therefore should be updated.

Reamer commented 4 years ago

> As some of the services are containerized, I'm just not sure which component delivers cadvisor and therefore should be updated.

cadvisor is part of hyperkube (a daemon on your host, started with systemd) and is delivered via RPMs.

uselessidbr commented 4 years ago

> > As some of the services are containerized, I'm just not sure which component delivers cadvisor and therefore should be updated.
>
> cadvisor is part of hyperkube (a daemon on your host, started with systemd) and is delivered via RPMs.

Oh, yeah, I thought so but just wasn't sure about it.

So, shouldn't it be updated by running a simple "yum update origin-node"? I've tried that, but there are no updates in the repository.

I'm deploying a new cluster and will try some of your suggestions!

Thanks again!

uselessidbr commented 4 years ago

Got an error while trying to build the RPMs:

[INFO] Building release RPMs for /var/www/html/origin/origin.spec ...
Executing(%prep): /bin/sh -e /var/tmp/rpm-tmp.EFa7sh

# github.com/openshift/origin/cmd/openshift
/usr/lib/golang/pkg/tool/linux_amd64/link: running gcc failed: exit status 1
collect2: error: ld returned 1 exit status

# github.com/openshift/origin/vendor/k8s.io/kubernetes/cmd/hyperkube
/usr/lib/golang/pkg/tool/linux_amd64/link: running gcc failed: exit status 1
collect2: error: ld returned 1 exit status

[ERROR] PID 62956: hack/lib/build/binaries.sh:236: GOOS=${platform%/*} GOARCH=${platform##*/} go install -pkgdir "${pkgdir}/${platform}" -tags "${OS_GOFLAGS_TAGS-} ${!platform_gotags_envvar:-}" -ldflags="${local_ldflags}" "${goflags[@]:+${goflags[@]}}" -gcflags "${gogcflags}" "${nonstatics[@]}" exited with status 2.
[INFO] Stack Trace:
[INFO] 1: hack/lib/build/binaries.sh:236: GOOS=${platform%/*} GOARCH=${platform##*/} go install -pkgdir "${pkgdir}/${platform}" -tags "${OS_GOFLAGS_TAGS-} ${!platform_gotags_envvar:-}" -ldflags="${local_ldflags}" "${goflags[@]:+${goflags[@]}}" -gcflags "${gogcflags}" "${nonstatics[@]}"
[INFO] 2: hack/lib/build/binaries.sh:156: os::build::internal::build_binaries
[INFO] 3: /var/www/html/origin/hack/build-cross.sh:76: os::build::build_binaries
[INFO] Exiting with code 2.
[ERROR] PID 62638: hack/lib/build/binaries.sh:150: local -a binaries=("$@") exited with status 2.
[INFO] Stack Trace:
[INFO] 1: hack/lib/build/binaries.sh:150: local -a binaries=("$@")
[INFO] 2: /var/www/html/origin/hack/build-cross.sh:76: os::build::build_binaries
[INFO] Exiting with code 2.
make: *** [build-cross] Error 2
error: Bad exit status from /var/tmp/rpm-tmp.vOIvd9 (%build)

RPM build errors:
    Bad exit status from /var/tmp/rpm-tmp.vOIvd9 (%build)
[ERROR] PID 62400: hack/build-rpms.sh:78: rpmbuild -b${srpm} "${OS_RPM_SPECFILE}" --define "skip_dist 1" --define "make_redistributable ${make_redistributable}" --define "version ${OS_RPM_VERSION}" --define "release ${OS_RPM_RELEASE}" --define "commit ${OS_GIT_COMMIT}" --define "os_git_vars ${OS_RPM_GIT_VARS}" --define 'dist .el7' --define "_topdir ${rpm_tmp_dir}" exited with status 1.
[INFO] Stack Trace:
[INFO] 1: hack/build-rpms.sh:78: rpmbuild -b${srpm} "${OS_RPM_SPECFILE}" --define "skip_dist 1" --define "make_redistributable ${make_redistributable}" --define "version ${OS_RPM_VERSION}" --define "release ${OS_RPM_RELEASE}" --define "commit ${OS_GIT_COMMIT}" --define "os_git_vars ${OS_RPM_GIT_VARS}" --define 'dist .el7' --define "_topdir ${rpm_tmp_dir}"
[INFO] Exiting with code 1.
[ERROR] hack/build-rpms.sh exited with code 1 after 00h 05m 08s
[root@okd origin]# less /var/tmp/rpm-tmp.vOIvd9

#!/bin/sh

RPM_SOURCE_DIR="/tmp/openshift/build-rpms/rpm/SOURCES"
RPM_BUILD_DIR="/tmp/openshift/build-rpms/rpm/BUILD"
RPM_OPT_FLAGS="-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic"
RPM_LD_FLAGS="-Wl,-z,relro "
RPM_ARCH="x86_64"
RPM_OS="linux"
export RPM_SOURCE_DIR RPM_BUILD_DIR RPM_OPT_FLAGS RPM_LD_FLAGS RPM_ARCH RPM_OS
RPM_DOC_DIR="/usr/share/doc"
export RPM_DOC_DIR
RPM_PACKAGE_NAME="origin"
RPM_PACKAGE_VERSION="3.11.0"
RPM_PACKAGE_RELEASE="1.459.c0fc512"
export RPM_PACKAGE_NAME RPM_PACKAGE_VERSION RPM_PACKAGE_RELEASE
LANG=C
export LANG
unset CDPATH DISPLAY ||:
RPM_BUILD_ROOT="/tmp/openshift/build-rpms/rpm/BUILDROOT/origin-3.11.0-1.459.c0fc512.x86_64"
export RPM_BUILD_ROOT

PKG_CONFIG_PATH="${PKG_CONFIG_PATH}:/usr/lib64/pkgconfig:/usr/share/pkgconfig"
export PKG_CONFIG_PATH

set -x
umask 022
cd "/tmp/openshift/build-rpms/rpm/BUILD"
cd 'origin-3.11.0'

# Create Binaries only for building arch
BUILD_PLATFORM="linux/amd64"
OS_ONLY_BUILD_PLATFORMS="${BUILD_PLATFORM}" OS_GIT_COMMIT='c0fc512' OS_GIT_TREE_STATE='clean' OS_GIT_VERSION='v3.11.0+c0fc512-459' OS_GIT_MAJOR='3' OS_GIT_MINOR='11+' OS_GIT_PATCH='0' KUBE_GIT_MAJOR='1' KUBE_GIT_MINOR='11+' KUBE_GIT_COMMIT='d4cacc0' KUBE_GIT_VERSION='v1.11.0+d4cacc0' ETCD_GIT_VERSION='v3.2.16-0-g121edf0' ETCD_GIT_COMMIT='121edf0' OS_BUILD_RELEASE_ARCHIVES=n make build-cross
OS_ONLY_BUILD_PLATFORMS="${BUILD_PLATFORM}" OS_GIT_COMMIT='c0fc512' OS_GIT_TREE_STATE='clean' OS_GIT_VERSION='v3.11.0+c0fc512-459' OS_GIT_MAJOR='3' OS_GIT_MINOR='11+' OS_GIT_PATCH='0' KUBE_GIT_MAJOR='1' KUBE_GIT_MINOR='11+' KUBE_GIT_COMMIT='d4cacc0' KUBE_GIT_VERSION='v1.11.0+d4cacc0' ETCD_GIT_VERSION='v3.2.16-0-g121edf0' ETCD_GIT_COMMIT='121edf0' OS_BUILD_RELEASE_ARCHIVES=n make build WHAT=vendor/github.com/onsi/ginkgo/ginkgo

# Generate man pages
OS_GIT_COMMIT='c0fc512' OS_GIT_TREE_STATE='clean' OS_GIT_VERSION='v3.11.0+c0fc512-459' OS_GIT_MAJOR='3' OS_GIT_MINOR='11+' OS_GIT_PATCH='0' KUBE_GIT_MAJOR='1' KUBE_GIT_MINOR='11+' KUBE_GIT_COMMIT='d4cacc0' KUBE_GIT_VERSION='v1.11.0+d4cacc0' ETCD_GIT_VERSION='v3.2.16-0-g121edf0' ETCD_GIT_COMMIT='121edf0' make build-docs

exit 0

Any idea about what's happening?

uselessidbr commented 4 years ago

> Got an error while trying to build the RPMs:
>
> [INFO] Building release RPMs for /var/www/html/origin/origin.spec ...
> Executing(%prep): /bin/sh -e /var/tmp/rpm-tmp.EFa7sh
> + umask 022
> + cd /tmp/openshift/build-rpms/rpm/BUILD
> + cd /tmp/openshift/build-rpms/rpm/BUILD
> + rm -rf origin-3.11.0
> + /usr/bin/gzip -dc /tmp/openshift/build-rpms/rpm/SOURCES/origin-3.11.0.tar.gz
> + /usr/bin/tar -xf -
> + STATUS=0
> + '[' 0 -ne 0 ']'
> + cd origin-3.11.0
> + /usr/bin/chmod -Rf a+rX,u+w,g-w,o-w .
> + exit 0
> Executing(%build): /bin/sh -e /var/tmp/rpm-tmp.vOIvd9
> + umask 022
> + cd /tmp/openshift/build-rpms/rpm/BUILD
> + cd origin-3.11.0
> + BUILD_PLATFORM=linux/amd64
> + OS_ONLY_BUILD_PLATFORMS=linux/amd64
> + OS_GIT_COMMIT=c0fc512
> + OS_GIT_TREE_STATE=clean
> + OS_GIT_VERSION=v3.11.0+c0fc512-459
> + OS_GIT_MAJOR=3
> + OS_GIT_MINOR=11+
> + OS_GIT_PATCH=0
> + KUBE_GIT_MAJOR=1
> + KUBE_GIT_MINOR=11+
> + KUBE_GIT_COMMIT=d4cacc0
> + KUBE_GIT_VERSION=v1.11.0+d4cacc0
> + ETCD_GIT_VERSION=v3.2.16-0-g121edf0
> + ETCD_GIT_COMMIT=121edf0
> + OS_BUILD_RELEASE_ARCHIVES=n
> + make build-cross
> hack/build-cross.sh
> ++ Building go targets for linux/amd64: images/pod
> ++ Building go targets for linux/amd64: cmd/sdn-cni-plugin vendor/github.com/containernetworking/plugins/plugins/ipam/host-local vendor/github.com/containernetworking/plugins/plugins/main/loopback
> ++ Building go targets for linux/amd64: cmd/hypershift cmd/openshift cmd/oc cmd/oadm cmd/template-service-broker cmd/openshift-node-config vendor/k8s.io/kubernetes/cmd/hyperkube
>
> # github.com/openshift/origin/cmd/hypershift
> /usr/lib/golang/pkg/tool/linux_amd64/link: running gcc failed: exit status 1
> collect2: error: ld returned 1 exit status
>
> # github.com/openshift/origin/cmd/openshift
> /usr/lib/golang/pkg/tool/linux_amd64/link: running gcc failed: exit status 1
> collect2: error: ld returned 1 exit status
>
> # github.com/openshift/origin/vendor/k8s.io/kubernetes/cmd/hyperkube
> /usr/lib/golang/pkg/tool/linux_amd64/link: running gcc failed: exit status 1
> collect2: error: ld returned 1 exit status


I had to install the goversioninfo package (yum install goversioninfo) from the Origin311 repository.

Also, I had to change the GOCACHE environment variable because the build was saving files in the /root/ directory and I didn't have sufficient disk space there.
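Relocating the Go build cache can be combined with a free-space check before starting the build. The following is a sketch under assumptions: the helper names `free_kb` and `use_gocache`, the example directory, and the threshold are hypothetical. Only the `GOCACHE` variable itself comes from the comment above.

```shell
#!/usr/bin/env bash
# free_kb DIR: print the available space in KiB on the filesystem holding DIR.
free_kb() {
  df -Pk "$1" | awk 'NR==2 {print $4}'
}

# use_gocache DIR MIN_KB: export GOCACHE=DIR if DIR has at least MIN_KB free.
use_gocache() {
  local dir="$1" min_kb="$2"
  mkdir -p "${dir}" || return 1
  if [ "$(free_kb "${dir}")" -ge "${min_kb}" ]; then
    export GOCACHE="${dir}"
  else
    echo "not enough free space in ${dir}" >&2
    return 1
  fi
}

# Example (directory and 10 GiB threshold are arbitrary choices):
#   use_gocache /var/tmp/go-build-cache 10485760 && ./hack/build-rpms.sh
```

Checking the space up front makes a shortage visible immediately instead of surfacing later as an opaque linker failure.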

openshift-bot commented 4 years ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot commented 3 years ago

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

openshift-bot commented 3 years ago

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen. Mark the issue as fresh by commenting /remove-lifecycle rotten. Exclude this issue from closing again by commenting /lifecycle frozen.

/close

openshift-ci-robot commented 3 years ago

@openshift-bot: Closing this issue.

In response to [this](https://github.com/openshift/origin/issues/25000#issuecomment-766251854):

>Rotten issues close after 30d of inactivity.
>
>Reopen the issue by commenting `/reopen`.
>Mark the issue as fresh by commenting `/remove-lifecycle rotten`.
>Exclude this issue from closing again by commenting `/lifecycle frozen`.
>
>/close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.