smart-edge-open / converged-edge-experience-kits

Source code for experience kits with Ansible-based deployment.
Apache License 2.0

Edge Node Deployment with ovs-dpdk failing #17

Closed higginse-id closed 4 years ago

higginse-id commented 4 years ago

I have been attempting a minimal network edge deployment using the openness experience kit. I modified it to use ovncni rather than nts, and I disabled all customisations for the controller and edge nodes (for example, I don't want a real-time kernel). I just want to verify end-to-end connectivity from my core network, through an edge node, to an edge client behind it.

Edge Node deployment fails with the following error:

Error building ovs-dpdk - code: None, message: COPY failed: stat /var/lib/docker/tmp/docker-builder687389694/ovs-healthcheck.sh: no such file or directory

To debug the ovs docker issue, I tried building the docker image manually on the target edge node (I find the ansible logs nearly impossible to read, let alone debug).

The manual build failed in the same way:

[root@mec-n86 dpdk-18.11.2]# cd /opt/dpdk-18.11.2
[root@mec-n86 dpdk-18.11.2]# docker build -f Dockerfile.dpdk -t ovs-dpdk .
        :                                :                                   :
Step 14/20 : RUN rpm -ivh ~/ovs-${OVS_VERSION}-${OVS_SUBVERSION}/rpm/rpmbuild/RPMS/x86_64/openvswitch-${OVS_VERSION}-${OVS_SUBVERSION}.el7.x86_64.rpm &&     rpm -ivh ~/ovs-${OVS_VERSION}-${OVS_SUBVERSION}/rpm/rpmbuild/RPMS/x86_64/openvswitch-devel-${OVS_VERSION}-${OVS_SUBVERSION}.el7.x86_64.rpm &&     rpm -ivh https://github.com/alauda/ovs/releases/download/${OVS_VERSION}-${OVS_SUBVERSION}/ovn-${OVS_VERSION}-${OVS_SUBVERSION}.el7.x86_64.rpm &&     rpm -ivh https://github.com/alauda/ovs/releases/download/${OVS_VERSION}-${OVS_SUBVERSION}/ovn-vtep-${OVS_VERSION}-${OVS_SUBVERSION}.el7.x86_64.rpm &&     rpm -ivh https://github.com/alauda/ovs/releases/download/${OVS_VERSION}-${OVS_SUBVERSION}/ovn-central-${OVS_VERSION}-${OVS_SUBVERSION}.el7.x86_64.rpm &&     rpm -ivh https://github.com/alauda/ovs/releases/download/${OVS_VERSION}-${OVS_SUBVERSION}/ovn-host-${OVS_VERSION}-${OVS_SUBVERSION}.el7.x86_64.rpm
---> Running in f9d673a8a752
Preparing...                          ########################################
Updating / installing...
openvswitch-2.12.0-4.el7              ########################################
Preparing...                          ########################################
Updating / installing...
openvswitch-devel-2.12.0-4.el7        ########################################
Retrieving https://github.com/alauda/ovs/releases/download/2.12.0-4/ovn-2.12.0-4.el7.x86_64.rpm
Preparing...                          ########################################
Updating / installing...
ovn-2.12.0-4.el7                      ########################################
Retrieving https://github.com/alauda/ovs/releases/download/2.12.0-4/ovn-vtep-2.12.0-4.el7.x86_64.rpm
Preparing...                          ########################################
Failed to get D-Bus connection: Operation not permitted
Updating / installing...
ovn-vtep-2.12.0-4.el7                 ########################################
Retrieving https://github.com/alauda/ovs/releases/download/2.12.0-4/ovn-central-2.12.0-4.el7.x86_64.rpm
Preparing...                          ########################################
Updating / installing...
ovn-central-2.12.0-4.el7              ###############Failed to get D-Bus connection: Operation not permitted
#########################
Retrieving https://github.com/alauda/ovs/releases/download/2.12.0-4/ovn-host-2.12.0-4.el7.x86_64.rpm
Preparing...                          ########################################
Failed to get D-Bus connection: Operation not permitted
Updating / installing...
ovn-host-2.12.0-4.el7                 ########################################
Removing intermediate container f9d673a8a752
---> dd56bd92b4cf
Step 15/20 : RUN mkdir -p /var/run/openvswitch &&     mkdir -p /etc/cni/net.d &&     mkdir -p /opt/cni/bin
---> Running in 5b9f42f93e73
Removing intermediate container 5b9f42f93e73
---> f3b6f97bbf66
Step 16/20 : COPY ovs-healthcheck.sh /root/ovs-healthcheck.sh
COPY failed: stat /var/lib/docker/tmp/docker-builder826022717/ovs-healthcheck.sh: no such file or directory
[root@mec-n86 dpdk-18.11.2]#

Digging a little deeper, the .dockerignore file appears to be misconfigured: its wildcard entry masks two files that the build requires. As a crude workaround, I manually added whitelist entries for those two files:

# Add everything to the ignored
*
# Add following to whitelist:
!lib
!drivers
!x86_64-native-linuxapp-gcc
!configure_ovn_net.sh
!start_ovs_ovn.sh
#2020_05_25 higginse debugging build fail
!ovs-healthcheck.sh
!start-ovs-dpdk.sh
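
The whitelist layout above can be checked before running a build. The following is a minimal sketch, not part of the experience kit: the helper names `is_whitelisted` and `report_masked` are hypothetical, and the check only looks for exact `!name` lines, whereas Docker's real matching also handles globs and rule ordering.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if NAME has an exact "!NAME" entry in the
# given .dockerignore. This covers only the simple "ignore everything,
# then whitelist by name" layout; it does not emulate Docker's full
# pattern matching.
is_whitelisted() {
    ignore_file=$1
    name=$2
    grep -Fxq -- "!$name" "$ignore_file"
}

# Print every given name that the "*" rule would hide from the build context.
report_masked() {
    ignore_file=$1
    shift
    for name in "$@"; do
        is_whitelisted "$ignore_file" "$name" || echo "masked: $name"
    done
}
```

For example, `report_masked .dockerignore ovs-healthcheck.sh start-ovs-dpdk.sh` would list whichever of the two files still lacks a whitelist entry.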

With these changes, the manual build succeeds. I then 'patched' the experience kit configuration temporarily:

1) On the openness experience kit host, I modified roles/kubernetes/cni/kubeovn/common/defaults/main.yml to also expect a (local) .dockerignore file:

------------8<---    roles/kubernetes/cni/kubeovn/common/defaults/main.yml   ----8<----
kubeovn_download_files:
- "{{ kubeovn_raw_file_repo }}/{{ kubeovn_version }}/dist/images/Dockerfile.node"
- "{{ kubeovn_raw_file_repo }}/{{ kubeovn_version }}/dist/images/start-ovs.sh"
- "{{ kubeovn_raw_file_repo }}/{{ kubeovn_version }}/dist/images/ovs-healthcheck.sh"
- file:///opt/openness/ehiggins/.dockerignore

kubeovn_dockerimage_files_to_cp:
- Dockerfile.dpdk
- start-ovs-dpdk.sh
- ovs-healthcheck.sh
- .dockerignore
------------8<-------------------------------------------------------------------8<----

2) Next, I manually modified the .dockerignore as described above so that ovs-healthcheck.sh and start-ovs-dpdk.sh are no longer ignored.

3) I then (again manually) copied the modified .dockerignore file to my target edge host, at the path I chose above (/opt/openness/ehiggins/).

4) Finally, I re-ran the deploy_ne.sh script with the nodes argument.

This time the script ran to completion.
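
The manual .dockerignore edit in step 2 could also be scripted so that it is safe to repeat. This is only a sketch under assumptions: `whitelist_file` is a hypothetical helper name, and the file is assumed to end with a newline so the appended entry lands on its own line.

```shell
#!/bin/sh
# Hypothetical helper: append "!NAME" to the given .dockerignore unless an
# identical line is already present, so repeated runs do not add duplicates.
# Assumes the existing file ends with a newline.
whitelist_file() {
    ignore_file=$1
    name=$2
    grep -Fxq -- "!$name" "$ignore_file" 2>/dev/null ||
        printf '!%s\n' "$name" >> "$ignore_file"
}
```

Calling `whitelist_file .dockerignore ovs-healthcheck.sh` and `whitelist_file .dockerignore start-ovs-dpdk.sh` would reproduce the two added entries shown earlier.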

amr-mokhtar commented 4 years ago

Hi @higginse-id, I checked this with engineering. It seems that you are using the wrong deploy script. You should use ./deploy_onprem.sh.

The .dockerignore file exists at https://github.com/open-ness/openness-experience-kits/blob/master/roles/openness/onprem/dataplane/ovncni/common/files/.dockerignore

higginse-id commented 4 years ago

Hi. I don't follow. I was attempting a Network Edge deployment, and it makes no sense to use the on-premises playbook for a Network Edge deployment.

Furthermore, as far as I know I have been following the deployment instructions exactly.

I have been trying to deploy the simplest possible configuration, with a single controller and single edge node.

Can you please clarify?

i-kwilk commented 4 years ago

Hi,

Please note that there are two deployment modes: OnPrem and Network Edge. Please take a look at: https://github.com/open-ness/specs/blob/master/doc/architecture.md#deployment-scenarios

As we understand it, you changed the following setting in group_vars/all.yml:

# Dataplane to be used for On-Premises mode
# Available dataplanes:
# - nts
# - ovncni
onprem_dataplane: "ovncni"

That is the wrong setting to change: it applies only to OnPrem mode.

To change the CNI in Network Edge mode, look at the kubernetes_cnis variable instead:

kubernetes_cnis:
- kubeovn

where kubeovn is enabled by default.
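
To see at a glance which of the two settings a local checkout carries, something like the following could be used. It is a rough sketch: `show_cni_settings` is a hypothetical helper, the variables file path is passed in by the caller (the thread only names group_vars/all.yml for onprem_dataplane), and the parsing is a plain text match rather than real YAML parsing.

```shell
#!/bin/sh
# Hypothetical helper: print the onprem_dataplane and kubernetes_cnis
# settings from a variables file, including the "- item" lines that
# directly follow a matching key. Text matching only; not a YAML parser.
show_cni_settings() {
    awk '
        /^(onprem_dataplane|kubernetes_cnis):/ { print; in_list = 1; next }
        in_list && /^[ \t]*-/                  { print; next }
                                               { in_list = 0 }
    ' "$1"
}
```

Running `show_cni_settings group_vars/all.yml` on the kit host would print both settings side by side, which makes it easier to confirm which one was edited.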

Could you send all the changes you made in the repo?

amr-mokhtar commented 4 years ago

Closing issue due to inactivity.