smart-edge-open / converged-edge-experience-kits

Source code for experience kits with Ansible-based deployment.
Apache License 2.0

Deploy onprem controller on aws EC2 Centos 7 OS #62

Closed MohamedSherifAbdelsamiea closed 3 years ago

MohamedSherifAbdelsamiea commented 4 years ago

Hi,

I am using an AWS EC2 instance with CentOS 7 to try the solution. However, I get the error below when running ./deploy_onprem.sh controller:

TASK [docker : install dependencies] ***
task path: /root/openness-experience-kits/roles/docker/tasks/main.yml:36
fatal: [controller]: FAILED! => {
    "changed": false,
    "cmd": [
        "/usr/bin/pip2",
        "install",
        "-r",
        "/tmp/requirements.txt"
    ]
}

MSG:

stdout: Requirement already satisfied: backports.ssl-match-hostname==3.5.0.1 in /usr/lib/python2.7/site-packages (from -r /tmp/requirements.txt (line 4)) (3.5.0.1)
Collecting bcrypt==3.1.7
  Using cached bcrypt-3.1.7-cp27-cp27mu-manylinux1_x86_64.whl (59 kB)
Collecting cached-property==1.5.1
  Using cached cached_property-1.5.1-py2.py3-none-any.whl (6.0 kB)
Collecting certifi==2020.6.20
  Using cached certifi-2020.6.20-py2.py3-none-any.whl (156 kB)
Collecting cffi==1.14.2
  Using cached cffi-1.14.2-cp27-cp27mu-manylinux1_x86_64.whl (388 kB)
Collecting chardet==3.0.4
  Using cached chardet-3.0.4-py2.py3-none-any.whl (133 kB)
Requirement already satisfied: configobj==4.7.2 in /usr/lib/python2.7/site-packages (from -r /tmp/requirements.txt (line 10)) (4.7.2)
Collecting cryptography==3.0
  Using cached cryptography-3.0-cp27-cp27mu-manylinux2010_x86_64.whl (2.7 MB)
Requirement already satisfied: decorator==3.4.0 in /usr/lib/python2.7/site-packages (from -r /tmp/requirements.txt (line 12)) (3.4.0)
Collecting docker==3.7.3
  Using cached docker-3.7.3-py2.py3-none-any.whl (134 kB)
Collecting docker-compose==1.24.1
  Using cached docker_compose-1.24.1-py2.py3-none-any.whl (134 kB)
Collecting docker-pycreds==0.4.0
  Using cached docker_pycreds-0.4.0-py2.py3-none-any.whl (9.0 kB)
Collecting dockerpty==0.4.1
  Using cached dockerpty-0.4.1.tar.gz (13 kB)
Collecting docopt==0.6.2
  Using cached docopt-0.6.2.tar.gz (25 kB)
Collecting enum34==1.1.10
  Using cached enum34-1.1.10-py2-none-any.whl (11 kB)
Collecting functools32==3.2.3.post2
  Using cached functools32-3.2.3-2.tar.gz (31 kB)
Collecting idna==2.7
  Using cached idna-2.7-py2.py3-none-any.whl (58 kB)
Requirement already satisfied: iniparse==0.4 in /usr/lib/python2.7/site-packages (from -r /tmp/requirements.txt (line 21)) (0.4)
Requirement already satisfied: ipaddress==1.0.16 in /usr/lib/python2.7/site-packages (from -r /tmp/requirements.txt (line 22)) (1.0.16)
Collecting jsonschema==2.6.0
  Using cached jsonschema-2.6.0-py2.py3-none-any.whl (39 kB)
Collecting paramiko==2.7.1
  Using cached paramiko-2.7.1-py2.py3-none-any.whl (206 kB)
Requirement already satisfied: perf==0.1 in /usr/lib64/python2.7/site-packages (from -r /tmp/requirements.txt (line 25)) (0.1)
Collecting pycparser==2.20
  Using cached pycparser-2.20-py2.py3-none-any.whl (112 kB)
Requirement already satisfied: pycurl==7.19.0 in /usr/lib64/python2.7/site-packages (from -r /tmp/requirements.txt (line 27)) (7.19.0)
Requirement already satisfied: pygobject==3.22.0 in /usr/lib64/python2.7/site-packages (from -r /tmp/requirements.txt (line 28)) (3.22.0)
Requirement already satisfied: pygpgme==0.3 in /usr/lib64/python2.7/site-packages (from -r /tmp/requirements.txt (line 29)) (0.3)
Requirement already satisfied: pyliblzma==0.5.3 in /usr/lib64/python2.7/site-packages (from -r /tmp/requirements.txt (line 30)) (0.5.3)
Collecting PyNaCl==1.4.0
  Using cached PyNaCl-1.4.0-cp27-cp27mu-manylinux1_x86_64.whl (964 kB)
Requirement already satisfied: python-linux-procfs==0.4.9 in /usr/lib/python2.7/site-packages (from -r /tmp/requirements.txt (line 32)) (0.4.9)
Requirement already satisfied: pyudev==0.15 in /usr/lib/python2.7/site-packages (from -r /tmp/requirements.txt (line 33)) (0.15)
Requirement already satisfied: pyxattr==0.5.1 in /usr/lib64/python2.7/site-packages (from -r /tmp/requirements.txt (line 34)) (0.5.1)
Collecting PyYAML==3.13
  Using cached PyYAML-3.13.tar.gz (270 kB)
Collecting requests==2.20.1
  Using cached requests-2.20.1-py2.py3-none-any.whl (57 kB)
Requirement already satisfied: schedutils==0.4 in /usr/lib64/python2.7/site-packages (from -r /tmp/requirements.txt (line 37)) (0.4)
Collecting six==1.15.0
  Using cached six-1.15.0-py2.py3-none-any.whl (10 kB)

:stderr: DEPRECATION: Python 2.7 reached the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 is no longer maintained. pip 21.0 will drop support for Python 2.7 in January 2021. More details about Python 2 support in pip can be found at https://pip.pypa.io/en/latest/development/release-process/#python-2-support pip 21.0 will remove support for this functionality.
ERROR: Could not find a version that satisfies the requirement slip==0.4.0 (from -r /tmp/requirements.txt (line 39)) (from versions: 0.1, 0.2, 0.3.8, 20191113)
ERROR: No matching distribution found for slip==0.4.0 (from -r /tmp/requirements.txt (line 39))

PLAY RECAP ***** controller : ok=61 changed=12 unreachable=0 failed=1 skipped=21 rescued=0 ignored=1

[root@ip-172-31-47-176 openness-experience-kits]# ERROR: No matching distribution found for slip==0.4.0 (from -r /tmp/requirements.txt (line 39))

Regards, Mohamed Sherif



effndc commented 4 years ago

@MohamedSherifAbdelsamiea which specific release of CentOS are you using? You can check with cat /etc/redhat-release.

Keep in mind that OpenNESS only supports CentOS 7.6.1810 at this time; no other versions or distributions are supported.
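For anyone scripting this check before a deployment, here is a minimal sketch. It assumes /etc/redhat-release contains a line such as "CentOS Linux release 7.6.1810 (Core)", as reported later in this thread. The function names and the SUPPORTED_RELEASE constant are made up for illustration and are not part of the experience kits.

```python
import re

# Assumption: the single supported release named in this thread.
SUPPORTED_RELEASE = "7.6.1810"


def centos_release(text):
    """Return the version found after the word 'release', or None if absent."""
    match = re.search(r"release\s+(\d+(?:\.\d+)*)", text)
    return match.group(1) if match else None


def is_supported(text):
    """True only when the release string matches SUPPORTED_RELEASE exactly."""
    return centos_release(text) == SUPPORTED_RELEASE


def read_release(path="/etc/redhat-release"):
    """Read the release file from the host; the path is the usual CentOS location."""
    with open(path) as handle:
        return handle.read()
```

On a target host, calling is_supported(read_release()) before running the deploy script would flag an unsupported image early.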

MohamedSherifAbdelsamiea commented 4 years ago

Thanks a lot, Russel. Indeed, the CentOS version was different. However, I recall that CentOS 7.6.1810 reaches end of life in 2024; is there any roadmap to support newer versions? I have another question, please: can I install the edge node on an AMD platform? Also, what are the minimum hardware requirements for the controller and node? When I attempted to install the controller with 1 GB of RAM it failed, but with 4 GB it succeeded.

effndc commented 4 years ago

@MohamedSherifAbdelsamiea we have tested deployments for development on virtual machines as small as the Azure B2s, which has 2 vCPU and 4 GB RAM. Much depends on the OEK flavor and any apps you expect to run. With the default values for tuned, the Edge node will fail with fewer than 4 vCPU.

Later versions of CentOS are currently under consideration, but we do not have any commitments at this time.

MohamedSherifAbdelsamiea commented 4 years ago

That's very helpful, Russel! So, can I run the Edge node on a Raspberry Pi? As I mentioned earlier, I am using AWS EC2 to test the solution. I tried different instance types, all the way up to t2.xlarge, which has 4 vCPU and 16 GB RAM, but I still get the error below, and the instance never comes back online:

TASK [machine_setup/conditional_reboot : reboot the machine] **
task path: /root/openness-experience-kits/roles/machine_setup/conditional_reboot/tasks/main.yml:11
fatal: [node01]: FAILED! => {
    "changed": false,
    "elapsed": 602,
    "rebooted": true
}

MSG:

Timed out waiting for last boot time check (timeout=600)

PLAY RECAP **** node01 : ok=90 changed=32 unreachable=0 failed=1 skipped=29 rescued=0 ignored=2

effndc commented 4 years ago

One more addendum: some of the telemetry services will likely fail to start without adequate memory on the nodes. Depending on pod start order on reboots, having too little RAM may cause other pods to fail to start.

To use cloud instances you need to disable the kernel changes. The default configuration builds a real-time kernel that lacks the custom kernel modules required to boot on the cloud providers' hypervisors. If you do not get these flags correct, the VM will fail to boot when it is rebooted to apply the new kernel and kernel boot flags.

Within group_vars/controller_group/10-default.yml and group_vars/edge_group/10-default.yml you need to set kernel_skip: true. You also need to change the kernel-devel setting to dpdk_kernel_devel: "https://linuxsoft.cern.ch/cern/centos/7/updates/x86_64/Packages/kernel-devel-3.10.0-957.21.3.el7.x86_64.rpm".

I believe that will get you moving along, @MohamedSherifAbdelsamiea.

effndc commented 4 years ago

so, can I run Edge node on raspberry pi?

We do not support Raspberry Pi within OpenNESS. It uses an ARM CPU rather than x86, so it isn't binary compatible.

MohamedSherifAbdelsamiea commented 4 years ago

[Screenshot: 2 up nodes without interfaces]

Only by setting kernel_skip: true and keeping the default

dpdk_kernel_devel: "http://linuxsoft.cern.ch/centos-vault/7.6.1810/os/x86_64/Packages/kernel-devel-3.10.0-957.el7.x86_64.rpm",

did I succeed in getting the edge node up and running.

With

dpdk_kernel_devel: "https://linuxsoft.cern.ch/cern/centos/7/updates/x86_64/Packages/kernel-devel-3.10.0-957.21.3.el7.x86_64.rpm",

I get the error below:

TASK [dpdk : fail if kernel-devel version is not correct] **
task path: /root/openness-experience-kits/roles/dpdk/tasks/main.yml:31
fatal: [node01]: FAILED! => {
    "changed": false
}

MSG:

kernel-devel version(https://linuxsoft.cern.ch/cern/centos/7/updates/x86_64/Packages/kernel-devel-3.10.0-957.21.3.el7.x86_64.rpm) does not match the current kernel(3.10.0-957.el7.x86_64)

PLAY RECAP ***** node01 : ok=85 changed=28 unreachable=0 failed=1 skipped=35 rescued=0 ignored=3

Now I have 1 controller and 2 edge nodes up and running, but unfortunately I can't see any interfaces available. What could be the reason?

effndc commented 4 years ago

Apologies, what did you define in group_vars/edge_group/10-default.yml on line 13? You should set that field to true for both Controller and Edge.

Can you verify which kernel you see on your Edge node? The output of uname -r should be adequate; you can also confirm the CentOS release with cat /etc/redhat-release. It is possible that the kernel version Amazon builds into their images differs from what we have tested on Azure. Azure is currently the only cloud provider we have tested with, and the provided dpdk_kernel_devel value was based on it.
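The mismatch can be checked ahead of a run with a short sketch like the one below. It assumes the RPM filename follows the kernel-devel-<version>.rpm pattern seen in the URLs in this thread, and that the version must equal the uname -r string exactly, as the error message suggests. This is not the dpdk role's actual check, and the function names are made up for illustration.

```python
import platform
import re


def kernel_devel_version(rpm_url):
    """Extract the version from a kernel-devel RPM URL or filename."""
    match = re.search(r"kernel-devel-(.+)\.rpm$", rpm_url)
    if match is None:
        raise ValueError("not a kernel-devel RPM: %s" % rpm_url)
    return match.group(1)


def matches_running_kernel(rpm_url, running_kernel=None):
    """Compare the RPM version with the running kernel.

    running_kernel defaults to platform.release(), which on Linux
    returns the same string as uname -r.
    """
    if running_kernel is None:
        running_kernel = platform.release()
    return kernel_devel_version(rpm_url) == running_kernel
```

With the values from this thread, the 957.21.3 package does not match a host running 3.10.0-957.el7.x86_64, while the centos-vault 7.6.1810 package does.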

To confirm that the K8s cluster enrollment was successful, you can check the node status from the controller node with: kubectl get nodes -o wide. The output should show both nodes with a status of Ready; if that is not the case, the deployment did not complete.

MohamedSherifAbdelsamiea commented 4 years ago

I defined kernel_skip: true for both controller and node.
uname -r: 3.10.0-957.el7.x86_64
cat /etc/redhat-release: CentOS Linux release 7.6.1810 (Core)
I created my own CentOS image on AWS from a raw .iso (if you have a link to a CentOS ISO file, please share it with me).
kubectl get nodes -o wide: -bash: kubectl: command not found

effndc commented 4 years ago

Apologies, I had misread this as trying to deploy the network edge (deploy_ne.sh).

MohamedSherifAbdelsamiea commented 4 years ago

I am trying to deploy the on-prem edge (deploy_onprem.sh). Is there an AWS kernel available? dpdk_kernel_devel: "https://linuxsoft.cern.ch/cern/centos/7/updates/x86_64/Packages/kernel-devel-3.10.0-957.21.3.el7.x86_64.rpm" does not seem to work with AWS.

MohamedSherifAbdelsamiea commented 4 years ago

TASK [dpdk : fail if kernel-devel version is not correct] **
task path: /root/openness-experience-kits/roles/dpdk/tasks/main.yml:31
fatal: [node01]: FAILED! => {
    "changed": false
}

MSG:

kernel-devel version(https://linuxsoft.cern.ch/cern/centos/7/updates/x86_64/Packages/kernel-devel-3.10.0-957.21.3.el7.x86_64.rpm) does not match the current kernel(3.10.0-957.el7.x86_64)

PLAY RECAP ***** node01 : ok=85 changed=28 unreachable=0 failed=1 skipped=35 rescued=0 ignored=3

MohamedSherifAbdelsamiea commented 4 years ago

Solved by upgrading the OS kernel to match the new kernel-devel version.

PLAY RECAP *****
controller : ok=130 changed=68 unreachable=0 failed=0 skipped=28 rescued=0 ignored=2
node01 : ok=266 changed=121 unreachable=0 failed=0 skipped=90 rescued=0 ignored=4

Despite these results, the command kubectl get nodes -o wide still returns -bash: kubectl: command not found, and no interfaces are available on the controller page.

I don't know where else the issue could be.

MohamedSherifAbdelsamiea commented 4 years ago

Hi Russel,

From the attached deploy script output/logs, I can't see any installation activity for the Kubernetes command-line tool, which explains why the kubectl command is not found and, consequently, why the K8s cluster is not created.

Please advise.

helper_node_onprem_sh.log

effndc commented 4 years ago

Apologies for the misleading request. Kubernetes is not deployed in the "On Prem" deployment, so kubectl is absent. You can find the running services by executing docker ps on the controller and on each node. See https://www.openness.org/docs/doc/getting-started/on-premises/controller-edge-node-setup

Is the current failure that the interfaces do not show up for the node(s)? This could be due to a lack of support for virtual machines in the on-prem deployment; I am not as familiar with the OnPrem controller or how it looks for interfaces.

MohamedSherifAbdelsamiea commented 4 years ago

I noticed that deploy_onprem.sh has been removed from the GitHub repository https://github.com/open-ness/openness-experience-kits.git

Please advise.

effndc commented 4 years ago

This is addressed in the release notes for OpenNESS 20.09: https://github.com/open-ness/specs/blob/master/openness_releasenotes.md

Native On-premises mode
* Following from the previous release decision of pausing Native on-premises Development the code has been move to a dedicated repository “native-on-prem”
* Kubernetes based solution will now support both Network and on-premises Edge

MohamedSherifAbdelsamiea commented 4 years ago

Does that mean I should now use deploy_ne.sh for on-prem?

effndc commented 3 years ago

@MohamedSherifAbdelsamiea without knowing any details of what you are attempting to do, it isn't possible to guide you. I would suggest consulting our documentation to determine which deployment model suits your requirements.

jakubrym commented 3 years ago

Hi @MohamedSherifAbdelsamiea, does the problem still appear? If not, do you agree to close this ticket, and reopen it if need be?

MohamedSherifAbdelsamiea commented 3 years ago

Thanks, Jakubrym. This issue has been solved; please go ahead and close it.