opennetworkinglab / aether-onramp

Apache License 2.0

Error when running make 5gc install #2

Open hussainahmad1995 opened 1 year ago

hussainahmad1995 commented 1 year ago

I am following the quickstart guide on the Aether website, but when I run the following command: make aether-5gc-install

I get the following failure from one of the Ansible tasks in /home/atlas-support/projects/aether-onramp/deps/5gc/roles/core/tasks/install.yml:

- name: deploy aether 5gc
  block:
    - name: deploy aether 5gc
      kubernetes.core.helm:
        update_repo_cache: true
        name: sd-core
        release_namespace: omec
        create_namespace: true
        chart_ref: "{{ core.helm.chart_ref }}"
        chart_version: "{{ core.helm.chart_version }}"
        values_files:
          - /tmp/sdcore-5g-values.yaml
        wait: true
        wait_timeout: "1m30s"
        force: true
      when: inventory_hostname in groups['master_nodes']

TASK [core : deploy aether 5gc] ****************************************************************************************************************
fatal: [node1]: FAILED! => {"changed": false, "command": "/usr/local/bin/helm --version=0.12.6 upgrade -i --reset-values --wait --timeout 1m30s --create-namespace --values=/tmp/sdcore-5g-values.yaml sd-core aether/sd-core", "msg": "Failure when executing Helm command. Exited 1.\nstdout: Release \"sd-core\" does not exist. Installing it now.\n\nstderr: coalesce.go:175: warning: skipped value for kafka.config: Not a table.\nError: client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline\n", "stderr": "coalesce.go:175: warning: skipped value for kafka.config: Not a table.\nError: client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline\n", "stderr_lines": ["coalesce.go:175: warning: skipped value for kafka.config: Not a table.", "Error: client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline"], "stdout": "Release \"sd-core\" does not exist. Installing it now.\n", "stdout_lines": ["Release \"sd-core\" does not exist. Installing it now."]}

The status shows that one of the deployed pods is crashing. Any suggestions on where things are going wrong?

kubectl get pods -n omec

NAME                          READY   STATUS             RESTARTS        AGE
amf-5887bbf6c5-f9nbb          1/1     Running            0               13m
ausf-6dbb7655c7-mlcgb         1/1     Running            0               11m
kafka-0                       1/1     Running            1 (12m ago)     13m
metricfunc-b9f8c667b-9wfgq    1/1     Running            0               13m
mongodb-0                     1/1     Running            0               13m
mongodb-1                     1/1     Running            0               12m
mongodb-arbiter-0             1/1     Running            0               13m
nrf-54bf88c78c-tmn8f          1/1     Running            0               13m
nssf-5b85b8978d-dxcrv         1/1     Running            0               13m
pcf-758d7cfb48-tlflp          1/1     Running            0               13m
sd-core-zookeeper-0           1/1     Running            0               13m
simapp-6cccd6f787-ng7f4       1/1     Running            0               13m
smf-7f89c6d849-njxvk          1/1     Running            0               13m
udm-768b9987b4-ppckx          1/1     Running            0               13m
udr-8566897d45-6d82h          1/1     Running            0               13m
upf-0                         3/5     CrashLoopBackOff   13 (15s ago)    11m
webui-5894ffd49d-4bcgg        1/1     Running            0               13m
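
In a listing this long, a small filter can pick out the pods that are not fully ready. This is only a minimal sketch, assuming the default column order of kubectl get pods (NAME, READY, STATUS, ...); the function name unhealthy_pods is hypothetical and not part of OnRamp.

```shell
# Hypothetical helper: reads `kubectl get pods` output on stdin and prints
# NAME, READY and STATUS for every pod whose STATUS is not "Running" or
# whose READY count is incomplete (for example 3/5).
unhealthy_pods() {
  awk 'NR > 1 {
    split($2, ready, "/")   # READY column: ready containers / total containers
    if ($3 != "Running" || ready[1] != ready[2])
      print $1, $2, $3
  }'
}
```

It could be used as kubectl get pods -n omec | unhealthy_pods. Note that a pod that finished normally (STATUS Completed) would also be listed, so the output is a starting point rather than a verdict.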

mbilal92 commented 1 year ago

It appears that one of your UPF pods is experiencing recurrent crashes.

Could you please provide the logs for the following containers within the UPF pod: bessd, routectl, web, pfcp-agent, and arping?

To retrieve them, run kubectl logs upf-0 <container-name> -p -n omec once per container. The -p flag returns the logs of the previous (terminated) instance of the container.
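
To avoid typing the command five times, the requests can be wrapped in a loop. This is a minimal sketch, assuming the pod name upf-0 and the namespace omec used in this thread; the function name upf_logs is hypothetical.

```shell
# Hypothetical helper: print the current and previous logs of each container
# in the UPF pod. Assumes pod upf-0 in namespace omec.
upf_logs() {
  for c in bessd routectl web pfcp-agent arping; do
    echo "=== $c (current) ==="
    kubectl logs upf-0 -c "$c" -n omec || true
    echo "=== $c (previous) ==="
    # -p returns an error when the container has no previous terminated
    # instance, so a failure here is tolerated and the loop continues.
    kubectl logs upf-0 -c "$c" -n omec -p || true
  done
}
```

Running upf_logs > upf-logs.txt 2>&1 would collect everything, including any error messages, into one file that can be attached to the issue.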

hussainahmad1995 commented 11 months ago

Here are the logs for the bessd, routectl, web, pfcp-agent, and arping containers:

kubectl logs -n omec upf-0 bessd
+ bessd -m 0 -f -grpc-url=0.0.0.0:10514

kubectl logs -n omec -p upf-0 routectl
/opt/bess/bessctl/conf/route_control.py:311: SyntaxWarning: "is" with a literal. Did you mean "=="?
  if item.prefix_len is 0:
Connecting to BESS daemon...
Error connecting to BESS daemon. Retrying in 2sec...
Error connecting to BESS daemon. Retrying in 2sec...
Error connecting to BESS daemon. Retrying in 2sec...
Error connecting to BESS daemon. Retrying in 2sec...
Error connecting to BESS daemon. Retrying in 2sec...
Traceback (most recent call last):
  File "/opt/bess/bessctl/conf/route_control.py", line 517, in <module>
    main()
  File "/opt/bess/bessctl/conf/route_control.py", line 483, in main
    connect_bessd()
  File "/opt/bess/bessctl/conf/route_control.py", line 438, in connect_bessd
    raise Exception('BESS connection failure.')
Exception: BESS connection failure.

kubectl logs -n omec -p upf-0 web
Error from server (BadRequest): previous terminated container "web" in pod "upf-0" not found

kubectl logs -n omec -p upf-0 pfcp-agent
Error from server (BadRequest): previous terminated container "pfcp-agent" in pod "upf-0" not found

kubectl logs -n omec -p upf-0 arping
Error from server (BadRequest): previous terminated container "arping" in pod "upf-0" not found

llpeterson commented 11 months ago

Digging into the UPF logs might give us a hint, but I have to believe that this is a configuration error of some kind. Since you originally modified the ran_subnet field in vars/main.yml, and I had you do a brute-force uninstall of k8s, I wonder if there was a residual file left behind. It's a bother, but could you:

(1) Uninstall the whole system:
$ make 5gc-uninstall
$ make k8s-uninstall
(2) Look in /etc/systemd/network. I believe it should be empty after the uninstall. Let me know if it's not.
(3) If it is empty, reboot the server.
(4) Clone a fresh version of OnRamp. I'm not sure, but there's a chance of small fixes since you last cloned it. Edit the two instances of data_iface, plus amf.ip, like we did yesterday.
(5) Reinstall the system:
$ make k8s-install
$ make 5gc-install
(6) See if it works this time.
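
The check in step (2) can be scripted so that leftover files are hard to miss. This is a minimal sketch; the function name check_network_dir_empty is hypothetical, and the only path assumed is /etc/systemd/network from the step above.

```shell
# Hypothetical helper: report whether a directory still contains entries.
# Defaults to /etc/systemd/network, the directory named in step (2).
# A directory that does not exist is also reported as empty.
check_network_dir_empty() {
  dir="${1:-/etc/systemd/network}"
  if [ -z "$(ls -A "$dir" 2>/dev/null)" ]; then
    echo "empty: $dir"
  else
    echo "residual files in $dir:"
    ls -A "$dir"
    return 1
  fi
}
```

Calling check_network_dir_empty after the two uninstall targets prints either a single "empty" line or the names of the files left behind, which are the ones worth reporting back here.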

CrABonzz commented 7 months ago

Hello, I get the same error. How was it fixed? The crashed pods for me are upf and mongodb. Thanks

Bhuvaneshnetcon commented 7 months ago

Hi @llpeterson, I have built aether-onramp successfully and ran make aether-5gc-install, but my UE (a Quectel RM-520N-GL, 3GPP Release 16) does not get internet access. Note that the UPF is able to ping 8.8.8.8. My host Linux network settings are as follows:

jaswanthvt commented 6 months ago

Hello,

I am facing the same error. How can it be fixed? The crashed pods are listed below.

kryptowire@aether:~$ kubectl get pods -n omec
NAME                          READY   STATUS             RESTARTS           AGE
amf-5887bbf6c5-psb7m          1/1     Running            0                  11h
ausf-6dbb7655c7-z7jnj         1/1     Running            0                  11h
kafka-0                       1/1     Running            1 (11h ago)        11h
metricfunc-55b47f58d5-tm6s8   1/1     Running            0                  11h
mongodb-0                     0/1     CrashLoopBackOff   137 (3m26s ago)    11h
mongodb-arbiter-0             0/1     CrashLoopBackOff   137 (3m40s ago)    11h
nrf-54bf88c78c-fr28b          1/1     Running            0                  11h
nssf-5b85b8978d-szs75         1/1     Running            0                  11h
pcf-758d7cfb48-jplz9          1/1     Running            0                  11h
sd-core-zookeeper-0           1/1     Running            0                  11h
simapp-6cccd6f787-2j5th       1/1     Running            0                  11h
smf-776ccbb869-jhv79          1/1     Running            0                  11h
udm-768b9987b4-p2x4z          1/1     Running            0                  11h
udr-8566897d45-pznns          1/1     Running            0                  11h
upf-0                         3/5     CrashLoopBackOff   270 (13s ago)      11h
webui-5894ffd49d-shvbv        1/1     Running            0                  11h

Thanks