project-chip / certification-tool

A test harness and tooling designed to simplify development, testing, and certification for devices, guided by the Connectivity Standards Alliance.
https://csa-iot.org/
Apache License 2.0

[Bug] Test harness issue updating to v2.10+spring2024 #269

Closed SamCullen-Element closed 3 months ago

SamCullen-Element commented 4 months ago

Describe the bug

When following the instructions to update the Matter TH, I get the following message when I run the auto-update.sh script: "WARN[0000] The "BACKEND_FILEPATH_ON_HOST" variable is not set. Defaulting to a blank string. WARN[0000] /home/ubuntu/certification-tool/docker-compose.yml: 'version' is obsolete". Once the update process has completed and the tool has started, the UI does not display a version or SHA number, and a pop-up reports that an error has been encountered. When I run the ./start.sh script after an attempted update, I get the message "Error response from daemon: driver failed programming external connectivity on endpoint certification-tool-proxy-1: Bind for 0.0.0.0:8090 failed: port is already allocated".

Steps to reproduce the behavior

I followed the instructions as presented in the User Guide, and also tried the "docker network prune" and "docker system prune" commands as suggested by Hilton Lima.

Expected behavior

The test harness should update successfully without error.

Log files

No response

PICS file

No response

Screenshots

No response

Environment

No response

Additional Information

No response

antonio-amjr commented 4 months ago

Hi @SamCullen-Element,

It seems the proxy docker container (certification-tool-proxy-1) is trying to use an unavailable port. That's a bad state, possibly related to the version update.

I see that you already tried to prune, but let me ask you to try again in a more aggressive way (forcing everything):

$ cd ~/certification-tool
$ ./scripts/stop.sh
$ docker network prune --force
$ docker system prune --all --force
$ git fetch --all
$ git checkout v2.10+spring2024
$ git pull
$ git submodule update --init --recursive
$ ./scripts/ubuntu/auto-update.sh v2.10+spring2024
$ ./scripts/start.sh

Let me know the results. In case of failure, let me ask you for some information:

That will help us better understand your environment.

SamCullen-Element commented 4 months ago

@antonio-amjr this also failed. I have attached the requested logs: Docker.txt, Log.txt, TH-Doctor.txt

antonio-amjr commented 4 months ago

@SamCullen-Element I noticed that even after the network prune, the message "! Network chip-default Resource is still in use" still appears during the build.

It's so weird that this connection survived. Maybe it's faster to flash the SD card again, if you're in a hurry.

Otherwise, we could investigate further, starting with these docker commands:

You could also try some of the docker network commands below and then try the TH build again:

SamCullen-Element commented 4 months ago

Network inspection.txt @antonio-amjr please find attached the network inspection. I will follow the second set of instructions to see if that rectifies this.

SamCullen-Element commented 4 months ago

Following the 2nd set of instructions from your message, I deleted the 2 containers under the chip-default network, which was then able to close. After running the update again, the bind still fails once ./scripts/start.sh is run.

Here is the text output:

ubuntu@ubuntu:~/certification-tool$ ./scripts/start.sh
[+] Running 5/6
 ✔ Network certification-tool_traefik-public  Created   0.3s
 ✔ Network chip-default                       Created   0.2s
 ✔ Container certification-tool-db-1          Started   3.0s
 ⠴ Container certification-tool-proxy-1       Starting  3.0s
 ✔ Container certification-tool-frontend-1    Started   3.0s
 ✔ Container certification-tool-backend-1     Created   0.4s
Error response from daemon: driver failed programming external connectivity on endpoint certification-tool-proxy-1 (742d535def6a20bb3981257f1c83adc9b32b7bc77cbc6559bcbbd7524f8b7c8c): Bind for 0.0.0.0:8090 failed: port is already allocated
ubuntu@ubuntu:~/certification-tool$
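A "port is already allocated" error usually means another running container has already published that host port. The following is a minimal sketch, not part of the project's scripts, of a helper that lists which containers publish a given host port. The function name containers_publishing_port is hypothetical, and it assumes input lines of the form "name ports" as produced by the docker ps format string shown in the usage comment.

```shell
# Sketch: print the names of containers that publish a given host port.
# Reads lines of the form "<container-name> <ports column>" on stdin.
# containers_publishing_port is a hypothetical helper name.
containers_publishing_port() {
    port="$1"
    # A published port appears in the ports column as ":<port>->",
    # for example "0.0.0.0:8090->...". Print the first field (the name)
    # of every line that contains that marker.
    awk -v port="$port" 'index($0, ":" port "->") { print $1 }'
}

# Usage (commented out so that sourcing this file has no side effects):
#   docker ps --format '{{.Names}} {{.Ports}}' | containers_publishing_port 8090
```

Any container this prints while the new tool is stopped is a candidate for what is holding the port.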

antonio-amjr commented 4 months ago

Just to be sure, which 2 containers did you delete? From your "Network inspection" text file I saw 4 containers under chip-default.

Another thing I noticed from the docker network ls command is that the old repository's chip-certification-tool_traefik-public network is still present.

Can you try to remove that just to make sure? For perspective, my functional environment here has these networks below (Virtual machine in my case):

ubuntu@matter-vm:~/certification-tool$ docker network ls
NETWORK ID     NAME                                DRIVER    SCOPE
d01cee45e8d5   bridge                              bridge    local
fc16053817a2   certification-tool_traefik-public   bridge    local
6f8ecebd2055   chip-default                        bridge    local
df56230e724f   host                                host      local
d2a23d37925e   none                                null      local

SamCullen-Element commented 4 months ago

Hello @antonio-amjr, I deleted all the containers under chip-default, but still had issues. After removing all the containers under "certification-tool_traefik-public", however, the update looks to have worked, and I now have the 2.10 spring release version listed on the GUI with no binding error message.

Regarding your comment: running the ./scripts/start.sh command does start a new instance of certification-tool_traefik-public, in case you were not expecting this to still be part of the tool. [image]

antonio-amjr commented 4 months ago

Hi @SamCullen-Element,

Awesome that everything worked out. Should we close this issue? Let me know if you need more help.


By the way, just to clarify what I meant: I was referring to the state shown in your Network inspection.txt file. Your list had the following:

ubuntu@ubuntu:~$ docker network ls
NETWORK ID     NAME                                     DRIVER    SCOPE
2dc1c2b71876   bridge                                   bridge    local
983a0f0ae288   certification-tool_traefik-public        bridge    local
af0fb364d673   chip-certification-tool_traefik-public   bridge    local
49d2b10c6a3f   chip-default                             bridge    local
8e40e3f1f9fa   host                                     host      local
0e23dbf5119c   none                                     null      local

Note that above we have both certification-tool_traefik-public and chip-certification-tool_traefik-public at the same time in your environment. I thought that this duplication was the problem.

So, in your case, you removed the containers in the former, but the latter still exists after everything worked? That's a surprise, to be honest. I'll see if I can improve something in the scripts and keep an eye out for similar problems.
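The duplication check described above can be sketched as a small filter over network names. This is only an illustration: find_stale_networks is a hypothetical helper, and the default prefix is taken from the chip-certification-tool_traefik-public entry in the listing above. It only reports names and removes nothing.

```shell
# Sketch: print network names from stdin that start with the old
# repository's prefix. find_stale_networks is a hypothetical helper name.
find_stale_networks() {
    prefix="${1:-chip-certification-tool_}"
    # index(...) == 1 is a literal "starts with" test, so no characters
    # in the prefix are treated as a regular expression.
    awk -v p="$prefix" 'index($0, p) == 1'
}

# Usage (commented out so that sourcing this file has no side effects):
#   docker network ls --format '{{.Name}}' | find_stale_networks
# A reported network can then be checked with "docker network inspect <name>"
# before deciding whether to remove it with "docker network rm <name>".
```

Note that chip-default is not matched, since it does not carry the full chip-certification-tool_ prefix.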

SamCullen-Element commented 4 months ago

@antonio-amjr I got another SD card flashed with the fall-th release and followed the steps as given to upgrade to v2.10+spring2024 without manually destroying the containers. The issue seems to be inconsistent across attempts. I have attached logs for your tracking and consideration: Network inspect.txt, PS and Images.txt, SSH session.txt, TH-Doctor.txt

antonio-amjr commented 3 months ago

I see, @SamCullen-Element. In that case I'll try to reproduce this release update problem locally and solve it accordingly, whether with a script update or a better set of commands.

Let me get back to you afterwards.

antonio-amjr commented 3 months ago

@SamCullen-Element, I realized something while going through the process; let me see if we are on the same page.

So, you're flashing the TH-fall2023 release (which came from the closed-source repo) and trying to update to the th-spring2024 version (which is already in the open-source repo), right? Since it's not possible to check out spring2024 from the closed-source repo, you're cloning the new one, as I could see in the SSH session.txt file you shared.

To confirm we're on the same page: after cloning the new repo, there will be two repositories, one with the prefix chip- and one without, like:

ubuntu@ubuntu:~/certification-tool$ ls ~
apps/  certification-tool/  chip-certification-tool/

All I did to make the update work was to make sure to stop the containers from the old repo (~/chip-certification-tool/scripts/stop.sh) before doing all the steps you did in the new repo (certification-tool). If that is the case, try the same if possible.

If I got something wrong, please walk me through the whole process you did.
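The workaround described above (stop the old repository's containers before updating in the new one) can be sketched as follows. This is an illustration under stated assumptions rather than a project script: stop_old_repo and the OLD_REPO override variable are hypothetical names, and the default path is the ~/chip-certification-tool directory shown in the listing above.

```shell
# Sketch: stop the containers of the old checkout, if present, before
# updating the new checkout. stop_old_repo and OLD_REPO are hypothetical.
stop_old_repo() {
    old_repo="${OLD_REPO:-$HOME/chip-certification-tool}"
    stop_script="$old_repo/scripts/stop.sh"

    if [ -x "$stop_script" ]; then
        echo "Stopping containers from $old_repo"
        "$stop_script"
    else
        echo "No executable stop script at $stop_script, nothing to stop"
    fi
}

# Usage (commented out so that sourcing this file has no side effects):
#   stop_old_repo
#   cd ~/certification-tool
#   ./scripts/ubuntu/auto-update.sh v2.10+spring2024
#   ./scripts/start.sh
```

Checking for the script first keeps the helper harmless on machines that were set up from a fresh clone and never had the old checkout.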

SamCullen-Element commented 3 months ago

@antonio-amjr That would be correct, yes. TH-fall2023 is the last image I have for the test harness: I could not get the command-line update process to work for pre-TH_Fall2023 releases either, but flashing the image worked fine.

In this scenario I am using the TH-Fall2023 image as my starting point and following the instructions, which bring me onto the Spring2024 branch and pull what I need. Indeed, I do see those two repositories as you have indicated, one with the prefix "chip-" and one without.

antonio-amjr commented 3 months ago

I see @SamCullen-Element,

The recommendation now is to follow the new process for newer releases, since we no longer distribute TH as SD card images. You can read more details in the User Guide, but basically you flash Ubuntu Server 22.04.4 LTS (using the Raspberry Pi Imager or a similar app) and then run:

- $ cd ~
- $ git clone -b v2.10+spring2024 https://github.com/project-chip/certification-tool.git
- $ cd certification-tool
- $ git submodule update --init --recursive
- $ ./scripts/pi-setup/auto-install.sh

The newer process above is more reliable, but you may try to update from fall2023 using the tip I gave in my last message: run ~/chip-certification-tool/scripts/stop.sh before updating.

Let me know the results.

antonio-amjr commented 3 months ago

Hey @SamCullen-Element

Did you manage to flash Ubuntu and clone the v2.10+spring2024 release directly? Or have you tried the update after first stopping the chip- repository?

Let me know if we may close this up. Thanks

SamCullen-Element commented 3 months ago

Hello @antonio-amjr,

Apologies for the delay, I was out of office for a while.

I was able to update multiple test harnesses experiencing the same issue by first stopping the chip- repository. It seemed to sort them all out, thank you.

Best Regards, Sam

antonio-amjr commented 3 months ago

That's great Sam. Glad that it worked out. I'll close this issue then. Feel free to open another if you have more trouble.

Best regards, Antonio Jr.