Closed SamCullen-Element closed 3 months ago
Hi @SamCullen-Element,
It seems the proxy docker container (certification-tool-proxy-1) is trying to use a port that is not available. That's a bad state, possibly related to the version update.
I see that you already tried to prune, but let me ask you to try again in a more aggressive way (forcing everything):
$ cd ~/certification-tool
$ ./scripts/stop.sh
$ docker network prune --force
$ docker system prune --all --force
$ git fetch --all
$ git checkout v2.10+spring2024
$ git pull
$ git submodule update --init --recursive
$ ./scripts/ubuntu/auto-update.sh v2.10+spring2024
$ ./scripts/start.sh
Let me know the results. In case of failure, please also share the output of the following commands:
$ docker ps
$ docker images
$ ./scripts/th/th-doctor.sh --complete
Those will help us better understand your environment.
@antonio-amjr this also failed. I have attached the requested logs. Docker.txt Log.txt TH-Doctor.txt
@SamCullen-Element I noticed that even after the network prune, the message "! Network chip-default Resource is still in use" still appears during the build.
It's really weird that this connection survived. It may be faster to flash the SD card again if you're in a hurry.
Otherwise, we could investigate further, starting with these docker commands:
$ docker network ls
$ docker network inspect <chip-default ID>
You could also try some of the docker network commands below and then try the TH build again:
$ docker network disconnect --force <chip-default ID> <container id>
$ docker network rm <network id>
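As a rough sketch of how the two commands above could be applied to every container attached to a network: the helper below only prints the disconnect commands instead of running them, so they can be reviewed first. The function name `disconnect_commands` is hypothetical and not part of the TH scripts.

```shell
# Hypothetical helper (not part of the TH scripts): reads container names,
# one per line, on stdin and prints one "docker network disconnect" command
# per name for the network given as the first argument. Nothing is executed.
disconnect_commands() {
  network="$1"
  while IFS= read -r name; do
    if [ -n "$name" ]; then
      printf 'docker network disconnect --force %s %s\n' "$network" "$name"
    fi
  done
}
```

Assuming Docker is installed, the names could be fed in with `docker network inspect chip-default --format '{{range .Containers}}{{println .Name}}{{end}}' | disconnect_commands chip-default`. Once the printed commands look right, they can be run by hand, followed by `docker network rm chip-default`.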
Network inspection.txt @antonio-amjr please find attached the network inspection. I will follow the second set of instructions to see if that rectifies this.
Following the second set of instructions from your message, I deleted the 2 containers under the chip-default network, after which the network could be removed. After running the update again, the bind still fails once ./scripts/start.sh is run.
Here is the text output:
ubuntu@ubuntu:~/certification-tool$ ./scripts/start.sh
[+] Running 5/6
 ✔ Network certification-tool_traefik-public Created 0.3s
 ✔ Network chip-default Created 0.2s
 ✔ Container certification-tool-db-1 Started 3.0s
 ⠴ Container certification-tool-proxy-1 Starting 3.0s
 ✔ Container certification-tool-frontend-1 Started 3.0s
 ✔ Container certification-tool-backend-1 Created 0.4s
Error response from daemon: driver failed programming external connectivity on endpoint certification-tool-proxy-1 (742d535def6a20bb3981257f1c83adc9b32b7bc77cbc6559bcbbd7524f8b7c8c): Bind for 0.0.0.0:8090 failed: port is already allocated
ubuntu@ubuntu:~/certification-tool$
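For anyone hitting the same "port is already allocated" error, a minimal sketch for finding out which container is holding the port follows. Both function names are hypothetical helpers, not part of the TH scripts, and the second one assumes the Docker CLI is installed and that the port is held by a running container.

```shell
# Hypothetical helper: extract the host port from a Docker
# "port is already allocated" error line passed as the first argument.
# Prints nothing if the line does not contain that error.
port_from_bind_error() {
  printf '%s\n' "$1" |
    sed -n 's/.*Bind for [0-9.]*:\([0-9][0-9]*\) failed: port is already allocated.*/\1/p'
}

# Hypothetical helper: list running containers, from any compose project,
# that publish the given host port. Requires the Docker CLI.
containers_publishing_port() {
  docker ps --filter "publish=$1" --format '{{.ID}}  {{.Names}}  {{.Ports}}'
}
```

With the error above, `port_from_bind_error` yields 8090, and `containers_publishing_port 8090` would then show whichever container still publishes that port, including one started from a different checkout of the tool.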
Just to be sure, which 2 containers did you delete?
From your "Network inspection" text file I saw 4 containers under chip-default.
Another thing I noticed from the docker network ls command is that the old repository's chip-certification-tool_traefik-public network is still present.
Can you try to remove that just to make sure? For perspective, my functional environment here has these networks below (Virtual machine in my case):
ubuntu@matter-vm:~/certification-tool$ docker network ls
NETWORK ID     NAME                                DRIVER    SCOPE
d01cee45e8d5   bridge                              bridge    local
fc16053817a2   certification-tool_traefik-public   bridge    local
6f8ecebd2055   chip-default                        bridge    local
df56230e724f   host                                host      local
d2a23d37925e   none                                null      local
Hello @antonio-amjr, I deleted all the containers under chip-default, but the issue persisted. After also removing all the containers under "certification-tool_traefik-public", the update appears to have worked: I now have the 2.10 spring release version listed in the GUI, with no binding error message.
Regarding your comment: running the ./scripts/start.sh command does start a new instance of certification-tool_traefik-public, in case you were not expecting this to still be part of the tool.
Hi @SamCullen-Element,
Awesome that everything worked out. Should we close this issue? Let me know if you need more help.
By the way, just to clarify what I meant: I was talking about the state shown in your Network inspection.txt file. In your list you had the following:
ubuntu@ubuntu:~$ docker network ls
NETWORK ID     NAME                                     DRIVER    SCOPE
2dc1c2b71876   bridge                                   bridge    local
983a0f0ae288   certification-tool_traefik-public        bridge    local
af0fb364d673   chip-certification-tool_traefik-public   bridge    local
49d2b10c6a3f   chip-default                             bridge    local
8e40e3f1f9fa   host                                     host      local
0e23dbf5119c   none                                     null      local
Note that above we have both certification-tool_traefik-public and chip-certification-tool_traefik-public at the same time in your environment. I thought that this duplication was the problem.
So, in your case, you removed the containers in the former, but the latter still exists after everything worked? That's a surprise, to be honest. I'll see if I can improve something in the scripts and keep an eye out for similar problems.
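A small sketch for spotting this kind of duplication: it lists the compose project prefix of every network named `<project>_traefik-public`. The function name `traefik_projects` is hypothetical, and the assumption that each project creates exactly one such network comes only from the two listings in this thread.

```shell
# Hypothetical helper: reads network names, one per line, on stdin and
# prints the project prefix of each network named "<project>_traefik-public".
# More than one line of output suggests a second, possibly stale, project.
traefik_projects() {
  sed -n 's/^\(.*\)_traefik-public$/\1/p'
}
```

Assuming Docker is installed, it could be used as `docker network ls --format '{{.Name}}' | traefik_projects`. For the listing above this would print both `certification-tool` and `chip-certification-tool`.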
@antonio-amjr I got another SD card flashed with the fall-th release and followed the steps as given to upgrade to v2.10+spring2024 without manually destroying the containers. The issue seems to be inconsistent over the attempts. I have attached logs for your tracking and consideration. Network inspect.txt PS and Images.txt SSH session.txt TH-Doctor.txt
I see, @SamCullen-Element. In that case I'll try to reproduce this release update problem locally and solve it accordingly, whether with a script update or a better set of commands.
Let me get back to you afterwards.
@SamCullen-Element, I realized something while going through the process; let me check that we are on the same page.
So, you're flashing the TH-fall2023 release (which came from the closed-source repo) and trying to update to the th-spring2024 version (which is already in the open-source repo), right? Since it's not possible to check out spring2024 from the closed-source repo, you're cloning the new one, as I could see in the SSH session.txt file you shared.
To confirm we're on the same page: after cloning the new repo, there will be two repositories, one with the prefix chip- and the other without, like:
ubuntu@ubuntu:~/certification-tool$ ls ~
apps/ certification-tool/ chip-certification-tool/
All I did to make the update work was to stop the containers from the old repo (~/chip-certification-tool/scripts/stop.sh) before doing all the steps you did in the new repo (certification-tool).
If that is the case, try the same if possible.
If I got something wrong, please walk me through the whole process you followed.
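The "stop the old repo first" step could be wrapped in a small guard so it is safe to run on machines that never had the old checkout. This is only a sketch: the function name `stop_legacy_th` is hypothetical, and running stop.sh from inside the repository directory is an assumption rather than something the thread confirms is required.

```shell
# Hypothetical helper: run scripts/stop.sh of a legacy checkout if it exists.
# The directory defaults to ~/chip-certification-tool, the old repo location
# mentioned above. Does nothing (apart from a note on stderr) if it is absent.
stop_legacy_th() {
  legacy_dir="${1:-$HOME/chip-certification-tool}"
  if [ -x "$legacy_dir/scripts/stop.sh" ]; then
    # Assumption: the script is run from the repository root.
    (cd "$legacy_dir" && ./scripts/stop.sh)
  else
    echo "no legacy stop script at $legacy_dir, skipping" >&2
  fi
}
```

Calling `stop_legacy_th` with no arguments before running the update steps in ~/certification-tool would then cover both the fall2023 image case and a fresh install.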
@antonio-amjr That is correct. TH-fall2023 is the last image I have for the test harness; I could not get the command-line update process to work for pre-TH_Fall2023 versions either, but flashing the image worked fine.
In this scenario I am using the TH-Fall2023 image as my starting point and following the instructions, which bring me onto the Spring2024 branch and pull what I need. I do indeed see the two repositories you described, one with the prefix "chip-" and one without.
I see @SamCullen-Element,
The recommendation now is to follow the new process for newer releases, since we are no longer distributing TH as SD card images. You can read more details in the User Guide, but basically you flash Ubuntu Server 22.04.4 LTS (using the Raspberry Pi Imager or a similar app) and then:
- $ cd ~
- $ git clone -b v2.10+spring2024 https://github.com/project-chip/certification-tool.git
- $ cd certification-tool
- $ git submodule update --init --recursive
- $ ./scripts/pi-setup/auto-install.sh
The newer process above is more reliable, but you may try to update from fall2023 using the tip I gave in my last message: run ~/chip-certification-tool/scripts/stop.sh before updating.
Let me know the results.
Hey @SamCullen-Element
Did you manage to flash Ubuntu and clone the v2.10+spring2024 release directly?
Or have you tried the update after first stopping the chip- repository?
Let me know if we may close this up. Thanks
Hello @antonio-amjr,
Apologies for the delay, I was out of office for a while.
I was able to update multiple test harnesses experiencing the same issue by first stopping the chip- repository. That seemed to sort them all out, thank you.
Best Regards, Sam
That's great Sam. Glad that it worked out. I'll close this issue then. Feel free to open another if you have more trouble.
Best regards, Antonio Jr.
Describe the bug
When following the instructions to update the Matter TH, the following message appears when I run the auto-update.sh script: "WARN[0000] The "BACKEND_FILEPATH_ON_HOST" variable is not set. Defaulting to a blank string. WARN[0000] /home/ubuntu/certification-tool/docker-compose.yml: 'version' is obsolete". Once the update process is completed and the tool is started, the UI does not display a version or SHA number, and a pop-up states that an error has been encountered. When running the ./start.sh script after an attempted update, I get the message "Error response from daemon: driver failed programming external connectivity on endpoint certification-tool-proxy-1: Bind for 0.0.0.0:8090 failed: port is already allocated".
Steps to reproduce the behavior
Followed the instructions as presented in the User Guide; also attempted with the "docker network prune" and "docker system prune" commands, as suggested by Hilton Lima.
Expected behavior
The test harness updates successfully without error.