Closed sony87 closed 3 months ago
Can you please add the farm ID or the node ID that you tried to deploy on?
Check 4985 and 2594; I couldn't deploy on either.
It kept showing "Waiting for deployment with contract_id: 236293 to be ready"
and "Waiting for deployment with contract_id: 236290 to be ready".
Nodes: 4349, 4350. On farm "Mango Farm", most of the nodes do not work: 2595, 2596, 2636, etc.
The problem might be caused by latency to the hub. This in turn could cause the deployment to time out while it's fetching data from the hub (probably when copying a disk image from 0-fs to the local disk). If this is indeed the problem, it can be verified as follows:
I'm assuming the disk copy keeps running after the deployment times out. If that is not the case, you may have to redeploy a couple of times until the disk image is completely in the 0-fs cache.
If this is indeed the case, then either zos needs a workaround, or the actual solution is to make the hub available in multiple geographic regions so that latency is consistently low (a distributed hub or some kind of CDN).
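To put rough numbers on the latency theory, here is a minimal Python sketch that times a read from any byte stream and estimates whether an image of a given size would finish within the deployment timeout. The function names, the chunk size, and the 600-second default are my own assumptions (the default comes from the "fails after 10 minutes" report in the issue description); none of this is zos or dashboard code.

```python
import io
import time
from typing import BinaryIO, Callable, Tuple


def measure_throughput(
    stream: BinaryIO,
    chunk_size: int = 64 * 1024,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[int, float]:
    """Read the stream to the end and return (bytes_read, seconds_elapsed)."""
    start = clock()
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
    return total, clock() - start


def would_time_out(
    image_bytes: int,
    bytes_per_second: float,
    timeout_seconds: float = 600.0,  # assumption: the ~10 minutes reported in the issue
) -> bool:
    """Estimate whether downloading image_bytes at the given rate exceeds the timeout."""
    if bytes_per_second <= 0:
        raise ValueError("bytes_per_second must be positive")
    return image_bytes / bytes_per_second > timeout_seconds


# Offline demonstration with an in-memory stream standing in for an HTTP response.
read_bytes, elapsed = measure_throughput(io.BytesIO(b"x" * 150_000))
print(read_bytes, "bytes read in", round(elapsed, 6), "seconds")
```

To probe the real hub, the same function can be given the response object from `urllib.request.urlopen("https://hub.grid.tf/tf-official-vms/ubuntu-22.04.flist", timeout=30)`. As far as I understand, the `.flist` file itself is mostly metadata, so this is only a rough proxy for latency to the hub and not a measurement of the full disk image download.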
[+] flistd: 2024-02-27T09:25:36Z info flist already in on the filesystem url=https://hub.grid.tf/tf-official-vms/ubuntu-22.04.flist
2024-02-27 14:34:00 | [+] flistd: 2024-02-27T13:34:00Z info request to mount flist: {ReadOnly:true Limit:0 Storage: PersistedVolume:} name=cloud-container:c65ef166512f3d5fe7c61fc3d8dd3c89 storage= url=https://hub.grid.tf/tf-autobuilder/cloud-container-8730b6f.flist
2024-02-27 14:33:57 | [+] identityd: 2024-02-27T13:33:57Z info checking for update after milliseconds wait=4440000
2024-02-27 14:33:57 | [+] identityd: 2024-02-27T13:33:57Z info checking if update is required current=3.9.0 latest=3.9.0
2024-02-27 14:33:56 | [+] flistd: 2024-02-27T13:33:56Z info starting g8ufs daemon args=["--cache","/var/cache/modules/flistd/cache","--meta","/var/cache/modules/flistd/flist/fa05b43ad1c5362453cb70de7cea9664","--daemon","--log","/var/cache/modules/flistd/log/fa05b43ad1c5362453cb70de7cea9664.log"] storage= url=https://hub.grid.tf/tf-official-vms/nixos-22.11.flist
2024-02-27 14:33:54 | [+] flistd: 2024-02-27T13:33:54Z info request to mount flist storage= url=https://hub.grid.tf/tf-official-vms/nixos-22.11.flist
2024-02-27 14:33:54 | [+] flistd: 2024-02-27T13:33:54Z info request to mount flist: {ReadOnly:true Limit:0 Storage: PersistedVolume:} name=604-240316-thebatcavetest2 storage= url=https://hub.grid.tf/tf-official-vms/nixos-22.11.flist
<img width="1318" alt="image" src="https://github.com/threefoldtech/test_feedback/assets/13523434/190e3e58-bc44-4e21-9370-273d95cc3247">
- Node network traffic was at its peak and rising every minute, as you can see from these 2 screenshots:
<img width="659" alt="image" src="https://github.com/threefoldtech/test_feedback/assets/13523434/d065441c-a6fb-4cf9-af89-ccbf0ce279e9">
<img width="710" alt="image" src="https://github.com/threefoldtech/test_feedback/assets/13523434/07224d90-2785-431c-bedc-b2f27e6548fd">
- From the Dashboard, it first waited for the VM to be ready:
Waiting for deployment with contract_id: 240316 to be ready
- Then I got this error:
Failed to send request to twinId 7688 with command: zos.deployment.get, payload: {"contract_id":240316} Didn't get a response after 20 seconds
- Then the contracts got cancelled
- ZOS logs:
<img width="1135" alt="image" src="https://github.com/threefoldtech/test_feedback/assets/13523434/bf18687f-c1a0-4c49-b452-a93c1ad3f52e">
- Network traffic was still rising:
<img width="701" alt="image" src="https://github.com/threefoldtech/test_feedback/assets/13523434/249983a6-ddc3-4962-927b-e334013babcb">
- Tried deploying NixOS again after the network traffic decreased
- ZOS logs:
[+] flistd: 2024-02-27T14:05:57Z info flist already in on the filesystem url=https://hub.grid.tf/tf-official-vms/nixos-22.11.flist
- Network traffic:
<img width="697" alt="image" src="https://github.com/threefoldtech/test_feedback/assets/13523434/fd505eee-9c59-48b6-a557-5736d313726c">
- VM was deployed successfully
<img width="769" alt="image" src="https://github.com/threefoldtech/test_feedback/assets/13523434/295e3fc8-623d-4ba1-aba0-f9b5ada3e733">
- Did a quick speed test on the VM:
root@thebatcavetest:~# speedtest-cli
Retrieving speedtest.net configuration...
Testing from Aussie Broadband (159.196.171.188)...
Retrieving speedtest.net server list...
Selecting best server based on ping...
Hosted by Superloop Australia Pty Ltd (Sydney) [0.09 km]: 16.961 ms
Testing download speed................................................................................
Download: 269.32 Mbit/s
Testing upload speed......................................................................................................
Upload: 23.58 Mbit/s
@sabrinasadik Confirmed: the flist download from the hub takes a long time, which causes a timeout on the dashboard side, and the contracts are then cancelled. The flist download continues in the background, however, and deploying again works once the download is done.
So what you are saying is that I need to keep redeploying on the same machine until it completes?
Until we have a workaround or fix the issue, yes. @xmonader let's discuss further to have a solution for this.
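Until there is a proper fix, the manual "redeploy until the flist is cached" routine could be scripted along these lines. This is only a sketch of the retry pattern: `deploy` is a hypothetical callable that you would supply (returning `True` once the deployment is ready), and the attempt count and wait time are arbitrary assumptions, not values taken from the dashboard or zos.

```python
import time
from typing import Callable


def deploy_until_ready(
    deploy: Callable[[], bool],
    max_attempts: int = 5,          # assumption: arbitrary upper bound
    wait_seconds: float = 300.0,    # assumption: give the flist download time to progress
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call deploy() until it returns True; return the attempt number that succeeded.

    Raises RuntimeError if every attempt fails.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        if deploy():
            return attempt
        # Wait between attempts only; no sleep after the final failure.
        if attempt < max_attempts:
            sleep(wait_seconds)
    raise RuntimeError(f"deployment not ready after {max_attempts} attempts")


# Demonstration with a stand-in deploy function that succeeds on the third call.
outcomes = iter([False, False, True])
attempt = deploy_until_ready(lambda: next(outcomes), sleep=lambda s: None)
print("succeeded on attempt", attempt)  # prints: succeeded on attempt 3
```

Since each failed attempt cancels its contract, a real script would also need to create a fresh contract per attempt; that part is left out here because it depends on the client being used.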
What happened?
I'm in Europe and can deploy VMs on European farms without issue. If I try to deploy to any Australian farm/node, it consistently fails after 10 minutes.
What did you expect?
To be able to deploy everywhere, regardless of my location.