threefoldtech / test_feedback


[Bug 🐞]: Cannot deploy VM in Australia, in Europe i can #419

Closed sony87 closed 3 months ago

sony87 commented 4 months ago

What happened?

I'm in Europe and can deploy VMs on European farms without issue. If I try to deploy to any Australian farm or node, the deployment consistently fails after 10 minutes.

What did you expect?

To be able to deploy everywhere, regardless of my location.

What browsers are you seeing the problem on?

No response

ZOS info

No response

Dashboard info

No response

weblets info

No response

Relevant log output

No response

xmonader commented 4 months ago

Can you please add the farm ID or the node ID that you tried to deploy on?

xmonader commented 4 months ago

check 4985 and 2594, couldn't deploy on both

it kept giving Waiting for deployment with contract_id: 236293 to be ready and Waiting for deployment with contract_id: 236290 to be ready

sony87 commented 4 months ago

Nodes: 4349, 4350. On the farm "Mango Farm", most of the nodes do not work: 2595, 2596, 2636, etc.

sabrinasadik commented 4 months ago

The problem might be caused by latency to the hub. This in turn could cause the deployment to time out while it's fetching data from the hub (probably when copying a disk image from 0-fs to the local disk). If this is indeed the problem, it can be verified as follows:

I'm assuming the disk copy keeps running after the deployment time-out. If that is not the case, you'll have to redeploy a couple of times possibly, until the disk image is in the 0-fs cache completely.

If this is indeed the case, then either a workaround is needed in ZOS, or the actual solution is to make the hub available in multiple geographic regions so that latency is consistently low (a distributed hub or some kind of CDN).
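The "redeploy until the image is cached" approach described above could be scripted roughly as follows. This is a minimal sketch, not the ThreeFold client API: `deploy` is a hypothetical callable standing in for whatever tool submits the deployment, and it is assumed to raise `TimeoutError` when the deployment does not become ready in time. The attempt count and wait time are illustrative values only.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def deploy_with_retries(
    deploy: Callable[[], T],
    max_attempts: int = 5,
    wait_seconds: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `deploy` until it succeeds or `max_attempts` is exhausted.

    `deploy` is a hypothetical stand-in for the real deployment call; it is
    assumed to raise TimeoutError when the node does not become ready in time.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return deploy()
        except TimeoutError:
            if attempt == max_attempts:
                raise  # give up and surface the last timeout
            # Give the node time to keep filling its image cache before retrying.
            sleep(wait_seconds)
    raise AssertionError("unreachable")
```

The `sleep` parameter exists only so the wait can be replaced in tests; in normal use the default `time.sleep` applies. Whether the disk copy really continues between attempts is the assumption stated above, so the wait may or may not help on a given node.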

PeterNashaat commented 4 months ago

Deploying a VM on TheBatcave, farm-id 2252, node-id 4985:
 [+] flistd: 2024-02-27T09:25:36Z info flist already in on the filesystem url=https://hub.grid.tf/tf-official-vms/ubuntu-22.04.flist
<img width="1318" alt="image" src="https://github.com/threefoldtech/test_feedback/assets/13523434/190e3e58-bc44-4e21-9370-273d95cc3247">

  - Node network traffic was at its peak and still climbing each minute, as you can see from these 2 screenshots:

<img width="659" alt="image" src="https://github.com/threefoldtech/test_feedback/assets/13523434/d065441c-a6fb-4cf9-af89-ccbf0ce279e9">
<img width="710" alt="image" src="https://github.com/threefoldtech/test_feedback/assets/13523434/07224d90-2785-431c-bedc-b2f27e6548fd">

- On the Dashboard, it first waited for the VM to be ready:

Waiting for deployment with contract_id: 240316 to be ready

   - Then it returned this error:

Failed to send request to twinId 7688 with command: zos.deployment.get, payload: {"contract_id":240316} Didn't get a response after 20 seconds


- Then the contracts got cancelled
   - ZOS logs:

<img width="1135" alt="image" src="https://github.com/threefoldtech/test_feedback/assets/13523434/bf18687f-c1a0-4c49-b452-a93c1ad3f52e">
   - Network traffic was still climbing:
<img width="701" alt="image" src="https://github.com/threefoldtech/test_feedback/assets/13523434/249983a6-ddc3-4962-927b-e334013babcb">

- Tried deploying NixOS again after the network traffic decreased
   - ZOS logs:

[+] flistd: 2024-02-27T14:05:57Z info flist already in on the filesystem url=https://hub.grid.tf/tf-official-vms/nixos-22.11.flist

  - Network traffic:
<img width="697" alt="image" src="https://github.com/threefoldtech/test_feedback/assets/13523434/fd505eee-9c59-48b6-a557-5736d313726c">

- The VM was deployed successfully
<img width="769" alt="image" src="https://github.com/threefoldtech/test_feedback/assets/13523434/295e3fc8-623d-4ba1-aba0-f9b5ada3e733">

- Did a quick speed test on the VM

root@thebatcavetest:~# speedtest-cli
Retrieving speedtest.net configuration...
Testing from Aussie Broadband (159.196.171.188)...
Retrieving speedtest.net server list...
Selecting best server based on ping...
Hosted by Superloop Australia Pty Ltd (Sydney) [0.09 km]: 16.961 ms
Testing download speed................................................................................
Download: 269.32 Mbit/s
Testing upload speed......................................................................................................
Upload: 23.58 Mbit/s



@sabrinasadik Confirmed: the flist download from the hub takes a long time, which causes a timeout on the dashboard side and then cancels the contracts. The flist download continues, however, and deploying again works once the download is done.
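
To make the timing argument concrete, here is a rough back-of-the-envelope sketch of whether an image download fits inside the deployment window. The 10-minute window comes from the original report ("fails after 10 minutes"); the image size and the throughput between the node and the hub are not given anywhere in this thread, so the values in the example are purely hypothetical. The speed test above measured the node against a nearby Sydney server, which says nothing certain about throughput to the hub.

```python
def transfer_seconds(size_bytes: int, throughput_mbit_s: float) -> float:
    """Estimate transfer time in seconds for a payload at a steady throughput."""
    if throughput_mbit_s <= 0:
        raise ValueError("throughput must be positive")
    bits = size_bytes * 8
    return bits / (throughput_mbit_s * 1_000_000)


def fits_in_window(
    size_bytes: int,
    throughput_mbit_s: float,
    window_seconds: float = 600.0,  # ~10 minutes, as reported in the issue
) -> bool:
    """Return True if the estimated transfer finishes within the window."""
    return transfer_seconds(size_bytes, throughput_mbit_s) <= window_seconds


if __name__ == "__main__":
    image_size = 2 * 1024**3  # hypothetical 2 GiB image, not a measured value
    for rate in (269.32, 20.0):  # local speed test result vs. an assumed slow link
        seconds = transfer_seconds(image_size, rate)
        print(f"{rate:7.2f} Mbit/s -> {seconds:7.1f} s, fits: {fits_in_window(image_size, rate)}")
```

The estimate ignores latency, protocol overhead, and on-demand chunk fetching, all of which would make a long-distance transfer slower than this simple division suggests.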

sony87 commented 4 months ago

So what you are saying is that I need to keep redeploying on the same machine until it completes?

sabrinasadik commented 4 months ago

Until we have a workaround or fix the issue, yes. @xmonader let's discuss further to have a solution for this.