microsoft / oe-engine

ACC template generation engine
MIT License
11 stars 14 forks source link

attested_tls test hangs during validate.sh #60

Open johnkord opened 5 years ago

johnkord commented 5 years ago

@oprinmarius reported this to me yesterday, and so I gathered some information from a repro environment.

About 30 minutes ago I deployed a new VM using oe-engine based on master (and the latest intel drivers), and I'm encountering the attested_tls test hanging.

azureuser@acc-ub1804:~$ ps -aef | grep tls
root      10299  10260  0 20:37 ?        00:00:00 sh -c echo Running ./attested_tls; cd ./attested_tls && make && make run
root      10393      1  0 20:37 ?        00:00:00 ./server/host/tls_server_host ./server/enc/tls_server_enc.signed -port:12341
root      10396  10390  0 20:38 ?        00:00:00 ./client/host/tls_client_host ./client/enc/tls_client_enclave.signed -server:localhost -port:12341
azureus+  12378  12356  0 21:09 pts/0    00:00:00 grep --color=auto tls
azureuser@acc-ub1804:~$ date
Thu Aug 22 21:10:01 UTC 2019

This issue happens every time we deploy a node using oe-engine and is causing deployment failures in our Jenkins E2E tests: https://oe-jenkins.eastus.cloudapp.azure.com/job/oe-acc-ubuntu-16.04-eastus/ https://oe-jenkins.eastus.cloudapp.azure.com/job/oe-acc-ubuntu-18.04-eastus/

PR #58 disables the attested_tls test from being run as part of validate.sh, and should mask this problem. Even though it masks this problem, we should disable this test anyway because we don't run the remote_attestation test (and the attested_tls test relies on remote attestation to work).

In any case, I wanted to file this to draw attention to this issue because it probably deserves a bit of investigation as to why this test is hanging during validate.sh. It may point to an infrastructure problem with the VM's network during boot or possibly some unknown test issue.

Tagging @soccerGB @shruti25ratnam @jazzybluesea for their info!

(Note that this issue doesn't seem to happen on VMs that have existed for awhile. I don't have any problem with my attested_tls test on my dev VM, only during the runtime of validate.sh during deployment)