Closed mhagen-vmware closed 6 years ago
In the working scenario:
Apr 13 2018 00:28:58.319Z DEBUG Found 2 subscribers to 121990: vsphere.VMEvent - worker-993658db8dff on 10.161.149.181 in dc1 is powered on
Apr 13 2018 00:29:03.104Z DEBUG op=347.349: parsing alias :db
Apr 13 2018 00:29:03.104Z DEBUG adding alias votingapp_default:db -> db
Apr 13 2018 00:29:03.104Z DEBUG adding alias votingapp_default:e569ad8957a6 -> db
Apr 13 2018 00:29:05.557Z DEBUG Found 2 subscribers to 122020: vsphere.VMEvent - redis-1bddb2c62ad9 on 10.161.149.181 in dc1 is powered on
Apr 13 2018 00:29:06.311Z DEBUG Found 2 subscribers to 122022: vsphere.VMEvent - db-e569ad8957a6 on 10.161.149.181 in dc1 is powered on
NtvYBl7XJBWAAw==349:M 13 Apr 00:29:32.797 * Ready to accept connections
Apr 13 2018 00:29:32.927Z DEBUG [ END ] [vic/lib/tether.(*tether).launch:748] [257.358335ms] launching session e569ad8957a68e5a320717cc11be674fc48b67048dbc3c70e8ee9da0f4ab7230
Apr 13 2018 00:29:33.353Z DEBUG [0xc4202d06c0] write "The files belonging to this database system will be owned by user \"postgres\".\nThis user must also own the server process.\n\n" to 2 writers (err: <nil>)
Apr 13 2018 00:29:33.375Z DEBUG [0xc4202d06c0] write "The database cluster will be initialized with locale \"en_US.utf8\".\nThe default database encoding has accordingly been set to \"UTF8\".\nThe default text search configuration will be set to \"english\".\n\nData page checksums are disabled.\n\nfixing permissions on existing directory /var/lib/postgresql/data/data ... " to 2 writers (err: <nil>)
Here I interleaved portlayer, container output and tether. The key event being that the db alias was setup before tether attempted to resolve the DNS for it.
In the failing scenario, tether throws an exception because the query happens before alias is setup:
Apr 13 2018 01:09:50.460Z DEBUG Found 2 subscribers to 156418: vsphere.VMEvent - worker-127d5827b5ce on 10.160.227.24 in dc1 is powered on
Apr 13 2018 01:10:15.091Z DEBUG [ END ] [vic/lib/tether.(*tether).launch:748] [560.275515ms] launching session 127d5827b5ceed157d8976cddb1ebb4f273a37d28ec7565404e08f69a4f71070
System.AggregateException: One or more errors occurred. (No such device or address) ---> System.Net.Internals.InternalSocketException: No such device or address
at System.Net.Dns.HostResolutionEndHelper(IAsyncResult asyncResult)
at System.Net.Dns.EndApr 13 2018 01:10:17.225Z DEBUG [0xc42027f9b0] write "System.AggregateException: One or more errors occurred. (No such device or address) ---> System.Net.Internals.InternalSocketException: No such device or address\n at System.Net.Dns.HostResolutionEndHelper(IAsyncResult asyncResult)\n at System.Net.Dns.End" to 2 writers (err: <nil>)
GetHostAddresses(IAsyncResult asyncResult)
at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic(IAsyncResult iar, Func`2 endFunction, Action`1 endAction, Task`1 promise, Boolean requiresSynchronization)
--- End of inner exception stack trace ---
at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
at System.Threading.Tasks.Task`1.GetResultCore(Boolean waitCompletionNotification)
at Npgsql.NpgsqlConnector.Connect(NpgsqlTimeout timeout)
at Npgsql.NpgsqlConnector.RawOpen(NpgsqlTimeout timeout)
at Npgsql.NpgsqlConnector.Open(NpgsqlTimeout timeout)
at Npgsql.ConnectorPool.Allocate(NpgsqlConnection conn, NpgsqlTimeout timeout)
at Npgsql.NpgsqlConnection.OpenInternal()
at Worker.Program.OpenDbConnection(String connectionString)
at Worker.Program.Main(String[] args)
---> (Inner Exception #0) System.Net.Internals.InternalSocketException: No such device or address
at System.Net.Dns.HostResolutionEndHelper(IAsyncResult asyncResult)
at System.Net.Dns.EndGetHostAddresses(IAsyncResult asyncResult)
at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic(IAsyncResult iar, Func`2 endFunction, Action`1 endAction, Task`1 promise, Boolean requiresSynchronization)<---
Apr 13 2018 01:10:21.685Z DEBUG op=352.354: parsing alias :db
Apr 13 2018 01:10:21.685Z DEBUG adding alias votingapp_default:db -> db
Apr 13 2018 01:10:21.685Z DEBUG adding alias votingapp_default:c983d92a0bf8 -> db
Apr 13 2018 01:10:24.924Z DEBUG Found 2 subscribers to 156472: vsphere.VMEvent - redis-c683b9dab4db on 10.160.227.24 in dc1 is powered on
Apr 13 2018 01:10:24.924Z DEBUG Found 2 subscribers to 156471: vsphere.VMEvent - db-c983d92a0bf8 on 10.160.227.24 in dc1 is powered on
J6tmNJ3ZJBWAAw==358:M 13 Apr 01:10:43.170 * Ready to accept connections
In the passing scenario the alias was setup within 4 seconds of the container powering up, in the failing scenario the alias took over 31 seconds to be setup after powering on the container.
Normally the output should be:
When the test fails it ends up with internal worker errors like this:
Which causes the container to exit immediately, so that when we check if it is running later then the test fails because the container has exited.
This has been working in HaaS but fails in Nimbus now. Here is the last failure, but there have been several of these failures before: https://ci-vic.vmware.com/vmware/vic/18318/7
Test-Cases.Group3-Docker-Compose.3-02-Docker-Compose-Voting-App-VCH-18318-7787-container-logs.zip