trouble deleting containers

thewyzard44 commented 7 years ago

I'm cross-posting this because I have been talking to myself for a week (on lxc/lxd) with no traction (although, to be fair, it seems stgraber has been on semi-vacation for that same week).

Has anyone had any trouble recently (on the feature branch) doing a simple delete_container through hyperkit? Or through the REST API more generally?

I have some LXD wrapper libraries around hyperkit that I've written, and all tests pass through CI (travis) using the DIR provider on travis's (cloud) instances. I'm now one step forward in my dev and pointing my software at my own (archaic) metal. (Not sure if that matters, but it bears mentioning since my problem seems racy.)

My code (optionally) expects a running container, and then does a force-stop followed by a delete, through hyperkit. I have been getting sporadic errors from lxd, as per this issue: https://github.com/lxc/lxd/issues/4063. My code is a test-kitchen driver (for those familiar with that environment).

I've run variations of plain-stop or force-stop, with and without a sleep of 1 to 5 seconds between the stop and the delete... no change in the sporadic nature of the errors described in the above link.
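One way to take the fixed sleep out of the picture is to poll until LXD reports the container as stopped before sending the delete. This is only a sketch: stop_then_delete is a hypothetical helper, and it assumes the client responds to stop_container, container_state and delete_container the way Hyperkit::Client does (check those names against your Hyperkit version).

```ruby
# Hypothetical helper: stop a container, wait until it is reported as
# stopped, then delete it. `client` is assumed to behave like a
# Hyperkit::Client (stop_container, container_state, delete_container).
def stop_then_delete(client, name, poll_interval: 0.5, timeout: 30)
  client.stop_container(name, force: true)

  # Use a monotonic clock so wall-clock adjustments cannot break the timeout.
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  until client.container_state(name).status == 'Stopped'
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) > deadline
      raise "#{name} did not reach Stopped within #{timeout}s"
    end
    sleep poll_interval
  end

  client.delete_container(name)
end
```

If the error still shows up with this in place, the stop/delete timing is probably not the cause.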

I have one question for Hyperkit... by default, do you re-use the HTTP connection in subsequent calls? Digging through your code (and Faraday's), it appears that you don't, by default. I just wanted to confirm. That would help confirm the racy/sporadic nature of this error. I was hoping to force the use of a new connection for each call, to cause a minimal hesitation, but that appears to be already the case. Perhaps the reverse (re-using one connection) is worth a test?

For anyone else consuming hyperkit that does a delete_container, or that does so using the raw REST API... please let me know if you have issues using the latest-greatest of the full stack.

this being a more ruby-centric audience, I'll post more code if anyone is interested

thewyzard44 commented 6 years ago

found the problem... something in the stack is calling delete on the server a second time while the first one is still running... I'll be tracking that one down.
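For tracking down who issues the second delete, a small tracing wrapper around the client can help. This is a sketch, not part of Hyperkit: DeleteTracer is a hypothetical class that logs each delete_container call with a few stack frames and passes everything else through untouched.

```ruby
require 'delegate'

# Hypothetical tracing wrapper: records every delete_container call
# (container name, time, first few caller frames) and then forwards it.
# All other methods are delegated to the wrapped client unchanged.
class DeleteTracer < SimpleDelegator
  attr_reader :delete_log

  def initialize(client)
    super
    @delete_log = []
  end

  def delete_container(name, options = {})
    @delete_log << { name: name, at: Time.now, from: caller(1, 5) }
    __getobj__.delete_container(name, options)
  end
end
```

Usage would look like hk = DeleteTracer.new(Hyperkit::Client.new(...)), then inspecting hk.delete_log after a failure. If the log shows a single call per container while the server sees two, the duplicate comes from lower in the stack. One candidate, as far as I know: Ruby's Net::HTTP re-sends idempotent requests (DELETE included) once after certain connection errors or read timeouts.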

thewyzard44 commented 6 years ago

verified via irb (bypassing my library):

hk = Hyperkit::Client.new api_endpoint: 'https://wyzsrv:8443', verify_ssl: false, auto_sync: true
irb(main):007:0> hk.delete_container 'test', sync: false
Faraday::ConnectionFailed: An existing connection was forcibly closed by the remote host.
        from c:/opscode/chefdk/embedded/lib/ruby/2.4.0/openssl/buffering.rb:182:in `sysread_nonblock'
        from c:/opscode/chefdk/embedded/lib/ruby/2.4.0/openssl/buffering.rb:182:in `read_nonblock'
        from c:/opscode/chefdk/embedded/lib/ruby/2.4.0/net/protocol.rb:172:in `rbuf_fill'
        from c:/opscode/chefdk/embedded/lib/ruby/2.4.0/net/protocol.rb:154:in `readuntil'
        from c:/opscode/chefdk/embedded/lib/ruby/2.4.0/net/protocol.rb:164:in `readline'
        from c:/opscode/chefdk/embedded/lib/ruby/2.4.0/net/http/response.rb:40:in `read_status_line'
        from c:/opscode/chefdk/embedded/lib/ruby/2.4.0/net/http/response.rb:29:in `read_new'
        from c:/opscode/chefdk/embedded/lib/ruby/2.4.0/net/http.rb:1446:in `block in transport_request'
        from c:/opscode/chefdk/embedded/lib/ruby/2.4.0/net/http.rb:1443:in `catch'
        from c:/opscode/chefdk/embedded/lib/ruby/2.4.0/net/http.rb:1443:in `transport_request'
        from c:/opscode/chefdk/embedded/lib/ruby/2.4.0/net/http.rb:1416:in `request'
        from c:/opscode/chefdk/embedded/lib/ruby/2.4.0/net/http.rb:1409:in `block in request'
        from c:/opscode/chefdk/embedded/lib/ruby/2.4.0/net/http.rb:877:in `start'
        from c:/opscode/chefdk/embedded/lib/ruby/2.4.0/net/http.rb:1407:in `request'
        from c:/opscode/chefdk/embedded/lib/ruby/gems/2.4.0/gems/faraday-0.13.1/lib/faraday/adapter/net_http.rb:80:in `perform_request'
        from c:/opscode/chefdk/embedded/lib/ruby/gems/2.4.0/gems/faraday-0.13.1/lib/faraday/adapter/net_http.rb:38:in `block in call'
        from c:/opscode/chefdk/embedded/lib/ruby/gems/2.4.0/gems/faraday-0.13.1/lib/faraday/adapter/net_http.rb:85:in `with_net_http_connection'
        from c:/opscode/chefdk/embedded/lib/ruby/gems/2.4.0/gems/faraday-0.13.1/lib/faraday/adapter/net_http.rb:33:in `call'
        from c:/opscode/chefdk/embedded/lib/ruby/gems/2.4.0/gems/faraday-0.13.1/lib/faraday/response.rb:8:in `call'
        from C:/Users/Sean/AppData/Local/chefdk/gem/ruby/2.4.0/gems/hyperkit-1.1.0/lib/hyperkit/middleware/follow_redirects.rb:72:in `perform_with_redirection'
        from C:/Users/Sean/AppData/Local/chefdk/gem/ruby/2.4.0/gems/hyperkit-1.1.0/lib/hyperkit/middleware/follow_redirects.rb:60:in `call'
        from c:/opscode/chefdk/embedded/lib/ruby/gems/2.4.0/gems/faraday-0.13.1/lib/faraday/rack_builder.rb:141:in `build_response'
        from c:/opscode/chefdk/embedded/lib/ruby/gems/2.4.0/gems/faraday-0.13.1/lib/faraday/connection.rb:387:in `run_request'
        from c:/opscode/chefdk/embedded/lib/ruby/gems/2.4.0/gems/faraday-0.13.1/lib/faraday/connection.rb:137:in `delete'
        from c:/opscode/chefdk/embedded/lib/ruby/gems/2.4.0/gems/sawyer-0.8.1/lib/sawyer/agent.rb:94:in `call'
        from C:/Users/Sean/AppData/Local/chefdk/gem/ruby/2.4.0/gems/hyperkit-1.1.0/lib/hyperkit/connection.rb:139:in `request'
        from C:/Users/Sean/AppData/Local/chefdk/gem/ruby/2.4.0/gems/hyperkit-1.1.0/lib/hyperkit/connection.rb:74:in `delete'
        from C:/Users/Sean/AppData/Local/chefdk/gem/ruby/2.4.0/gems/hyperkit-1.1.0/lib/hyperkit/client/containers.rb:346:in `delete_container'
        from (irb):7
        from c:/opscode/chefdk/embedded/bin/irb.cmd:19:in `<main>'

that error happens when the first 'tap' on the delete endpoint gets abandoned (by faraday probably?), then the second tap happens; but from the server's perspective the 2 delete commands are racing with each other... I'll keep digging, just wanted to log that somewhere
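As a possible stopgap on the caller's side, the delete can be sent once and, if the connection drops, followed by a check of whether the container is gone instead of a second delete. A minimal sketch under assumptions: delete_once is a hypothetical helper, the client is assumed to respond to delete_container and to containers (returning a list of names) as Hyperkit::Client does, and the error class to tolerate is passed in by the caller.

```ruby
# Hypothetical helper: send the delete once. If the connection drops,
# do not re-send; poll the container list until the name disappears.
# Returns true once the container is gone, false if it is still listed
# after `attempts` checks.
def delete_once(client, name, dropped:, wait: 1, attempts: 10)
  begin
    client.delete_container(name)
    return true
  rescue dropped
    # The server may still be working on the first delete; fall through.
  end

  attempts.times do
    return true unless client.containers.include?(name)
    sleep wait
  end
  false
end
```

With Hyperkit this would be called as delete_once(hk, 'test', dropped: Faraday::ConnectionFailed). Note that it cannot stop a re-send that happens below Faraday; it only keeps the calling code from adding one more delete to the race.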

noteworthy: my client is on windows, calling across the wire on my LAN... if this turns into a 'windows thing' down in Net::HTTP or some such, then I'll likely not worry about it if I can get a stable workaround

jeffshantz / hyperkit

trouble deleting containers #10