nimbusproject / nimbus

Nimbus - Open Source Cloud Computing Software - 100% Apache2 licensed
http://www.nimbusproject.org/

Terminating instances cause some status queries to hang #102

Closed: priteau closed this issue 12 years ago.

priteau commented 12 years ago

While an instance is being terminated, all global status queries and all status queries on that instance hang.

The cause is that the WorkspaceHomeImpl:destroy method acquires a per-instance lock and holds it for the entire termination, including a lengthy call to the workspace control agent. The WorkspaceHomeImpl:find method also tries to acquire this lock, so it blocks until the termination has finished.
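A minimal Java sketch of the hang, using a simplified model rather than the real Nimbus classes: `HangDemo`, `Instance`, `ControlAgent` and the `"Running"`/`"Destroyed"` state strings are hypothetical stand-ins for WorkspaceHomeImpl, a workspace resource and the workspace control agent. `destroy()` holds the per-instance `ReentrantLock` across the slow agent call, so a status query that needs the same lock cannot proceed until termination finishes.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model of the reported hang (not Nimbus code): destroy() keeps
// the per-instance lock for the whole termination, including the slow
// control-agent call, so find() cannot get the lock until destroy() is done.
class HangDemo {
    // Hypothetical stand-in for the lengthy call to the workspace control agent.
    interface ControlAgent {
        void terminate() throws InterruptedException;
    }

    static class Instance {
        final ReentrantLock lock = new ReentrantLock();
        String state = "Running"; // guarded by lock
    }

    static void destroy(Instance inst, ControlAgent agent) throws InterruptedException {
        inst.lock.lock();
        try {
            agent.terminate();        // slow call made while still holding the lock
            inst.state = "Destroyed";
        } finally {
            inst.lock.unlock();
        }
    }

    // Status query needing the same lock. tryLock with a timeout is used only
    // so the demo can report "would block" (null) instead of hanging forever.
    static String find(Instance inst, long timeoutMs) throws InterruptedException {
        if (!inst.lock.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
            return null; // a plain lock() would still be waiting here
        }
        try {
            return inst.state;
        } finally {
            inst.lock.unlock();
        }
    }
}
```

If `destroy()` runs on one thread with an agent call that blocks, `find(inst, 100)` on another thread returns `null` (the lock could not be acquired) until the agent call returns.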

timf commented 12 years ago

The original WSRF destroy semantics prohibited destroy from being asynchronous. Converting it to an asynchronous operation (WSRF semantics are probably not important at this point) would probably be the best overall solution, but also the most work; consider the client half of that, too. Perhaps make it still appear synchronous to the caller while releasing the lock? Or allow read-only, lock-free access for status queries?
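One reading of the read-only, lock-free suggestion, sketched under the same kind of hypothetical model (`LockFreeStatus`, `Instance`, `State` and `ControlAgent` are illustrative names, not Nimbus API): mutations still serialize on the per-instance lock, but the current state is published through a `volatile` field that status queries read without locking.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch (not Nimbus code): destroy() still holds the lock for
// the whole termination, but find() never takes it, so it cannot be blocked.
class LockFreeStatus {
    enum State { RUNNING, DESTROYING, DESTROYED }

    // Hypothetical stand-in for the slow workspace control agent call.
    interface ControlAgent {
        void terminate() throws InterruptedException;
    }

    static class Instance {
        final ReentrantLock lock = new ReentrantLock();
        // Written only while holding lock; volatile so readers may skip the lock.
        volatile State state = State.RUNNING;
    }

    static void destroy(Instance inst, ControlAgent agent) throws InterruptedException {
        inst.lock.lock();
        try {
            inst.state = State.DESTROYING; // immediately visible to lock-free readers
            agent.terminate();             // lock is still held during the slow call
            inst.state = State.DESTROYED;
        } finally {
            inst.lock.unlock();
        }
    }

    // Read-only status query: reads the volatile field, never touches the lock.
    static State find(Instance inst) {
        return inst.state;
    }
}
```

The trade-off is that a lock-free reader sees only a single field; if a status query needed several fields to be mutually consistent, they would have to be published together, for example as one immutable snapshot object.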

priteau commented 12 years ago

Tim, thank you very much for taking the time to comment on this issue!

I've taken a simple approach where I release the lock during the call to workspace control. Would you mind reviewing it? The change is in commit dd41e55173542e654e8dfa92763e1975794d36b6.
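The release-the-lock-during-the-call approach can be sketched as follows, with hypothetical names (`ReleaseDuringCall`, `Instance`, `ControlAgent`) rather than the code in the referenced commit: the lock is dropped only around the slow agent call and reacquired in a `finally`, so the outer `unlock()` stays balanced even if the call throws.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch (not the code in the commit): hold the per-instance lock
// for the bookkeeping, but release it around the slow control-agent call.
class ReleaseDuringCall {
    // Hypothetical stand-in for the slow workspace control agent call.
    interface ControlAgent {
        void terminate() throws InterruptedException;
    }

    static class Instance {
        final ReentrantLock lock = new ReentrantLock();
        String state = "Running"; // guarded by lock
    }

    static void destroy(Instance inst, ControlAgent agent) throws InterruptedException {
        inst.lock.lock();
        try {
            // ... bookkeeping before the call, done under the lock ...
            inst.lock.unlock();       // drop the lock only around the slow call
            try {
                agent.terminate();
            } finally {
                inst.lock.lock();     // reacquire, even if the call failed
            }
            inst.state = "Destroyed"; // remaining bookkeeping, under the lock again
        } finally {
            inst.lock.unlock();
        }
    }

    // Status query; returns null if the lock cannot be taken within the timeout.
    static String find(Instance inst, long timeoutMs) throws InterruptedException {
        if (!inst.lock.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
            return null;
        }
        try {
            return inst.state;
        } finally {
            inst.lock.unlock();
        }
    }
}
```

While the lock is released, `find()` succeeds, but so would any other operation that takes the same lock; that window is what the next comment is concerned about.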

timf commented 12 years ago

I think if you remove the lock, other actions could proceed during the destroy. I can't dig into the code, but to protect against this there would at least need to be a per-instance lock around a state change to "destroying" (if there is such an intermediate state; I can't remember), and only then should the long-running task be launched. That way, other attempts to alter the instance would see it as destroying and nothing could happen. Something like that; sorry I can't research it.
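A sketch of the intermediate-state idea, assuming a hypothetical `DESTROYING` state (the comment itself is unsure whether Nimbus has one) and a hypothetical `reboot()` standing in for any other operation that alters the instance: the state change happens under the per-instance lock, the slow call runs without it, and other operations that see `DESTROYING` back off.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch (not Nimbus code): claim the instance by switching it
// to DESTROYING under the lock, then run the slow call without the lock.
class DestroyingState {
    enum State { RUNNING, DESTROYING, DESTROYED }

    // Hypothetical stand-in for the slow workspace control agent call.
    interface ControlAgent {
        void terminate() throws InterruptedException;
    }

    static class Instance {
        final ReentrantLock lock = new ReentrantLock();
        State state = State.RUNNING; // guarded by lock
    }

    static void destroy(Instance inst, ControlAgent agent) throws InterruptedException {
        inst.lock.lock();
        try {
            if (inst.state != State.RUNNING) {
                throw new IllegalStateException("already " + inst.state);
            }
            inst.state = State.DESTROYING; // other operations now see "destroying"
        } finally {
            inst.lock.unlock();
        }
        // Slow call made without the lock. Error handling is omitted: if it
        // throws, this sketch leaves the instance in DESTROYING.
        agent.terminate();
        inst.lock.lock();
        try {
            inst.state = State.DESTROYED;
        } finally {
            inst.lock.unlock();
        }
    }

    // Hypothetical example of another mutating operation.
    static boolean reboot(Instance inst) {
        inst.lock.lock();
        try {
            if (inst.state != State.RUNNING) {
                return false; // instance is being (or has been) destroyed
            }
            // ... perform the reboot ...
            return true;
        } finally {
            inst.lock.unlock();
        }
    }

    // Status query: the lock is only ever held briefly, so this does not hang.
    static State find(Instance inst) {
        inst.lock.lock();
        try {
            return inst.state;
        } finally {
            inst.lock.unlock();
        }
    }
}
```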

priteau commented 12 years ago

I think that is what I am doing. I keep the lock for almost the entire destroy, releasing it only around what I believe is the call to workspace-control.

timf commented 12 years ago

Oh, then that sounds OK.

priteau commented 12 years ago

The approach described above had some issues, so I reworked it by introducing an additional lock dedicated to destroy operations.
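The reworked code is not shown in the thread, so the following is only one possible shape of an extra lock dedicated to destroy, with hypothetical names throughout (`DedicatedDestroyLock`, `Instance`, `destroyLock`, `State`, `ControlAgent`): `destroyLock` is held for the entire destroy, so destroys of the same instance run one at a time, while the per-instance `lock` is taken only for short state updates. Status queries take only `lock`, so the agent call never blocks them.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch (not the actual Nimbus change): a second lock serializes
// destroys, so the per-instance lock no longer spans the slow agent call.
class DedicatedDestroyLock {
    enum State { RUNNING, DESTROYING, DESTROYED }

    // Hypothetical stand-in for the slow workspace control agent call.
    interface ControlAgent {
        void terminate() throws InterruptedException;
    }

    static class Instance {
        final ReentrantLock lock = new ReentrantLock();        // short critical sections only
        final ReentrantLock destroyLock = new ReentrantLock(); // held for a whole destroy
        State state = State.RUNNING;                           // guarded by lock
    }

    static void destroy(Instance inst, ControlAgent agent) throws InterruptedException {
        inst.destroyLock.lock(); // status queries never take this lock
        try {
            inst.lock.lock();
            try {
                if (inst.state == State.DESTROYED) {
                    return; // an earlier destroy already finished
                }
                inst.state = State.DESTROYING;
            } finally {
                inst.lock.unlock();
            }
            // Slow call: only destroyLock is held. If it throws, the state stays
            // DESTROYING and a later destroy() will attempt the call again.
            agent.terminate();
            inst.lock.lock();
            try {
                inst.state = State.DESTROYED;
            } finally {
                inst.lock.unlock();
            }
        } finally {
            inst.destroyLock.unlock();
        }
    }

    // Status query: needs only the per-instance lock, which is held briefly.
    static State find(Instance inst) {
        inst.lock.lock();
        try {
            return inst.state;
        } finally {
            inst.lock.unlock();
        }
    }
}
```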