netcreateorg / netcreate-2018

Please report bugs, problems, ideas in the project Issues page: https://github.com/netcreateorg/netcreate-2018/issues

Navigating away from an edge permanently locks it. #110

Closed · jdanish closed this issue 4 years ago

jdanish commented 4 years ago

Megan reports: Navigating away from the network page (e.g., refreshing) when you're in the middle of editing an edge permanently locks that edge for everyone, including yourself. The edge can be deleted and remade, but any attempt to edit it gives the "someone else is editing this edge" warning.

I thought we had locked this down but apparently not / it broke?

benloh commented 4 years ago

@jdanish I can't reproduce this. I am seeing the 'beforeunload' checks. Can you please provide exact steps to reproduce?
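
For reference, the kind of 'beforeunload' guard I mean looks roughly like the sketch below. This is illustration only, not the actual NetCreate code; isEditingEdge, currentEdgeId, and releaseEdgeLock are made-up names standing in for whatever state and unlock call the edit form actually uses.

```js
// Sketch only: warn before leaving mid-edit and make a best-effort attempt
// to release the edit lock. All names here are hypothetical.
let isEditingEdge = false;   // assumed flag, set while an edge edit form is open
let currentEdgeId = null;    // assumed id of the edge being edited

function releaseEdgeLock(edgeId) {
  // placeholder: tell the server to drop the lock (e.g. over the app's socket layer)
}

window.addEventListener('beforeunload', event => {
  if (!isEditingEdge) return;
  releaseEdgeLock(currentEdgeId);  // may not complete if the tab closes immediately
  event.preventDefault();          // ask the browser to show its "leave this page?" prompt
  event.returnValue = '';          // legacy requirement for the prompt in some browsers
});
```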

benloh commented 4 years ago

@jdanish Have you been able to reproduce this?

jdanish commented 4 years ago

@benloh I cannot either; I also see the appropriate checks. I have asked Megan for details and will update this ASAP on whether we are able to reproduce it.

jdanish commented 4 years ago

I think I reproduced it, @benloh. I was editing a node when the server connection dropped. When I reloaded the page, I could no longer edit that node because it reported someone else was editing it. When I tried to reproduce it again, though, it wouldn't happen. Either way, stopping and restarting the server fixed it. Thoughts?

benloh commented 4 years ago

Ah, that makes more sense now.

Hmm...this is a difficult one. If the connection drops, the record will remain locked. When the client reconnects, it no longer knows it has to request an unlock. And the server doesn't know what happened to the client that disappeared, nor does it know that the client is finished with the lock. The server does clear all locks when it restarts, but that's an extreme solution that would make everyone else lose their connection.
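
To make the failure mode concrete, a naive lock table with no expiry behaves roughly like this (a sketch with hypothetical names, not the actual NetCreate code):

```js
// Sketch only: a server-side lock table that never expires its entries.
const edgeLocks = new Map(); // edgeId -> token of the client currently editing it

function requestEdgeLock(edgeId, token) {
  const holder = edgeLocks.get(edgeId);
  if (holder && holder !== token) {
    return { ok: false, reason: 'someone else is editing this edge' };
  }
  edgeLocks.set(edgeId, token);
  return { ok: true };
}

function releaseEdgeLock(edgeId, token) {
  if (edgeLocks.get(edgeId) === token) edgeLocks.delete(edgeId);
}

// Nothing here ever expires a lock: if the client holding one disconnects without
// calling releaseEdgeLock, the edge stays locked until the server restarts.
```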

A few solutions come to mind:

  1. We could keep a server-side list of who is requesting locks, and if the server sees the same token trying to access a record that token already locked, it allows it. Though this is guaranteed to cause problems if multiple people use the same token.

  2. Another approach would be to keep track of both open connections and any records locked by each connection, and automatically unlock those records if the connection drops. The problem is that the server does not detect that a connection has dropped until the client tries to reconnect, so this isn't helpful either: the record would remain locked until the client comes back online, and if it never does, the record would never get unlocked.

  3. Or we could put a timeout on locks so they automatically expire after a certain period of time. No record should be locked permanently anyway, so this could potentially catch other errors we aren't aware of. But it would be fragile: if the server is stopped before the timeout fires, the record would remain locked. Plus there's the question of how long the timeout should be; with a 2-minute timeout, you'd still be locked out for up to 2 minutes after you reconnect.

  4. We could implement it like the server connection heartbeat: while you're editing the db, the client sends a heartbeat every 5 seconds saying "I'm still editing," and if the server stops receiving the heartbeat, it automatically unlocks the record. It's doable, but a fair amount of work to solve a very particular problem.

  5. A more generalized solution would be to implement a client heartbeat for the server. The current heartbeat implementation is a server-generated heartbeat: if the client doesn't receive the heartbeat, it considers the connection dropped. We'd need to implement the other side: if the server doesn't receive a heartbeat from the client, it considers the connection dropped. This potentially lets us catch other issues. We'd still have to implement a tracker on the server side to know which records to unlock. (A rough sketch of how options 3-5 could fit together follows this list.)
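
Roughly, options 3-5 could fit together like the sketch below: each lock carries an expiry that the client's periodic "still editing" heartbeat keeps refreshing, and a server-side sweep releases any lock whose heartbeat has stopped arriving. This is only an illustration; the names, message plumbing, and timings are made up.

```js
// Sketch only: lock timeout (option 3) refreshed by a client heartbeat (options 4/5).
const LOCK_TTL_MS = 15000;   // a lock expires 15s after the last heartbeat (arbitrary)
const edgeLocks = new Map(); // edgeId -> { token, expiresAt }

function requestEdgeLock(edgeId, token, now = Date.now()) {
  const lock = edgeLocks.get(edgeId);
  if (lock && lock.token !== token && lock.expiresAt > now) {
    return { ok: false, reason: 'someone else is editing this edge' };
  }
  edgeLocks.set(edgeId, { token, expiresAt: now + LOCK_TTL_MS });
  return { ok: true };
}

// Called whenever the server receives the client's "still editing" heartbeat.
function refreshEdgeLock(edgeId, token, now = Date.now()) {
  const lock = edgeLocks.get(edgeId);
  if (lock && lock.token === token) lock.expiresAt = now + LOCK_TTL_MS;
}

// Periodic sweep: drop any lock whose heartbeat stopped arriving.
setInterval(() => {
  const now = Date.now();
  for (const [edgeId, lock] of edgeLocks) {
    if (lock.expiresAt <= now) edgeLocks.delete(edgeId);
  }
}, 5000);
```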


Popping up a level, this is more of an edge case related to making the overall system more robust rather than a bug per se. It's not a mistake in code so much as an expansion of the definition of what the app should be able to withstand. There are probably many issues along these lines, as we have never really spent the proper amount of time hardening the app to deal with security and network issues. In other words, this pushes us towards a true production-ready app rather than a proof-of-concept prototype.

Normally I'd argue that this is outside the scope of work, and given our current budget situation (less than zero), we shouldn't even think about taking this on. But since it's an investment in the core infrastructure that will probably be critical later for other similar issues, and since the code is all freshly minted and therefore easier and faster to change, we'll go ahead and do it.

jdanish commented 4 years ago

I agree this is a feature enhancement and a tricky one.

If you can do one of those solutions and want to, I won’t argue! I appreciate the added effort.

If things are too crunched, though, definitely feel free to leave it as a “restart the server” solution.

Joshua

benloh commented 4 years ago

Addressed by #126.