sociomantic-tsunami / dlsproto

Distributed Log Store protocol definition, client, fake node, and tests
Boost Software License 1.0

Timeout waiting for the Stopped acknowledgement after one minute #84

Closed nemanja-boric-sociomantic closed 6 years ago

nemanja-boric-sociomantic commented 6 years ago

GetRange can get stuck indefinitely when the node never acknowledges the Stop control message it was sent. To prevent this lock-up, GetRange now waits up to 60 seconds for the acknowledgement and then proceeds, regardless of whether the node responded.
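
The bounded wait described above can be sketched roughly as follows. This is a minimal illustration in Python with asyncio, not the actual D client code; `wait_for_stopped_ack`, `demo`, and the event objects are hypothetical names introduced only for this example.

```python
import asyncio


async def wait_for_stopped_ack(ack: asyncio.Event, timeout_s: float = 60.0) -> bool:
    """Wait for the node's Stopped acknowledgement, but never longer than timeout_s.

    Returns True if the acknowledgement arrived and False if the wait timed out.
    Either way the caller carries on with shutting the request down.
    """
    try:
        await asyncio.wait_for(ack.wait(), timeout=timeout_s)
        return True
    except asyncio.TimeoutError:
        # The node stayed silent: give up waiting and proceed regardless.
        return False


async def demo() -> tuple:
    # Hypothetical stand-ins for two nodes: one never acknowledges, one already has.
    silent_node_ack = asyncio.Event()
    responsive_node_ack = asyncio.Event()
    responsive_node_ack.set()
    return (
        await wait_for_stopped_ack(silent_node_ack, timeout_s=0.05),
        await wait_for_stopped_ack(responsive_node_ack, timeout_s=0.05),
    )
```

The short timeout in `demo` only keeps the example quick; the change discussed here uses a much longer one.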

nemanja-boric-sociomantic commented 6 years ago

Blocked on testing.

nemanja-boric-sociomantic commented 6 years ago

This is causing a segmentation fault. AFAICS, issuing the signal causes a context switch that kills the entire request, and control then returns to the Epoll context, where the entire request object is destroyed. It seems I need to queue the signal somehow rather than dispatch it immediately.

nemanja-boric-sociomantic commented 6 years ago

Simple Yield does the trick!
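
The general idea of deferring a signal instead of dispatching it from inside the timer callback can be sketched as below. This is only an analogy in Python with asyncio, assuming a callback-based event loop; it does not reproduce the fiber and epoll machinery of the real client, and `Request`, `timer_fired`, and `demo` are hypothetical names.

```python
import asyncio


class Request:
    """Hypothetical stand-in for a request whose state is torn down by the signal."""

    def __init__(self) -> None:
        self.alive = True
        self.log = []

    def on_timeout_signal(self) -> None:
        self.log.append("got TimeoutSignal")
        self.alive = False  # after this point the request must not be touched


def timer_fired(loop: asyncio.AbstractEventLoop, request: Request) -> None:
    # Queue the signal for the next loop iteration instead of calling the
    # handler inline, so this callback finishes before the request is torn down.
    loop.call_soon(request.on_timeout_signal)
    assert request.alive  # still safe to use the request here
    request.log.append("timer callback finished")


async def demo() -> list:
    loop = asyncio.get_running_loop()
    request = Request()
    loop.call_soon(timer_fired, loop, request)
    await asyncio.sleep(0.01)  # let both callbacks run
    return request.log
```

Had `timer_fired` called `request.on_timeout_signal()` directly, the request would already be torn down while the timer callback was still running, which is the kind of ordering problem described in the comment above.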

nemanja-boric-sociomantic commented 6 years ago

Unblocked!

nemanja-boric-sociomantic commented 6 years ago

nemanjaboric@labs-129:/home/nemanjaboric/work/tsunami/dlsproto-1  git:(neotest*) $ build/last/bin/neotest getrange

Connected. Let's Go...................................................................
Starting GetRange = 1...
RED.eventLoop nr 0
****** STOPPING **** 1
Called stop.
Setting stop message timeout to: 30

// after 30s
FIRING DELEGATE
Force stop!
Got TimeoutSignal
Yes it is TimeoutSignal, stopping the record stream
GetRange test stopped on all nodes.

nemanja-boric-sociomantic commented 6 years ago

Squashed everything.

Burgos commented 6 years ago

Updated and rebased.

gavin-norman-sociomantic commented 6 years ago

Feel free to auto-merge when green.

nemanja-boric-sociomantic commented 6 years ago

Thanks! Updated the cachalot version to v3, so it should be fine now.