Sockets runtime: replace fatal errors with markPlaceDead.

x10-lang / x10

Core implementation of X10 programming language including compiler, runtime, class libraries, sample programs and test suite

Eclipse Public License 1.0

71 stars 15 forks source link

Sockets runtime: replace fatal errors with markPlaceDead. #21

Closed shamouda closed 7 years ago

shamouda commented 7 years ago

In x10rt_sockets.cc mark place dead when a read/write error is detected instead of throwing a fatal error

dgrove-oss commented 7 years ago

@bherta could you review this?

bherta commented 7 years ago

Is there an issue somewhere that describes why this change was needed? If I remember correctly, the fatal_error() method just prints out a message, it doesn't cause the program to exit. Marking of places as dead should already be happening in the buffer flushing step, so explicitly marking them as dead slightly earlier, as in this PR, may be of benefit, but I'm not sure about that. I do see that some missing pthread_unlock calls were added in x10rt_net_send_get & put, so that's good by itself.

shamouda commented 7 years ago

Sometimes when I kill places, the socket runtime hangs after printing fatal errors such as"sending GET x10rt_msg_pass.type" or "reading x10rt_msg_params.len". Marking a place dead where these messages were printed seemed to fix that hanging.

bherta commented 7 years ago

I can see that. The "sending GET" message was an example where the unlock was missing. And if there is an error partway through the read, as in "reading x10rt_msg_params.len", then there may be some data stuck in the read buffers. I could see it getting stuck re-reading partial data without the marking of the place as dead.

While I think there are more changes here than are necessary to fix the bug, I don't think any of this is harmful, so I'm fine if we want to merge this in.