transientskp / tkp

A transients-discovery pipeline for astronomical image-based surveys
http://docs.transientskp.org/
BSD 2-Clause "Simplified" License

Streaming TraP #557

Open: mkuiack opened this issue 5 years ago

mkuiack commented 5 years ago

I'm testing streaming TraP 4.0 with two imagers streaming to localhost:9000 and localhost:9001.

Once the imagers accept the connection, TraP first times out while trying to read data. This is expected, because the data stream takes a little while to start after all the connections between the correlator, calibrators, and imager are set up. But once the imagers start sending data, TraP prints the error "error reading data: 2314926402535508307!=5136718571548659023", drops the connection, and the imagers then abort (see the imager log below). Any ideas what this error means?

Imager log:

I1029 11:41:13.097040  3723 pipeline.h:154] 1540813270.70 58300780.0 1512146.0 1111111111111111 lat 2.4 s - 369 ms [0 30 0]
I1029 11:41:14.113059  3723 pipeline.h:154] 1540813271.71 58300780.0 1513671.9 1111111111111111 lat 2.4 s - 362 ms [0 30 0]
terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::system::system_error> >'
  what():  remote_endpoint: Transport endpoint is not connected
*** Aborted at 1540813274 (unix time) try "date -d @1540813274" if you are using GNU date ***
PC: @     0x7f8e9b1e0428 gsignal
*** SIGABRT (@0x277400000e87) received by PID 3719 (TID 0x7f8e8e2d5700) from PID 3719; stack trace: ***
    @     0x7f8e9cd24390 (unknown)
    @     0x7f8e9b1e0428 gsignal
    @     0x7f8e9b1e202a abort
    @     0x7f8e9b82284d __gnu_cxx::__verbose_terminate_handler()
    @     0x7f8e9b8206b6 (unknown)
    @     0x7f8e9b820701 std::terminate()
    @     0x7f8e9b84bd38 (unknown)
    @     0x7f8e9cd1a6ba start_thread
    @     0x7f8e9b2b241d clone
    @                0x0 (unknown)

I also saved the images to disk with the imager and verified they're all good.

mark

trap.debug.log, showing the connection attempts, the successful connections, and the error:

2018-10-29 11:40:24 INFO tkp.stream connector() process port_9001_proc (3394) thread MainThread (139785049945856) : connecting to localhost:9001
2018-10-29 11:40:24 ERROR tkp.stream connector() process port_9001_proc (3394) thread MainThread (139785049945856) : cant connect to localhost:9001: [Errno 111] Connection refused
2018-10-29 11:40:24 ERROR tkp.stream connector() process port_9000_proc (3391) thread MainThread (139785049945856) : cant connect to localhost:9000: [Errno 111] Connection refused
2018-10-29 11:40:24 INFO tkp.stream connector() process port_9001_proc (3394) thread MainThread (139785049945856) : will try reconnecting in 5 seconds
2018-10-29 11:40:24 INFO tkp.stream connector() process port_9000_proc (3391) thread MainThread (139785049945856) : will try reconnecting in 5 seconds
2018-10-29 11:40:29 INFO tkp.stream connector() process port_9001_proc (3394) thread MainThread (139785049945856) : connecting to localhost:9001
2018-10-29 11:40:29 INFO tkp.stream connector() process port_9000_proc (3391) thread MainThread (139785049945856) : connecting to localhost:9000
2018-10-29 11:40:29 ERROR tkp.stream connector() process port_9001_proc (3394) thread MainThread (139785049945856) : cant connect to localhost:9001: [Errno 111] Connection refused
2018-10-29 11:40:29 ERROR tkp.stream connector() process port_9000_proc (3391) thread MainThread (139785049945856) : cant connect to localhost:9000: [Errno 111] Connection refused
2018-10-29 11:40:29 INFO tkp.stream connector() process port_9001_proc (3394) thread MainThread (139785049945856) : will try reconnecting in 5 seconds
2018-10-29 11:40:29 INFO tkp.stream connector() process port_9000_proc (3391) thread MainThread (139785049945856) : will try reconnecting in 5 seconds
2018-10-29 11:40:34 INFO tkp.stream connector() process port_9001_proc (3394) thread MainThread (139785049945856) : connecting to localhost:9001
2018-10-29 11:40:34 INFO tkp.stream connector() process port_9000_proc (3391) thread MainThread (139785049945856) : connecting to localhost:9000
2018-10-29 11:40:34 INFO tkp.stream connector() process port_9001_proc (3394) thread MainThread (139785049945856) : connected to localhost:9001
2018-10-29 11:40:34 INFO tkp.stream connector() process port_9000_proc (3391) thread MainThread (139785049945856) : connected to localhost:9000
2018-10-29 11:40:39 ERROR tkp.stream connection_handler() process port_9001_proc (3394) thread MainThread (139785049945856) : error reading data: timed out
2018-10-29 11:40:39 ERROR tkp.stream connection_handler() process port_9000_proc (3391) thread MainThread (139785049945856) : error reading data: timed out
2018-10-29 11:40:39 INFO tkp.stream connection_handler() process port_9000_proc (3391) thread MainThread (139785049945856) : sleeping for 5 seconds
2018-10-29 11:40:39 INFO tkp.stream connection_handler() process port_9001_proc (3394) thread MainThread (139785049945856) : sleeping for 5 seconds
2018-10-29 11:40:44 INFO tkp.stream connector() process port_9000_proc (3391) thread MainThread (139785049945856) : connecting to localhost:9000
2018-10-29 11:40:44 INFO tkp.stream connector() process port_9001_proc (3394) thread MainThread (139785049945856) : connecting to localhost:9001
2018-10-29 11:40:44 INFO tkp.stream connector() process port_9001_proc (3394) thread MainThread (139785049945856) : connected to localhost:9001
2018-10-29 11:40:44 INFO tkp.stream connector() process port_9000_proc (3391) thread MainThread (139785049945856) : connected to localhost:9000
2018-10-29 11:40:49 ERROR tkp.stream connection_handler() process port_9001_proc (3394) thread MainThread (139785049945856) : error reading data: timed out
2018-10-29 11:40:49 ERROR tkp.stream connection_handler() process port_9000_proc (3391) thread MainThread (139785049945856) : error reading data: timed out
2018-10-29 11:40:49 INFO tkp.stream connection_handler() process port_9001_proc (3394) thread MainThread (139785049945856) : sleeping for 5 seconds
2018-10-29 11:40:49 INFO tkp.stream connection_handler() process port_9000_proc (3391) thread MainThread (139785049945856) : sleeping for 5 seconds
2018-10-29 11:40:54 INFO tkp.stream connector() process port_9001_proc (3394) thread MainThread (139785049945856) : connecting to localhost:9001
2018-10-29 11:40:54 INFO tkp.stream connector() process port_9000_proc (3391) thread MainThread (139785049945856) : connecting to localhost:9000
2018-10-29 11:40:54 INFO tkp.stream connector() process port_9001_proc (3394) thread MainThread (139785049945856) : connected to localhost:9001
2018-10-29 11:40:54 INFO tkp.stream connector() process port_9000_proc (3391) thread MainThread (139785049945856) : connected to localhost:9000
2018-10-29 11:40:59 ERROR tkp.stream connection_handler() process port_9001_proc (3394) thread MainThread (139785049945856) : error reading data: timed out
2018-10-29 11:40:59 ERROR tkp.stream connection_handler() process port_9000_proc (3391) thread MainThread (139785049945856) : error reading data: timed out
2018-10-29 11:40:59 INFO tkp.stream connection_handler() process port_9001_proc (3394) thread MainThread (139785049945856) : sleeping for 5 seconds
2018-10-29 11:40:59 INFO tkp.stream connection_handler() process port_9000_proc (3391) thread MainThread (139785049945856) : sleeping for 5 seconds
2018-10-29 11:41:04 INFO tkp.stream connector() process port_9001_proc (3394) thread MainThread (139785049945856) : connecting to localhost:9001
2018-10-29 11:41:04 INFO tkp.stream connector() process port_9000_proc (3391) thread MainThread (139785049945856) : connecting to localhost:9000
2018-10-29 11:41:04 INFO tkp.stream connector() process port_9001_proc (3394) thread MainThread (139785049945856) : connected to localhost:9001
2018-10-29 11:41:04 INFO tkp.stream connector() process port_9000_proc (3391) thread MainThread (139785049945856) : connected to localhost:9000
2018-10-29 11:41:06 ERROR tkp.stream connection_handler() process port_9001_proc (3394) thread MainThread (139785049945856) : error reading data: 2314926402535508307!=5136718571548659023
2018-10-29 11:41:06 INFO tkp.stream connection_handler() process port_9001_proc (3394) thread MainThread (139785049945856) : sleeping for 5 seconds
2018-10-29 11:41:09 ERROR tkp.stream connection_handler() process port_9000_proc (3391) thread MainThread (139785049945856) : error reading data: timed out
2018-10-29 11:41:09 INFO tkp.stream connection_handler() process port_9000_proc (3391) thread MainThread (139785049945856) : sleeping for 5 seconds
2018-10-29 11:41:11 INFO tkp.stream connector() process port_9001_proc (3394) thread MainThread (139785049945856) : connecting to localhost:9001
2018-10-29 11:41:11 INFO tkp.stream connector() process port_9001_proc (3394) thread MainThread (139785049945856) : connected to localhost:9001
2018-10-29 11:41:11 ERROR tkp.stream connection_handler() process port_9001_proc (3394) thread MainThread (139785049945856) : error reading data: [Errno 104] Connection reset by peer
2018-10-29 11:41:11 INFO tkp.stream connection_handler() process port_9001_proc (3394) thread MainThread (139785049945856) : sleeping for 5 seconds
2018-10-29 11:41:14 INFO tkp.stream connector() process port_9000_proc (3391) thread MainThread (139785049945856) : connecting to localhost:9000
2018-10-29 11:41:14 INFO tkp.stream connector() process port_9000_proc (3391) thread MainThread (139785049945856) : connected to localhost:9000
2018-10-29 11:41:15 ERROR tkp.stream connection_handler() process port_9000_proc (3391) thread MainThread (139785049945856) : error reading data: [Errno 104] Connection reset by peer
2018-10-29 11:41:15 INFO tkp.stream connection_handler() process port_9000_proc (3391) thread MainThread (139785049945856) : sleeping for 5 seconds
2018-10-29 11:41:16 INFO tkp.stream connector() process port_9001_proc (3394) thread MainThread (139785049945856) : connecting to localhost:9001
2018-10-29 11:41:16 ERROR tkp.stream connector() process port_9001_proc (3394) thread MainThread (139785049945856) : cant connect to localhost:9001: [Errno 111] Connection refused
2018-10-29 11:41:16 INFO tkp.stream connector() process port_9001_proc (3394) thread MainThread (139785049945856) : will try reconnecting in 5 seconds
2018-10-29 11:41:20 INFO tkp.stream connector() process port_9000_proc (3391) thread MainThread (139785049945856) : connecting to localhost:9000
2018-10-29 11:41:20 ERROR tkp.stream connector() process port_9000_proc (3391) thread MainThread (139785049945856) : cant connect to localhost:9000: [Errno 111] Connection refused
2018-10-29 11:41:20 INFO tkp.stream connector() process port_9000_proc (3391) thread MainThread (139785049945856) : will try reconnecting in 5 seconds
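The timestamps in this log suggest a simple pattern: a refused connection ([Errno 111]) is retried after 5 seconds, and once connected, a read that receives nothing for 5 seconds fails with "timed out", after which TraP sleeps 5 seconds and reconnects. The Python sketch below reproduces that behaviour with the standard socket module. It is inferred from the log only; the names connect_with_retry and read_exact are hypothetical and are not TraP's code in tkp/stream.py.

```python
import socket
import time


def connect_with_retry(host, port, retry_delay=5.0, max_attempts=None, sleep=time.sleep):
    """Open a TCP connection, retrying after `retry_delay` seconds when refused.

    Assumption (from the log timestamps): the same delay also acts as the
    socket timeout, so a later recv() that gets no data within that time
    raises socket.timeout ("timed out").
    `max_attempts=None` retries forever; `sleep` is injectable for testing.
    """
    attempt = 0
    while True:
        attempt += 1
        try:
            return socket.create_connection((host, port), timeout=retry_delay)
        except ConnectionRefusedError:  # [Errno 111] Connection refused
            if max_attempts is not None and attempt >= max_attempts:
                raise
            sleep(retry_delay)  # "will try reconnecting in 5 seconds"


def read_exact(sock, nbytes):
    """Read exactly `nbytes` bytes, raising if the peer closes the stream first."""
    chunks = []
    remaining = nbytes
    while remaining > 0:
        # May raise socket.timeout, or ConnectionResetError ([Errno 104]).
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed the connection")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

Using one value for both the retry delay and the read timeout is only a simplification that matches the 5-second spacing seen here; TraP's actual settings may differ.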

mkuiack commented 5 years ago

Thanks to @gijzelaerr for pointing out that this is likely due to a mismatch between the image data protocol TraP expects and the one the imager provides.

https://github.com/transientskp/tkp/blob/137f3e8f38549e9c8eae0735c44c98b639e1e3be/tkp/stream.py#L81
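Decoding the two integers in the error message supports this diagnosis. Assuming TraP unpacks the first 8 bytes of each frame as a little-endian unsigned 64-bit integer (native byte order on x86) and compares it with a fixed magic number, the received value turns out to be the ASCII text "SIMPLE  ", which is how every FITS header begins. The expected value, 0x47494A53484F4D4F, spells "GIJSHOMO" in ASCII. The sketch below only re-derives this from the numbers in the log; the helper names are hypothetical, and the real header layout is defined in tkp/stream.py.

```python
import struct

# The two integers from the TraP log line:
#   error reading data: 2314926402535508307!=5136718571548659023
RECEIVED = 2314926402535508307
EXPECTED = 5136718571548659023


def as_wire_bytes(value):
    """Bytes that, unpacked as a little-endian uint64, give `value`.

    Assumption: TraP reads the magic number little-endian (native on x86).
    """
    return struct.pack("<Q", value)


def looks_like_raw_fits(first_8_bytes):
    """A FITS header starts with the keyword SIMPLE padded to 8 characters."""
    return first_8_bytes == b"SIMPLE  "


received_bytes = as_wire_bytes(RECEIVED)
print(received_bytes)                       # b'SIMPLE  '
print(looks_like_raw_fits(received_bytes))  # True
print(hex(EXPECTED))                        # 0x47494a53484f4d4f
print(struct.pack(">Q", EXPECTED))          # b'GIJSHOMO'
```

In other words, the first bytes on the wire look like the start of a plain FITS file rather than the framing header that TraP checks for.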

AntoniaR commented 4 years ago

I'm going to set release 6.0 as a provisional target for this.

AntoniaR commented 4 weeks ago

This is related to the streaming goals for R7, so I'm moving it to that milestone for now. We need to assess whether LOFAR2.0 streaming would work the same way as AARTFAAC streaming, and whether any of the existing AARTFAAC streaming code can be reused.