jack-h closed this issue 1 year ago
I just tried this on ibverb-support on my machine and I see:
```
Instatiating class with 16 UDT instances
Run 0
Total time: 0.267902
Total time instantiating UDTs: 0.267809
Time for UDT 0: 0.129802
Time for UDT 1: 0.009711
Time for UDT 2: 0.009018
Time for UDT 3: 0.009418
Time for UDT 4: 0.009021
Time for UDT 5: 0.009435
Time for UDT 6: 0.008696
Time for UDT 7: 0.009337
Time for UDT 8: 0.009065
Time for UDT 9: 0.009408
Time for UDT 10: 0.008790
Time for UDT 11: 0.009536
Time for UDT 12: 0.008848
Time for UDT 13: 0.009591
Time for UDT 14: 0.008955
Time for UDT 15: 0.009178
Run 1
Total time: 0.079599
Total time instantiating UDTs: 0.079002
Time for UDT 0: 0.024792
Time for UDT 1: 0.003384
Time for UDT 2: 0.003409
Time for UDT 3: 0.003973
Time for UDT 4: 0.003538
Time for UDT 5: 0.003499
Time for UDT 6: 0.003788
Time for UDT 7: 0.003637
Time for UDT 8: 0.003673
Time for UDT 9: 0.003894
Time for UDT 10: 0.003575
Time for UDT 11: 0.003361
Time for UDT 12: 0.003744
Time for UDT 13: 0.003481
Time for UDT 14: 0.003546
Time for UDT 15: 0.003709
```
If I try your fork I get:
```
Instatiating class with 16 UDT instances
Run 0
Total time: 0.186030
Total time instantiating UDTs: 0.185925
Time for UDT 0: 0.051361
Time for UDT 1: 0.009504
Time for UDT 2: 0.008832
Time for UDT 3: 0.009131
Time for UDT 4: 0.008820
Time for UDT 5: 0.009109
Time for UDT 6: 0.009128
Time for UDT 7: 0.009078
Time for UDT 8: 0.008773
Time for UDT 9: 0.009378
Time for UDT 10: 0.008594
Time for UDT 11: 0.009174
Time for UDT 12: 0.008662
Time for UDT 13: 0.009045
Time for UDT 14: 0.008568
Time for UDT 15: 0.008769
Run 1
Total time: 0.377750
Total time instantiating UDTs: 0.095538
Time for UDT 0: 0.025517
Time for UDT 1: 0.004659
Time for UDT 2: 0.004635
Time for UDT 3: 0.004750
Time for UDT 4: 0.004712
Time for UDT 5: 0.004597
Time for UDT 6: 0.004830
Time for UDT 7: 0.004692
Time for UDT 8: 0.004628
Time for UDT 9: 0.004654
Time for UDT 10: 0.004840
Time for UDT 11: 0.004716
Time for UDT 12: 0.004529
Time for UDT 13: 0.004838
Time for UDT 14: 0.004477
Time for UDT 15: 0.004464
```
which is similar to what you are seeing.
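For context, the numbers above come from the original test script (not reproduced in this thread); a harness along these lines would produce the same shape of output. This is a hypothetical sketch: `make_udt` is a stand-in for constructing a real `UDPTransmit`, which needs a bound socket and is not shown here.

```python
import time

def make_udt(i):
    """Stand-in for constructing one UDPTransmit instance (hypothetical)."""
    time.sleep(0.001)  # simulated per-instance setup cost

def time_udts(n_udts):
    """Return (total_wall_time, per_instance_times) for creating n_udts UDTs."""
    t0 = time.perf_counter()
    per_udt = []
    for i in range(n_udts):
        t1 = time.perf_counter()
        make_udt(i)
        per_udt.append(time.perf_counter() - t1)
    return time.perf_counter() - t0, per_udt

for run in range(2):
    total, per_udt = time_udts(16)
    print(f"Run {run}")
    print(f"Total time: {total:.6f}")
    print(f"Total time instantiating UDTs: {sum(per_udt):.6f}")
    for i, dt in enumerate(per_udt):
        print(f"Time for UDT {i}: {dt:.6f}")
```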
If I distill this down into something easier to digest:

| | Reported | ibverb-support | rtr fork |
|---|---|---|---|
| Run 0 - Total | 0.643402 s | 0.267902 s | 0.186030 s |
| Run 0 - Init | 0.643228 s | 0.267809 s | 0.185925 s |
| Total/Init | 1 | 1 | 1 |
| Run 1 - Total | 1.411572 s | 0.079599 s | 0.377750 s |
| Run 1 - Init | 0.128019 s | 0.079002 s | 0.095538 s |
| Total/Init | 11 | 1 | 4 |
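As a quick sanity check, the Run 1 Total/Init ratios can be reproduced directly from the numbers above:

```python
# Run 1 (total, init) times in seconds, taken from the comparison above.
run1 = {
    "Reported":       (1.411572, 0.128019),
    "ibverb-support": (0.079599, 0.079002),
    "rtr fork":       (0.377750, 0.095538),
}

# Ratio of total run time to time spent instantiating UDTs, rounded
# to the nearest integer as in the summary.
ratios = {name: round(total / init) for name, (total, init) in run1.items()}
print(ratios)  # {'Reported': 11, 'ibverb-support': 1, 'rtr fork': 4}
```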
That makes me think that we fixed this problem (whatever its root really was) in ibverb-support. I would suggest updating your fork, but then you would have to contend with the new build system. It might be easier to cherry-pick changes from src and see where that gets you.
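The cherry-pick route can be rehearsed in a throwaway repo first; here is a self-contained sketch of the workflow (the repo, branches, and commits are placeholders, not the real bifrost history):

```shell
set -e
work=$(mktemp -d) && cd "$work"
git init -q
git config user.email demo@example.com
git config user.name demo
main=$(git symbolic-ref --short HEAD)   # whatever the default branch is named
echo base > src.txt
git add src.txt && git commit -qm "base"
git checkout -qb my-fork                # stand-in for the out-of-date fork
git checkout -q "$main"
echo "perf fix" >> src.txt
git commit -qam "perf fix"              # stand-in for an upstream fix
fix=$(git rev-parse HEAD)
git checkout -q my-fork
git cherry-pick "$fix"                  # pick just that change onto the fork
grep "perf fix" src.txt                 # the fix is now present on the fork
```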
Oh, excellent. I'll have a poke around!
I see the same improvement on my machine when switching branches, so I'll attempt to either merge or cherry-pick as you suggest.
As an aside, with the ibverb-support branch the above script doesn't exit on my office machine. It hangs, and then Ctrl-C results in:
```
^CError in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/home/jackh/py3-venv/lib/python3.6/site-packages/bifrost/telemetry/__init__.py", line 158, in send
    timeout=TELEMETRY_TIMEOUT)
  File "/usr/lib/python3.6/urllib/request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.6/urllib/request.py", line 526, in open
    response = self._open(req, data)
  File "/usr/lib/python3.6/urllib/request.py", line 544, in _open
    '_open', req)
  File "/usr/lib/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.6/urllib/request.py", line 1368, in https_open
    context=self._context, check_hostname=self._check_hostname)
  File "/usr/lib/python3.6/urllib/request.py", line 1325, in do_open
    encode_chunked=req.has_header('Transfer-encoding'))
  File "/usr/lib/python3.6/http/client.py", line 1285, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1331, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1280, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1046, in _send_output
    self.send(msg)
  File "/usr/lib/python3.6/http/client.py", line 984, in send
    self.connect()
  File "/usr/lib/python3.6/http/client.py", line 1438, in connect
    super().connect()
  File "/usr/lib/python3.6/http/client.py", line 956, in connect
    (self.host,self.port), self.timeout, self.source_address)
  File "/usr/lib/python3.6/socket.py", line 713, in create_connection
    sock.connect(sa)
```
Not sure what exactly the telemetry is trying to do, but my machine has working internet.
The telemetry endpoint is down. It should have exited on its own after a couple of minutes of trying.
Ah OK. I gave it a reasonable number of seconds, but not a reasonable number of minutes.
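As a general aside, one way to keep an exit-time network call like the telemetry send from hanging a process indefinitely is to run it on a daemon thread and wait only a bounded time for it. This is a generic pattern, not bifrost's actual telemetry code; `send_telemetry` is a hypothetical stand-in for the blocking call:

```python
import atexit
import threading
import time

def send_telemetry():
    """Stand-in for a network send that may block (hypothetical)."""
    time.sleep(60)  # simulate an unreachable endpoint

def bounded_send(timeout=2.0):
    # A daemon thread cannot keep the interpreter alive, so even if the
    # send never returns, the process still exits after `timeout` seconds.
    t = threading.Thread(target=send_telemetry, daemon=True)
    t.start()
    t.join(timeout)
    return not t.is_alive()  # True only if the send finished in time

atexit.register(bounded_send)
```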
**What I wanted to do:** Be able to dynamically change the destination of UDP output products from a bifrost pipeline.

**What I implemented:** There seems to be no way to change the destination after a `UDPTransmit` instance has been created, so I destroy instances and create new ones as commanded by an asynchronous control mechanism (asynchronous in the sense that it is driven by callback functions from an etcd client which is watching for key changes).

**What I expected to happen:** Making new `UDPTransmit`s would be quick and have no adverse effects beyond adding some overhead in the bifrost output block responsible for using these instances.

**What happened:** On creating and destroying 16 `UDPTransmit` instances (the system outputs 16 beam streams to different destinations), I experienced catastrophic packet loss from the UDP capture blocks at the start of the pipeline. This seemed to be the case even when the block within which the `UDPTransmit` instances were being used was not attached to the upstream pipeline buffer rings. I.e., [my interpretation of] the issue is that manipulating the `UDPTransmit`s is locking up bifrost, not simply that the bifrost block using them is now too slow.

**Trying to recreate the issue:** Running this with `python my_udt_test.py 16` yields (emphasis added by me):

In my loaded pipeline, I see calls to replace one `UDPTransmit` with another take 100-200 ms; replacing 16 takes in excess of 2 seconds. So far I've worked around the issue (kind of) by accepting that bifrost will take a hit with each `UDPTransmit` change, and staggering the changes with sleeps so as to allow the pipeline to recover. This mostly works, but it would be nice if these operations were inherently less disruptive to the rest of the processing. (Or this could all be side-stepped by allowing an existing `UDPTransmit` to be re-associated with a new socket / different address after creation.) The possibility obviously remains that I'm doing something dumb somewhere. I am using an LWA-OVRO fork of bifrost: https://github.com/realtimeradio/bifrost/tree/474d87016b9a7a3d17f5dd83bf8e3b976c987f4a
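On the last point: a plain UDP socket can be re-pointed at a new destination simply by calling `connect()` on it again, so in principle a transmitter could switch destinations without being torn down. Below is a sketch with raw Python sockets; whether bifrost's `UDPTransmit` could expose something like this depends on its internals, which I haven't checked.

```python
import socket

# Two stand-in receivers on localhost, bound to ephemeral ports.
rx1 = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx1.bind(("127.0.0.1", 0))
rx2 = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx2.bind(("127.0.0.1", 0))
rx1.settimeout(1.0)
rx2.settimeout(1.0)

tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
tx.connect(rx1.getsockname())  # initial destination
tx.send(b"beam-to-rx1")

tx.connect(rx2.getsockname())  # re-point the SAME socket: no teardown needed
tx.send(b"beam-to-rx2")

m1 = rx1.recv(1024)
m2 = rx2.recv(1024)
print(m1, m2)  # b'beam-to-rx1' b'beam-to-rx2'
```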