Status: Open. jaycedowell opened this 2 years ago.
Attention: Patch coverage is 67.88618%, with 395 lines in your changes missing coverage. Please review.
Project coverage is 67.77%. Comparing base (4fadfa5) to head (0c8f19b). Report is 849 commits behind head on master.
:exclamation: Current head 0c8f19b differs from the pull request's most recent head a360f0b. Consider uploading reports for commit a360f0b to get more accurate results.
Working on the Mac support reminded me of something. When I first started working on this new interface I had a "sniff" mode, which used Linux's raw packet interface with a promiscuous NIC to capture packets destined for a different host. I originally did this because that is how we were running one of our data modes at LWA1. I've since moved on to multicast, and I now wonder if "sniff" is still a feature worth having.
There is only one failure in test_udp_io for MacOS now, but there are intermittent failures on Ubuntu. That's strange. I'm not sure if this is really better or not.
MacOS is better now, but there still seem to be intermittent problems. They sometimes happen when I test locally, but it isn't clear why.
Just a note that I'm able to reproduce this error on my little MacBook Air:
FAIL: test_read_drx_single (test_udp_io.UDPIOTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/league/p/bifrost/test/test_udp_io.py", line 408, in test_read_drx_single
np.testing.assert_equal(final[i,...], data[i,...])
File "/nix/store/ycp9cnz64wf6xm0f5i1giwdacj6v89n5-python3.8-numpy-1.21.2/lib/python3.8/site-packages/numpy/testing/_private/utils.py", line 345, in assert_equal
return assert_array_equal(actual, desired, err_msg, verbose)
File "/nix/store/ycp9cnz64wf6xm0f5i1giwdacj6v89n5-python3.8-numpy-1.21.2/lib/python3.8/site-packages/numpy/testing/_private/utils.py", line 934, in assert_array_equal
assert_array_compare(operator.__eq__, x, y, err_msg=err_msg,
File "/nix/store/ycp9cnz64wf6xm0f5i1giwdacj6v89n5-python3.8-numpy-1.21.2/lib/python3.8/site-packages/numpy/testing/_private/utils.py", line 844, in assert_array_compare
raise AssertionError(msg)
AssertionError:
Arrays are not equal
Mismatched elements: 3871 / 8192 (47.3%)
x: ndarray([[( 65,), (-46,), (-18,), ..., (-49,), ( 63,), ( 18,)],
[( 65,), (-47,), ( -2,), ..., (-34,), ( 79,), ( 2,)]],
dtype=[('re_im', 'i1')])
y: ndarray([[( 64,), (-30,), (-34,), ..., (-49,), ( 62,), ( 34,)],
[( 49,), (-63,), ( -2,), ..., (-34,), ( 79,), ( 2,)]],
dtype=[('re_im', 'i1')])
Is that the only Mac error you're seeing on this branch now?
That's interesting. I just pulled down the recent changes and gave it a shot on my laptop again. The first time I ended up with no failures but this warning:
test_read_multicast (test_udp_io.UDPIOTest) ... Exception in thread Thread-6:
Traceback (most recent call last):
File "/opt/homebrew/Cellar/python@3.9/3.9.13_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/threading.py", line 980, in _bootstrap_inner
self.run()
File "/opt/homebrew/Cellar/python@3.9/3.9.13_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/threading.py", line 917, in run
self._target(*self._args, **self._kwargs)
File "/Users/jaycedowell/CodeSafe/bifrost/test/test_udp_io.py", line 179, in main
for iseq in self.ring.read(guarantee=True):
File "/Users/jaycedowell/CodeSafe/bifrost/python/bifrost/ring.py", line 116, in read
with ReadSequence(self, which=whence, guarantee=guarantee) as cur_seq:
File "/Users/jaycedowell/CodeSafe/bifrost/python/bifrost/ring.py", line 255, in __init__
_check(_bf.bfRingSequenceOpenEarliest(self.obj, ring.obj, guarantee))
File "/Users/jaycedowell/CodeSafe/bifrost/python/bifrost/libbifrost.py", line 117, in _check
raise EndOfDataStop('BF_STATUS_END_OF_DATA')
bifrost.libbifrost.EndOfDataStop: BF_STATUS_END_OF_DATA
ok
The second run did not show that warning, and there were still no failures. The third run failed with:
======================================================================
FAIL: test_read_multicast (test_udp_io.UDPIOTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/jaycedowell/CodeSafe/bifrost/test/test_udp_io.py", line 545, in test_read_multicast
np.testing.assert_equal(final[i,...], data[i,...])
File "/opt/homebrew/lib/python3.9/site-packages/numpy/testing/_private/utils.py", line 345, in assert_equal
return assert_array_equal(actual, desired, err_msg, verbose)
File "/opt/homebrew/lib/python3.9/site-packages/numpy/testing/_private/utils.py", line 934, in assert_array_equal
assert_array_compare(operator.__eq__, x, y, err_msg=err_msg,
File "/opt/homebrew/lib/python3.9/site-packages/numpy/testing/_private/utils.py", line 844, in assert_array_compare
raise AssertionError(msg)
AssertionError:
Arrays are not equal
Mismatched elements: 15872 / 16384 (96.9%)
x: ndarray([[( 47, -10), (-23, 27), (-13, -29), ..., ( 7, 30),
(-38, -19), ( 50, -2)],
[( 7, 0), ( 7, 0), ( 7, 0), ..., ( 7, 0),...
y: ndarray([[( 47, -10), (-23, 27), (-13, -29), ..., ( 7, 30),
(-38, -19), ( 50, -2)],
[( 50, 4), (-41, 17), ( 10, -29), ..., (-16, 28),...
----------------------------------------------------------------------
Ran 10 tests in 14.530s
FAILED (failures=1)
The fourth run also failed at the same place. The fifth, sixth, and seventh runs were all ok. The eighth failed. I don't really get it.
Oof. I've seen the EndOfDataStop('BF_STATUS_END_OF_DATA') intermittently too, on Linux, although the test apparently passes. On Mac, it seems that I reliably fail test_read_drx_single. Earlier today, I think I saw that one fail on Linux too, but it was only once. test_read_multicast hasn't failed for me before.
I guess I've been gravitating toward a timing/race-condition type of hypothesis. I notice that in the send loops in the tests there's a time.sleep. Is it just acting as a yield to give the other thread(s) a chance to run? But the buffer should be blocking… right? Both on write-when-full and read-when-empty? Or are we non-blocking, and assuming the size of the buffer and thread fairness will make it mostly okay? I was going to play with the amounts of data (and/or the size of the buffer) to see if I can get it to fail more or less often.
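For intuition, here's a minimal sketch of the blocking behavior being asked about, using Python's stdlib queue as a hypothetical stand-in for the ring (this is not Bifrost's actual API): a bounded buffer that blocks on write-when-full and read-when-empty needs no sleeps for correctness.

```python
import queue
import threading

# Hypothetical stand-in for the ring: a bounded queue. put() blocks on
# write-when-full and get() blocks on read-when-empty, so the producer
# and consumer pace each other without any time.sleep calls.
ring = queue.Queue(maxsize=4)

def producer():
    for i in range(32):
        ring.put(i)      # blocks while the buffer is full
    ring.put(None)       # end-of-data sentinel

received = []
thread = threading.Thread(target=producer)
thread.start()
while (item := ring.get()) is not None:  # blocks while empty
    received.append(item)
thread.join()
```

If the real buffer is non-blocking instead, nothing pins the producer's rate to the consumer's, and correctness ends up depending on buffer size and thread scheduling.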
I usually use a time.sleep in these contexts to get a more realistic packet/data rate coming out, but here they look like they're being used as a crude rate limiter.
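As a sketch of the crude-rate-limiter reading (the interval and packet count below are made-up values, not the test's actual ones):

```python
import time

# A sleep between sends caps the packet rate and yields the GIL so the
# receiver thread gets a chance to run; it gives no correctness guarantee.
INTERVAL = 0.005   # 5 ms between packets (assumed value)
NPACKETS = 4       # assumed count

start = time.monotonic()
for _ in range(NPACKETS):
    # sock.sendto(payload, addr) would go here in a real send loop
    time.sleep(INTERVAL)
elapsed = time.monotonic() - start
```

The weakness is that it only bounds the send rate; it never reacts to how far behind the receiver actually is.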
As for blocking/non-blocking, the only thing that should block is the ring that connects something like DRXReader with AccumulateOp. Each packet-sending call should return as soon as the packets are handed off to the kernel. Your race-condition idea might be on track: if you look at my last failure, the first few elements in the comparison seem to be ok and then the end is all (7, 0).
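One way to chase that observation is to find exactly where the two buffers diverge; a boundary that lines up with the frame or tuning size would support the race hypothesis. The arrays here are invented for illustration, not the test's real data:

```python
import numpy as np

# Locate the first index where the captured data stops matching the
# sent data, e.g. where a capture trails off into fill values.
x = np.array([1, 2, 3, 7, 7, 7])  # e.g. capture ending in fill
y = np.array([1, 2, 3, 4, 5, 6])  # e.g. the data that was sent
first_bad = int(np.flatnonzero(x != y)[0])
```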
~FWIW I tried to increase the sleeps in the test suite to 10 ms, but that didn't change anything for me. test_read_multicast still fails.~
Never mind, it seems I only changed a few of the sleeps to 10 ms.
Going to 5 ms for the sleep and changing the multicast address to 224.0.0.251 might have done something. I've had eight successful runs and no failures.
Bummer. Interesting that the CI fails on a test that I haven't had a problem with locally.
Yeah, I was just wondering if a CI environment is more or less consistent/reliable when it comes to timing issues. Or neither, just different. FWIW, I just pulled in the 5 ms change, ran it on Linux, and still got EndOfDataStop on test_read_multicast. Is this essentially a ring-buffer underflow happening?
test_read_drx (test_udp_io.UDPIOTest) ... ok
test_read_drx_single (test_udp_io.UDPIOTest) ... ok
test_read_multicast (test_udp_io.UDPIOTest) ... Exception in thread Thread-6:
Traceback (most recent call last):
File "/nix/store/wl02plhc6zf84m6x9984l42wnnnbly5m-python3-3.9.6/lib/python3.9/threading.py", line 973, in _bootstrap_inner
self.run()
File "/nix/store/wl02plhc6zf84m6x9984l42wnnnbly5m-python3-3.9.6/lib/python3.9/threading.py", line 910, in run
self._target(*self._args, **self._kwargs)
File "/home/league/p/bifrost/test/test_udp_io.py", line 179, in main
for iseq in self.ring.read(guarantee=True):
File "/nix/store/zk2rimkn7wdj51dsncsq78f7ynbwbz39-python3-3.9.6-env/lib/python3.9/site-packages/bifrost/ring.py", line 116, in read
with ReadSequence(self, which=whence, guarantee=guarantee) as cur_seq:
File "/nix/store/zk2rimkn7wdj51dsncsq78f7ynbwbz39-python3-3.9.6-env/lib/python3.9/site-packages/bifrost/ring.py", line 255, in __init__
_check(_bf.bfRingSequenceOpenEarliest(self.obj, ring.obj, guarantee))
File "/nix/store/zk2rimkn7wdj51dsncsq78f7ynbwbz39-python3-3.9.6-env/lib/python3.9/site-packages/bifrost/libbifrost.py", line 117, in _check
raise EndOfDataStop('BF_STATUS_END_OF_DATA')
bifrost.libbifrost.EndOfDataStop: BF_STATUS_END_OF_DATA
ok
test_read_pbeam (test_udp_io.UDPIOTest) ... ok
test_read_tbn (test_udp_io.UDPIOTest) ... ok
test_write_drx (test_udp_io.UDPIOTest) ... ok
test_write_drx_single (test_udp_io.UDPIOTest) ... ok
test_write_multicast (test_udp_io.UDPIOTest) ... ok
test_write_pbeam (test_udp_io.UDPIOTest) ... ok
test_write_tbn (test_udp_io.UDPIOTest) ... ok
----------------------------------------------------------------------
Ran 10 tests in 32.492s
If I remember correctly, EndOfDataStop is raised when you've reached the end of a sequence and there isn't a new sequence to read from.
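A toy model of that semantics (the names mirror the traceback, but this is not Bifrost's implementation):

```python
# EndOfDataStop models BF_STATUS_END_OF_DATA: opening the earliest
# sequence fails once the writer has finished and no sequences remain.
class EndOfDataStop(Exception):
    pass

def open_earliest(sequences, writer_done):
    if not sequences and writer_done:
        raise EndOfDataStop("BF_STATUS_END_OF_DATA")
    return sequences[0]

reached_end = False
try:
    open_earliest([], writer_done=True)  # reader arrives after writer quit
except EndOfDataStop:
    reached_end = True
```

Under this reading, the intermittent exception is a startup/shutdown race: whether the reader's open lands before or after the writer finishes its last sequence depends on scheduling.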
Another interesting thing about the CI failures is that the data at the end of the array look ok. We're already skipping the first frame's worth of data in the comparison. Would skipping the first two do anything? Or is this mismatch along a tuning boundary instead?
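The skip-two-frames experiment would just widen the slice on both sides before comparing (the shapes below are invented for illustration):

```python
import numpy as np

# Skip the first NSKIP frames on both sides before comparing, in case
# any corruption is confined to the start of the capture.
NSKIP = 2
final = np.arange(40).reshape(10, 4)
data = np.arange(40).reshape(10, 4)
np.testing.assert_equal(final[NSKIP:, ...], data[NSKIP:, ...])
trimmed_shape = final[NSKIP:, ...].shape
```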
So much for that idea. It's really hard to test this when it now only shows up on CI for me.
Also, maybe we should do something about all those numpy deprecation warnings someday.
What?
Yeah, I'm failing to understand something. I think this mod is related to what your expansion did:
## Reduce to match the capture block size
print(final.shape, data.shape) # (147, 2, 4096) (128, 4, 4096)
data = data[:final.shape[0],:2,:]
final = final[:data.shape[0],:2,:]
self.assertEqual(final.shape, data.shape) # they are same shape
print(final.shape) # (128, 2, 4096)
# Loop passes for each i>0 in first axis
for i in range(1, data.shape[0]):
    print(i)
    np.testing.assert_equal(final[i,...], data[i,...])
# But a slice of the first axis fails
np.testing.assert_equal(final[1:,...], data[1:,...])
I use two slice statements to ensure the same shape. Then if I loop over 1..127 and assert equality, they all pass.
But a slice fed to assert_equal, which I thought should be effectively the same as the loop, fails: Mismatched elements: 40 / 1040384 (0.00384%)
What if you throw in a couple of copies? Say:
data = data[:final.shape[0],:2,:]
becomes
data = data[:final.shape[0],:2,:].copy()
?
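The reason a .copy() could matter: NumPy slices are views, so a late write into the parent buffer (say, from a capture thread that hasn't finished) changes what the view compares against. A minimal illustration, with the mutation done directly rather than from another thread:

```python
import numpy as np

parent = np.zeros((4, 4), dtype=np.int8)
view = parent[1:, :]             # shares memory with parent
snapshot = parent[1:, :].copy()  # independent buffer

parent[2, 0] = 7                 # simulate a late write into the buffer

view_saw_write = int(view[1, 0])      # the view reflects the mutation
copy_saw_write = int(snapshot[1, 0])  # the copy does not
```

If that's what's happening, the per-row loop and the whole-slice assert could genuinely see different data just because they run at different times.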
I just got a failure for this more recent test suite on test_read_drx. Maybe it's not fixed.
Added a reference to #196 so that we know to update the documentation.
When I make it through https://github.com/ledatelescope/bifrost/issues/226 I'll convert this to a real PR.
This PR provides a unified packet capture interface that makes it easier to add new packet formats to Bifrost and to read packet formats from the network (unicast or multicast) or from disk. This also includes a generalization of the packet writer interface. This PR changes the Bifrost API in a couple of ways:
- bifrost.udp_capture.UDPCapture is now bifrost.packet_capture.UDPCapture
- there is a new PacketCaptureCallback class that wraps the callbacks
- bifrost.udp_transmit is now bifrost.packet_writer
This should also address the segfault I was running into in #137.