lwa-project / data_recorder

Data Recorder (DR) Monitor and Control Software
GNU General Public License v3.0

DR Spectrometer at LWA-NA leads to 100% buffer usage #2

Open jaycedowell opened 8 months ago

jaycedowell commented 8 months ago
2024-01-12 22:27:36.000 [I] [FileWriter:/ ...] [SpectrometerOperation] FileWriter record started
2024-01-12 22:27:36.572 [I] 
                        [I] [Receiver        ] Buffers:     [______________________________] (  0%)  3.99976 MiB
                        [I] [Receiver        ] Rate:        [______________________________] (  0%)  148.12305 KiB/s
                        [I] [Receiver        ] Subscribers: 1
                        [I] [Receiver        ] Format:      DRX_FILT_7
2024-01-12 22:27:37.530 [I] 
                        [I] [Receiver        ] Buffers:     [______________________________] (  0%)  3.99976 MiB
                        [I] [Receiver        ] Rate:        [####################__________] ( 66%)  79.32710 MiB/s
                        [I] [Receiver        ] Subscribers: 1
                        [I] [Receiver        ] Format:      DRX_FILT_7
2024-01-12 22:27:37.833 [E] [Plugin:DrxSp ...] INCOMPATIBLE
2024-01-12 22:27:42.518 [I] 
                        [I] [Receiver        ] Buffers:     [#####_________________________] ( 17%)  351.97852 MiB
                        [I] [Receiver        ] Rate:        [###################___________] ( 65%)  78.41090 MiB/s
                        [I] [Receiver        ] Subscribers: 1
                        [I] [Receiver        ] Format:      DRX_FILT_7
2024-01-12 22:27:47.527 [I] 
                        [I] [Receiver        ] Buffers:     [##########____________________] ( 35%)  727.95557 MiB
                        [I] [Receiver        ] Rate:        [##################____________] ( 62%)  74.84494 MiB/s
                        [I] [Receiver        ] Subscribers: 1
                        [I] [Receiver        ] Format:      DRX_FILT_7
2024-01-12 22:27:52.529 [I] 
                        [I] [Receiver        ] Buffers:     [################______________] ( 53%)  1.07415 GiB
                        [I] [Receiver        ] Rate:        [##################____________] ( 61%)  74.22483 MiB/s
                        [I] [Receiver        ] Subscribers: 1
                        [I] [Receiver        ] Format:      DRX_FILT_7
2024-01-12 22:27:57.471 [I] 
                        [I] [Receiver        ] Buffers:     [#####################_________] ( 71%)  1.43741 GiB
                        [I] [Receiver        ] Rate:        [##################____________] ( 62%)  74.88780 MiB/s
                        [I] [Receiver        ] Subscribers: 1
                        [I] [Receiver        ] Format:      DRX_FILT_7
2024-01-12 22:28:02.474 [I] 
                        [I] [Receiver        ] Buffers:     [###########################___] ( 90%)  1.80848 GiB
                        [I] [Receiver        ] Rate:        [###################___________] ( 63%)  75.71530 MiB/s
                        [I] [Receiver        ] Subscribers: 1
                        [I] [Receiver        ] Format:      DRX_FILT_7
2024-01-12 22:28:03.700 [I] 
                        [I] [Plugin:DrxSp ...] ==========================================================================================
                        [I] [Plugin:DrxSp ...] == Spectrometer Report:                                                                 ==
                        [I] [Plugin:DrxSp ...] ==  Mode:        XXYY         Frequency Ch.   1024   Integration count     1536         ==
                        [I] [Plugin:DrxSp ...] ==  Frames                                                                              ==
                        [I] [Plugin:DrxSp ...] ==        30640                                       (received)                        ==
                        [I] [Plugin:DrxSp ...] ==        24677 /        21 /     24656               (inserted / init / join)          ==
                        [I] [Plugin:DrxSp ...] ==         5952 /         0 /        10              (incompatible / late1 / late2)     ==
                        [I] [Plugin:DrxSp ...] ==            4 /        17                           (fresh / stale )                  ==
                        [I] [Plugin:DrxSp ...] ==  Blocks                                                                              ==
                        [I] [Plugin:DrxSp ...] ==           11 /        11 /         2               (started / completed / dropped)   ==
                        [I] [Plugin:DrxSp ...] ==  Queues      (used / size)                                                           ==
                        [I] [Plugin:DrxSp ...] ==            0 /        10                           (free)                            ==
                        [I] [Plugin:DrxSp ...] ==            8 /       inf                           (filling)                         ==
                        [I] [Plugin:DrxSp ...] ==            0 /        10                           (startable)                       ==
                        [I] [Plugin:DrxSp ...] ==            0 /         3                           (processing)                      ==
                        [I] [Plugin:DrxSp ...] ==            0 /        18                           (dropped)                         ==
                        [I] [Plugin:DrxSp ...] ==                                                                                      ==
                        [I] [Plugin:DrxSp ...] ==  Runtime:        30 s          Compute B/W:    0.3667 Blk/s                          ==

"INCOMPATIBLE" might be a clue...

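As a rough sanity check on the log above, the sketch below estimates how fast the receive buffer fills and what fraction of received frames the spectrometer report counts as incompatible. The numbers are copied from the log; the helper names are made up for illustration, and converting GiB to MiB with a factor of 1024 is an assumption about how the logger formats sizes.

```python
from datetime import datetime

# (timestamp, buffer fill in MiB) copied from the Receiver status lines above.
# Assumption: the logger's GiB values convert to MiB with a factor of 1024.
samples = [
    ("22:27:37.530", 3.99976),
    ("22:27:42.518", 351.97852),
    ("22:27:47.527", 727.95557),
    ("22:27:52.529", 1.07415 * 1024),
    ("22:27:57.471", 1.43741 * 1024),
    ("22:28:02.474", 1.80848 * 1024),
]


def seconds(ts):
    """Convert an HH:MM:SS.mmm timestamp to seconds since midnight."""
    t = datetime.strptime(ts, "%H:%M:%S.%f")
    return t.hour * 3600 + t.minute * 60 + t.second + t.microsecond / 1e6


def fill_rate(points):
    """Average buffer growth in MiB/s between the first and last sample."""
    (t0, b0), (t1, b1) = points[0], points[-1]
    return (b1 - b0) / (seconds(t1) - seconds(t0))


rate = fill_rate(samples)

# Frame counts from the Spectrometer Report: incompatible / received.
incompatible_fraction = 5952 / 30640

print(f"buffer fill rate: {rate:.1f} MiB/s")
print(f"incompatible frames: {incompatible_fraction:.1%}")
```

The fill rate comes out at roughly 74 MiB/s, close to the reported receive rate of about 75 MiB/s, which would be consistent with almost nothing being drained from the buffer during this interval. About 19% of the received frames are counted as incompatible.
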
jaycedowell commented 8 months ago

The error message comes from:

https://github.com/lwa-project/data_recorder/blob/multi_dr/DROS2/Spectrometer/DrxSpectrometer.cpp#L510

based on info from:

https://github.com/lwa-project/data_recorder/blob/multi_dr/DROS2/Spectrometer/DrxSpectrometer.cpp#L448

That function looks at all the usual suspects: time, frequency, and bandwidth. It's not clear from the logs which of these doesn't match.
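
One way to narrow this down would be to have the check report which field failed instead of a bare "INCOMPATIBLE". The sketch below is illustrative only and is not the code in DrxSpectrometer.cpp: `StreamParams`, `mismatches`, the field names, and the `max_time_gap` threshold are all hypothetical, chosen to mirror the three quantities mentioned above.

```python
from dataclasses import dataclass


@dataclass
class StreamParams:
    # Hypothetical fields; the real check may compare different quantities.
    time_tag: int      # time in sample-clock ticks
    frequency: float   # tuning frequency
    bandwidth: float   # bandwidth of the stream


def mismatches(reference, frame, max_time_gap):
    """Return a list of human-readable reasons why `frame` does not match."""
    reasons = []
    dt = frame.time_tag - reference.time_tag
    if abs(dt) > max_time_gap:
        reasons.append(f"time: gap of {dt} ticks exceeds {max_time_gap}")
    if frame.frequency != reference.frequency:
        reasons.append(
            f"frequency: {frame.frequency} != {reference.frequency}")
    if frame.bandwidth != reference.bandwidth:
        reasons.append(
            f"bandwidth: {frame.bandwidth} != {reference.bandwidth}")
    return reasons


# Made-up values, only to show the shape of the output.
ref = StreamParams(time_tag=0, frequency=1.0, bandwidth=2.0)
bad = StreamParams(time_tag=50, frequency=3.0, bandwidth=2.0)
for reason in mismatches(ref, bad, max_time_gap=10):
    print("INCOMPATIBLE:", reason)
```

Logging the reason alongside the error would show directly whether the rejected frames fail on time, frequency, or bandwidth.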

jaycedowell commented 8 months ago

This might be more of a buffer size issue when dealing with a big skew between the two polarizations. Maybe https://github.com/lwa-project/ng_digital_processor/tree/new_drx_packetizer would help?

jaycedowell commented 5 months ago

It's still happening. Maybe we really do need to go to new_drx_packetizer.

jaycedowell commented 5 months ago

Ah, I can trigger this if I send two DRX commands in rapid succession.

jaycedowell commented 5 months ago

Using a much larger buffer size (a minimum of 64 blocks vs. the original value of 8) seems to help, but that isn't a very satisfying solution. Maybe the problem is the ordering of the packets coming out of the T-engine. I should do a raw recording (and one at Sevilleta) to see what the ordering looks like.
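
To illustrate why a larger block count could mask a skew between the two polarizations, here is a toy model. It is a sketch under simplifying assumptions and does not reproduce the actual DROS2 queues: each block needs exactly one X and one Y frame, Y frames arrive `skew` blocks behind X, and when `max_open` partially filled blocks already exist the oldest one is dropped. The function and parameter names are invented for this example.

```python
from collections import OrderedDict


def simulate(n_blocks, skew, max_open):
    """Return (completed, dropped, late) counts for the toy model."""
    open_blocks = OrderedDict()  # block id -> set of polarizations seen
    closed = set()               # ids of blocks already completed or dropped
    counts = {"completed": 0, "dropped": 0, "late": 0}

    def insert(block_id, pol):
        if block_id in closed:
            counts["late"] += 1          # frame for a block that is gone
            return
        if block_id not in open_blocks:
            if len(open_blocks) >= max_open:
                old_id, _ = open_blocks.popitem(last=False)  # evict oldest
                closed.add(old_id)
                counts["dropped"] += 1
            open_blocks[block_id] = set()
        open_blocks[block_id].add(pol)
        if len(open_blocks[block_id]) == 2:  # both polarizations present
            del open_blocks[block_id]
            closed.add(block_id)
            counts["completed"] += 1

    for t in range(n_blocks + skew):
        if t < n_blocks:
            insert(t, "X")
        if t >= skew:
            insert(t - skew, "Y")
    return counts["completed"], counts["dropped"], counts["late"]


# Hypothetical skew of 16 blocks, with the two block limits discussed above.
for limit in (8, 64):
    print(limit, simulate(n_blocks=100, skew=16, max_open=limit))
```

In this model every block completes once `max_open` exceeds the skew, while a limit below the skew drops most blocks before their second polarization arrives. That matches the observation that 64 blocks helps where 8 does not, but only if skew really is the cause.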

jaycedowell commented 5 months ago

When I look at the DRX flow at North Arm vs. Sevilleta, I think the North Arm flow is much more consistent, with nice, uniform blocks of eight packets. Sevilleta, on the other hand, seems to have a lot more variation in the packet ordering. Based on that, I would expect Sevilleta to be the problem station, unless this issue is related to some kind of rare condition at North Arm.
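
A comparison like this can be made less subjective by histogramming run lengths in the raw recordings. The sketch below assumes that a "block" means a run of consecutive packets sharing some per-packet key (for example a stream identifier or time tag read from each frame header); that interpretation, the function name, and the synthetic input are all assumptions for illustration.

```python
from collections import Counter
from itertools import groupby


def run_length_histogram(keys):
    """Count how often each run length of identical consecutive keys occurs."""
    return Counter(len(list(group)) for _, group in groupby(keys))


# Synthetic key sequences, not real data: one uniform, one irregular.
uniform = [0] * 8 + [1] * 8 + [0] * 8 + [1] * 8
irregular = [0] * 8 + [1] * 3 + [0] * 5 + [1] * 8

print("uniform:  ", dict(run_length_histogram(uniform)))
print("irregular:", dict(run_length_histogram(irregular)))
```

A station with uniform blocks of eight would show a single peak at length 8, while irregular ordering would spread the histogram over several lengths.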

jaycedowell commented 5 months ago

Since STP seems to work, I've dropped back to the default block count limits to see if this is a viable solution.

jaycedowell commented 5 months ago

Nope, this is still happening. The stress test failed this morning as well as another test I ran this afternoon.

Update: Going back to larger buffers for testing (64 <= buffer count <= 256).