ni / niveristand-ballard-milStd1553-custom-device

Custom device for Ballard MIL-STD-1553 hardware
MIT License

Serious timing issues #209

Open Schlammkuh opened 2 years ago

Schlammkuh commented 2 years ago

The late counter for the VeriStand Primary Control Loop is incrementing literally every iteration for me when running this Custom Device.

My environment:
PXIe-8840 QC running NI Linux RT 21.5
astronics-ballard-btipci_2.10.0.1_core2-64.ipk
astronics-ballard-labview_1.3.2.2_core2-64.ipk
LabVIEW 2020 / VeriStand 2020 R6
Custom Device: latest release (v20.7.0.17), using the prebuilt .nipkg versions

PCL target rate is set to 1kHz. But I'm getting only about 333Hz. One core is at 100% constantly according to htop. Hyperthreading is disabled, so 4 cores available. Logging is disabled in System Explorer. I was using the provided example configuration (1553_Hardware.xml / 1553_VS_Configuration.xml).

I have not tested any further.

Karl-G1 commented 2 years ago

I have not seen this be an issue at PCL rates of 1kHz in our internal testing. Can you give more details on your setup to see where the difference could be (such as by uploading the system definition file here)?

For context, I deployed the same configuration (using VeriStand 2020 R6 and the example configuration files) to a PXIe-8880 running both the 21.0 system image and the 21.5 system image. The 8840QC with 4 cores enabled should be no slower than the 8880 when using only one custom device. I am seeing low jitter on the PCL, and only about 30% of one core used by the PCL which includes the inline logic of the custom device and handles all of the reading and decoding of messages.

The remaining CPU usage contains other VeriStand logic plus the asynchronous part of the custom device which handles writing the transmitted messages to the driver. We are looking to reduce this CPU usage by only writing values when the data changes instead of continuously. However, it is a low priority thread that should not block the PCL even if executing on the same core.
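As a rough illustration of the write-on-change idea mentioned above, here is a minimal Python sketch. It is not the custom device's LabVIEW implementation: `ChangeOnlyWriter`, `write_fn`, and the message IDs are hypothetical names, and `write_fn` stands in for whatever driver call actually pushes data to the hardware.

```python
from typing import Callable, Dict, Iterable, Tuple


class ChangeOnlyWriter:
    """Forward a message to the driver only when its data words changed."""

    def __init__(self, write_fn: Callable[[int, Tuple[int, ...]], None]) -> None:
        # write_fn is a hypothetical stand-in for the real driver write call.
        self._write_fn = write_fn
        self._last: Dict[int, Tuple[int, ...]] = {}

    def update(self, message_id: int, data_words: Iterable[int]) -> bool:
        """Return True if a driver write was issued, False if it was skipped."""
        data = tuple(data_words)
        if self._last.get(message_id) == data:
            return False  # unchanged since the last write: skip the driver call
        self._write_fn(message_id, data)
        self._last[message_id] = data
        return True


# Usage: only the first and third updates reach the (fake) driver.
writes = []
writer = ChangeOnlyWriter(lambda msg_id, data: writes.append((msg_id, data)))
writer.update(1, [0x0001, 0x0002])
writer.update(1, [0x0001, 0x0002])
writer.update(1, [0x0001, 0x0003])
```

The trade-off is one cached copy of the data per message and one comparison per iteration, in exchange for skipping the driver call whenever the data is unchanged.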

Schlammkuh commented 2 years ago

Sorry for the delay - I've been on vacation. I've dug a little deeper here and found out that the high CPU load I'm experiencing is caused by interference from the other Ballard Custom Device (ARINC429 - https://github.com/ni/niveristand-ballard-arinc429-custom-device). I tested with a stripped-down System Definition File with the following results:

ARINC429 Custom Device only / no other Custom Devices - OK
MIL1553 Custom Device only / no other Custom Devices - OK
ARINC429 Custom Device plus other Custom Devices, except for the MIL1553 Custom Device - OK
MIL1553 Custom Device plus other Custom Devices, except for the ARINC429 Custom Device - OK
ARINC429 Custom Device plus MIL1553 Custom Device / no other Custom Devices - high CPU load incl. lots of late iterations

Attached is a copy of the stripped-down System Definition I used for testing: GOLD-System FFB.zip.

Karl-G1 commented 2 years ago

Thanks for the additional testing notes! I was able to reproduce this today as well. I tried across multiple configurations that I'll document here:

2x ARINC 429 Custom Devices (two cores on the same module) - OK
2x ARINC 429 Custom Devices (two separate modules) - OK
2x MIL1553 Custom Devices (two separate modules) - OK
MIL1553 + ARINC 429 (two cores on the same module) - limited to ~400Hz PCL rate
MIL1553 + ARINC 429 (two separate modules) - limited to ~400Hz PCL rate

This appears to be a driver issue when using the two Custom Devices together. Our R&D team will investigate.

My testing notes: I built a few different versions of the MIL1553 Custom Device's Engine PPL to see where the issue is coming from. For a quick test, I disabled all BTI driver calls from the Tx and Rx Execution units, and the performance issue went away. I then added back in everything one at a time, and when I re-enabled the driver's message reads (Get BC/RT Xfer Last Message Block.vi), the performance issue came back.

Karl-G1 commented 2 years ago

@Schlammkuh I debugged this issue more last week, and I found the root cause. There is a massive increase in execution time when reading 1553 messages via Transfer Handle when a 429 device is initialized in the same system. In the simple test case you can see below, all 1553 transfer handle reads slow down by over 10x (roughly 17us to 200us) when the 429 initialization is enabled.

[Image: 429 debug]

Given that even our example Hardware XML file has 10x message reads per iteration, this limits VeriStand's max execution rate to less than 500Hz. This matches exactly what we were seeing in the VeriStand test results earlier. Unfortunately, this is a driver problem and not the result of the custom device implementation. We have contacted Ballard to discuss options.
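The arithmetic behind that ceiling can be checked with a short Python sketch. It uses only the figures quoted in this thread (10 reads per iteration, roughly 17 us per read normally and roughly 200 us with the 429 device initialized) and assumes the reads run back to back with no other work in the loop, so it gives an upper bound rather than a measured rate. The function name is hypothetical.

```python
def max_loop_rate_hz(reads_per_iteration: int, read_time_us: float) -> float:
    """Upper bound on loop rate if the iteration did nothing but message reads."""
    total_us = reads_per_iteration * read_time_us
    return 1_000_000 / total_us


# Figures quoted in this thread; everything else in the loop is ignored.
normal = max_loop_rate_hz(10, 17)     # about 5.9 kHz ceiling
degraded = max_loop_rate_hz(10, 200)  # 500 Hz ceiling

print(f"normal:   {normal:.0f} Hz")
print(f"degraded: {degraded:.0f} Hz")
```

Since the PCL also runs the rest of the VeriStand logic each iteration, the achievable rate sits below the 500 Hz bound, which is consistent with the ~400Hz observed in the earlier tests.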

One option would be to read from the Bus Monitor instead of individual message transfer handles. This is how the 429 CD is implemented: all received messages are read, decoded, and then matched up with available channels in the system definition. We moved away from that implementation on this custom device, as the 'single-function' core modules are not able to run the Bus Monitor.
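For readers unfamiliar with the monitor-based approach, the read, decode, and match flow described above might look roughly like the following Python sketch. This is an illustration only: the record fields, `MonitorRecord`, `dispatch`, and the channel names are all hypothetical and do not reflect the Ballard driver API or the 429 custom device's actual LabVIEW code.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple

# Hypothetical key identifying a 1553 message: (RT address, subaddress, is_transmit).
MessageKey = Tuple[int, int, bool]


@dataclass(frozen=True)
class MonitorRecord:
    """One already-decoded message taken from the bus monitor (hypothetical shape)."""
    rt_address: int
    subaddress: int
    is_transmit: bool
    data_words: Tuple[int, ...]


def dispatch(records: Iterable[MonitorRecord],
             channel_map: Dict[MessageKey, str]) -> Dict[str, Tuple[int, ...]]:
    """Match monitored messages to configured channels; the newest record wins."""
    latest: Dict[str, Tuple[int, ...]] = {}
    for rec in records:
        key = (rec.rt_address, rec.subaddress, rec.is_transmit)
        channel = channel_map.get(key)
        if channel is not None:  # messages with no configured channel are dropped
            latest[channel] = rec.data_words
    return latest


# Usage: one bulk read of the monitor replaces one driver read per message.
channels = {(1, 2, False): "RT1_SA2_Rx"}
batch = [
    MonitorRecord(1, 2, False, (10, 20)),
    MonitorRecord(5, 1, True, (99,)),      # not configured, ignored
    MonitorRecord(1, 2, False, (11, 21)),  # newer data for the same channel
]
values = dispatch(batch, channels)
```

The point of the design is that the per-iteration cost depends on one monitor read plus a dictionary lookup per received message, rather than on one transfer-handle read per configured message.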

bariskarkar commented 2 years ago

Was a solution found? I have the same problem.

Schlammkuh commented 2 years ago

The workaround for me was to lower the PCL loop rate from 1kHz to 300Hz, and this has worked for about half a year now without issues. But this may not be an option for you, depending on your application.

bariskarkar commented 2 years ago

Unfortunately this is not an option for me. I need 3-channel MIL-STD-1553, so I have to add 2 MIL-STD-1553 custom devices for core 0 and core 1. When I do that, the actual PCL loop rate drops to 167Hz. That's too slow for my system.

Schlammkuh commented 2 years ago

Hmm, I see. You could try the new NI Linux RT release (2022 Q3) to see if it helps. Otherwise, as @Karl-G1 said above, it's a driver issue, not a Custom Device issue, so Ballard has to update their drivers (no newer drivers have been released since my initial comment; I just checked that on the NI feed http://download.ni.com/#ni-linux-rt/feeds/2022Q3/ni-third-party/). I'm afraid there's not much you can do here to speed up a fix. I would contact my NI account manager and/or application engineer to persuade NI to push Ballard harder on that issue.

Sebastian

bariskarkar commented 2 years ago

I don't think it will solve my problem because, as you said, the Ballard driver version does not change with Linux RT 2022 Q3 (22.5). I don't want to move my Linux version to 22.5 just for a long-shot attempt, because the whole system is based on 21.8. Doing this means reformatting the entire system, and I'm worried it will cause new problems. For now I am in contact with Ballard and the local NI representative and am awaiting a response.

Karl-G1 commented 2 years ago

@bariskarkar The root issue with the driver has not been fixed as far as I know. More reports of the issue with timing should expedite investigation, so thank you for following up with both Ballard and NI.

To give users more flexibility with timing (understanding that this custom device takes a long time to execute), we have added two timing options in the 21.3 release as of this week:

We also added optional Timing Channels to the custom device so you can analyze per-instance performance of each 'execution unit' of the custom device engine. I have not updated this repo with a Theory of Operations guide, but we have a similar explanation for the ARINC 429 custom device you can find here. Most of the timing characteristics are similar enough in theory that the concepts should be the same.

bariskarkar commented 2 years ago

@Karl-G1 Asynchronous Rx execution is better. When I enable asynchronous Rx execution, the Rx execution time is about 2500 us regardless of the number of custom devices. I have a question about the theory of operation. When I add 2 MIL-STD-1553 custom devices, do they run in parallel, as in the image below, or in series? If they run in parallel, what are the disadvantages of this?

[Image: Async]

I have one more question about Decimation. How exactly does it work and what are its advantages?

By the way, Ballard replied to my message as follows. If I want to do this, how can I implement it in your custom device?

"We believe the best solution is using the monitor in the 1553 custom device just like is done in the 429 CD. This will require multi-function 1553 hardware as noted in the github discussion. You have multi-function 1553 hardware, so this would be the best path forward in your case to get improved loop rates.

HIL and Veristand are about full hardware simulation. This is best accomplished using multi-function channels for 1553. Restricting performance in order to support single-function channels would only make sense if a full performance custom device was also available specifically for multi-function channels at higher loop rates."

Sorry for so many questions. I don't have much experience with custom devices and VeriStand.

Karl-G1 commented 2 years ago

@bariskarkar Sorry for the slow reply on my end. Here are my thoughts:

bariskarkar commented 1 year ago

@Karl-G1 Have you had the time to work on this issue? I really need to solve this.