wpilibsuite / allwpilib

Official Repository of WPILibJ and WPILibC
https://wpilib.org/

Apparent memory leak in Java projects, likely in NetworkTables library #5316

Closed nihonjinrxs closed 10 months ago

nihonjinrxs commented 1 year ago

Describe the bug
We've seen a significant climb in memory use on the RIO (tested on a RoboRIO 1) after code start when deploying a newly created empty project to a robot that has a device sending NetworkTables traffic (in our case, a Limelight 2).

In our case, the empty project, when run on the RIO, started at about 211MB and climbed to 218MB used within about 5 minutes with 2 Limelights sending traffic. (More devices would add more entries and make that climb faster.)

If there are further things we can do to help with testing, please let us know. Happy to do so.

To Reproduce
Steps to reproduce the behavior:

  1. Find some old robot to test with a RoboRIO on it, and with some device (Limelight, PhotonVision, etc.) connected that will send NetworkTables traffic.
  2. Generate a new WPILib Java project (we used the one without comments) via the VSCode command palette using mostly defaults (we used our team number 2992 and set the path where the project should be generated).
  3. Flash firmware and format a RoboRIO to have a clean slate starting point.
  4. Deploy the new project to the robot via the VSCode WPILib command or directly via Gradle.
  5. Shell into the RIO and run top to watch the memory climb (a minimal sketch of an optional in-code heap printout follows these steps).
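
For reference, here is a minimal sketch of the kind of project involved, assuming the standard empty TimedRobot template the generator produces; the periodic heap printout is an addition for observation only and is not part of the template. It reports only the Java heap, while top shows whole-process memory, so the two numbers will not match exactly.

```java
package frc.robot;

import edu.wpi.first.wpilibj.TimedRobot;

/**
 * Sketch of the generated empty project, plus a heap printout
 * (the printout is an addition for observation, not template code).
 */
public class Robot extends TimedRobot {
  private int loops = 0;

  @Override
  public void robotPeriodic() {
    // robotPeriodic runs every 20 ms by default; print roughly once per second.
    if (++loops % 50 == 0) {
      Runtime rt = Runtime.getRuntime();
      long usedMiB = (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
      System.out.println("JVM heap in use: " + usedMiB + " MiB");
    }
  }
}
```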

Expected behavior
Memory use is consistent during runtime for an empty project, even with NetworkTables traffic happening.

Screenshots
None.

Additional context
We saw this earlier in the season with our robot code, but weren't sure of the cause, so we ended up removing a Limelight from our robot and turning our network switch off to prevent NetworkTables traffic as a mitigation for matches this season. We've now replicated the behavior on an empty project and wanted to report it as a bug.

sciencewhiz commented 1 year ago

More details here: https://www.chiefdelphi.com/t/fatal-robot-code-crash/427074/48

sciencewhiz commented 1 year ago

Out of curiosity, have you tried with PhotonVision running on the Limelights? Limelight software still uses NetworkTables 3, while PhotonVision uses NT 4.

amichaelyu commented 1 year ago

> Out of curiosity, have you tried with PhotonVision running on the Limelights? Limelight software still uses NetworkTables 3, while PhotonVision uses NT 4.

Our team ran two PhotonVision cameras and still saw the crashes. We don't know whether it's a result of the large amount of NT traffic being processed by the robot code and the Java garbage collector not being able to clean up fast enough.
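
For context, here's a rough sketch (not our exact code) of how robot code typically polls Limelight data over NetworkTables. The table and key names ("limelight", "tx", "ty") follow Limelight's documented convention and are illustrative only; every value the camera publishes arrives as NT traffic the RIO-side library has to store, which is why more cameras meant a faster climb.

```java
import edu.wpi.first.networktables.NetworkTable;
import edu.wpi.first.networktables.NetworkTableInstance;

/** Illustrative helper: reading Limelight targeting data over NetworkTables. */
public class LimelightReader {
  private final NetworkTable table =
      NetworkTableInstance.getDefault().getTable("limelight");

  /** Horizontal offset to the target in degrees (0.0 if nothing has been published). */
  public double getTx() {
    return table.getEntry("tx").getDouble(0.0);
  }

  /** Vertical offset to the target in degrees. */
  public double getTy() {
    return table.getEntry("ty").getDouble(0.0);
  }
}
```

The exact entries depend on the camera and pipeline, but the read pattern on the robot side is the same either way.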

nihonjinrxs commented 1 year ago

Wondering if any progress has been made on this one. With the season starting in January, I'd love for this to be resolved before then. Let me know if there's anything I can do to help make that easier or move it along. Thanks!

PeterJohnson commented 1 year ago

I've implemented several improvements to NT that may have helped with this (in particular #5485), but I haven't specifically tried to test for this memory leak. If you can test with the beta (I recommend waiting for beta-2, which should be released tomorrow), that would be helpful.

nihonjinrxs commented 10 months ago

@PeterJohnson Team 2992 (@GeradAlt-F4, @msonnier, and myself) spent some time at build meeting this evening replicating our testing with the latest release, v2024.1.1-beta-4. Reporting back our findings here.

Testing configuration

We followed the replication process specified in the description of this bug report, with the clean-slate project built on the new version of the library and the RoboRIO 1 imaged with the corresponding image. We tested 2 different configurations, watching memory and CPU use via SSH/top on the RIO for 10 minutes on each run. For NetworkTables traffic, as before, we connected Limelight (2/2+) devices and configured them for the RoboRIO's network in AprilTags mode.

Test Results

Results were very good.

With 2 Limelights connected, we saw memory sit at 246 MB and CPU hover between 5 and 7% for the duration of the test.

With 3 Limelights connected, we saw memory sit at 247 MB and CPU hover between 5 and 8% for the duration of the test.

No observable memory growth happened with either configuration during testing.

Conclusion

For our purposes and from what we can observe, this seems fixed. If not, it's at least massively improved from before. I'll leave a decision on closing this ticket to you, as I'm not sure what other reports you may be working from. For my part, this can be closed.

Thanks for your continued work on this library.

PeterJohnson commented 10 months ago

Thanks for testing and reporting back! Closing as the issue appears to be resolved.