rapidsai / cusignal

cuSignal - RAPIDS Signal Processing Library

[BUG] Linux OOM killer reaps cuSignal process on a Jetson Xavier NX #341

Open znmeb opened 3 years ago

znmeb commented 3 years ago

Describe the bug

When running the E2E_Example notebook with cuSignal 0.18.0 on a Jetson Xavier NX (8 GB of RAM), JupyterLab reports a kernel restart when it reaches "Run Periodogram with Flattop Filter over Each Row of Ensemble" on the GPU.

Steps/Code to reproduce bug

  1. Install cuSignal in a Conda environment as described in the documentation
  2. Install JupyterLab in the same environment
  3. Start JupyterLab, browse to the notebooks and open E2E_Example.
  4. Run the cells one at a time from the top down until it crashes.

Expected behavior

All cells run.

Environment details (please complete the following information):

Additional context

Linux behaves gracelessly if you ask it for more RAM than it has available. It will either thrash (swap catastrophically), rendering the system unresponsive, or kill processes of its own choosing. Neither lets you make progress.

In this case it appears it chose the second option - here's a journalctl log:

journal.txt
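For anyone who wants to confirm the same thing on their own board, here is a minimal sketch of how a journal dump could be scanned for OOM-killer activity. It assumes the kernel log contains the usual "Killed process <pid> (<name>)" fragment; the wording around that fragment differs between kernel versions, so only the fragment is matched. The helper names `find_oom_kills` and `read_kernel_journal` are made up for illustration and are not part of cuSignal.

```python
import re
import subprocess

# Matches the "Killed process <pid> (<name>)" fragment the kernel logs when the
# OOM killer reaps a process. Assumption: the surrounding text varies by kernel
# version, so nothing else on the line is relied upon.
OOM_KILL_RE = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(log_text):
    """Return a list of (pid, process_name) for every OOM kill in log_text."""
    return [(int(pid), name) for pid, name in OOM_KILL_RE.findall(log_text)]


def read_kernel_journal():
    """Return kernel messages from the current boot as one string.

    Reading the kernel journal may require sudo or membership in the
    systemd-journal group.
    """
    result = subprocess.run(
        ["journalctl", "-k", "-b", "--no-pager"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `find_oom_kills(read_kernel_journal())` on the affected machine, or `find_oom_kills(open("journal.txt").read())` on a saved log, should list the reaped processes if the OOM killer was involved.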

This isn't mission-critical for me; I can page through the notebooks and find cases that will run. I doubt I'll run into something like this in my application code.

awthomp commented 3 years ago

Hey @znmeb -- Does the notebook work if you just reduce the size of the signal ensemble generated on GPU at the beginning of the notebook?
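As a rough way to pick a smaller ensemble, the sketch below estimates how many rows fit in a memory budget. It is only an approximation under stated assumptions: `num_rows`, `num_samples`, and `copies` are hypothetical names (not the notebook's variables), `copies` is a guess at how many ensemble-sized buffers are alive at once, and `MemAvailable` from `/proc/meminfo` is used as the budget because on Jetson boards the CPU and GPU share the same physical RAM.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integers (kB for most fields)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def available_bytes(path="/proc/meminfo"):
    """Return the kernel's MemAvailable estimate in bytes (Linux only)."""
    with open(path) as f:
        return parse_meminfo(f.read())["MemAvailable"] * 1024


def ensemble_nbytes(num_rows, num_samples, itemsize=8):
    """Size in bytes of one num_rows x num_samples array (itemsize=8 is float64)."""
    return num_rows * num_samples * itemsize


def max_rows_for_budget(budget_bytes, num_samples, itemsize=8, copies=3):
    """Largest row count whose working set fits in budget_bytes.

    copies is an assumed multiplier for the input plus intermediate buffers;
    it is a guess, not a measured value.
    """
    per_row = num_samples * itemsize * copies
    return budget_bytes // per_row
```

For example, `max_rows_for_budget(available_bytes() // 2, num_samples)` would leave half of the currently available memory free for everything else running on the board.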

znmeb commented 3 years ago

I haven't tried that yet - I did that with some other notebooks a few months ago and it worked, so my guess is it will work. Which variable controls the size?

github-actions[bot] commented 3 years ago

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

github-actions[bot] commented 3 years ago

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.