Closed tmckayus closed 2 years ago
I tried the same script on several systems with physical GPUs and the issue did not reproduce.
Just a note: I made the timing more granular, and the increase in time is in the dataframe-creation line, not the import.
df = cudf.DataFrame({"cat": ["dog"]})
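To separate import time from creation time as described above, a small timing helper is enough. This is a minimal sketch; the helper name `timed` and the commented cudf usage are illustrative, and running the cudf lines of course requires a GPU with cudf installed:

```python
import time

def timed(label, fn):
    """Call fn(), print the elapsed wall-clock time, and return (result, seconds)."""
    t0 = time.perf_counter()
    result = fn()
    elapsed = time.perf_counter() - t0
    print(f"{label}: {elapsed:.3f}s")
    return result, elapsed

# Hypothetical usage on the affected system (requires cudf + a GPU):
# cudf, _ = timed("import cudf", lambda: __import__("cudf"))
# _, create_s = timed("create DataFrame", lambda: cudf.DataFrame({"cat": ["dog"]}))
```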
Thanks @tmckayus .
That's interesting. I wonder if CUDA context initialization is what takes so long. Just to narrow the problem a bit, do you see the same behavior when trying to create a cupy array rather than a cuDF dataframe?
@shwina thanks I will try that and post back
@shwina here is an update
I discovered that on the test system (using an A30-24C vGPU) there is a time component to the failure. Empirically, the system must be up for at least 20 minutes before the failure can be observed. After 20 minutes, it takes about 30 creations in a loop (of either a cupy array or a cudf dataframe) to get the system into a state where the increased time can be observed. Experimenting with a 15-minute interval instead of 20 minutes yielded a case where ~300 iterations with a 1-second sleep were needed to observe the same increased time; that is an additional 5 minutes, hence consistent with the 20-minute sleep time.
I've attached a new zipfile with updated reproducers. Description of the file contents:
bug_cudf_reproducer.py -- simple Python program that creates a dataframe in a loop and reports the import time for cudf (which never appears to be an issue) and the time for the creation. An optional iteration count can be passed; the default is to loop until the failure threshold is crossed 3 times. The failure threshold is 1 second for a dataframe creation.
bug_cupy_reproducer.py -- simple Python program that creates a cupy array in a loop and reports the import time for cupy (which never appears to be an issue) and the time for the creation. An optional iteration count can be passed; the default is to loop until the failure threshold is crossed 3 times. The failure threshold is 200 milliseconds for a cupy array creation.
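The core loop shared by both reproducers can be sketched as follows. This is a simplified stand-in for the attached scripts, not their actual contents; the function name `create_until_slow` and its parameters are hypothetical, and the commented cudf/cupy calls require a GPU:

```python
import time

def create_until_slow(make_obj, threshold_s, max_crossings=3, pause_s=1.0, max_iters=None):
    """Call make_obj() in a loop, timing each call; stop once the time
    threshold has been crossed max_crossings times (or after max_iters calls).
    Returns the number of iterations performed."""
    crossings = 0
    i = 0
    while max_iters is None or i < max_iters:
        i += 1
        t0 = time.perf_counter()
        make_obj()
        elapsed = time.perf_counter() - t0
        print(f"iteration {i}: {elapsed:.3f}s")
        if elapsed > threshold_s:
            crossings += 1
            if crossings >= max_crossings:
                break
        time.sleep(pause_s)
    return i

# Hypothetical usage mirroring the two reproducers (requires cudf/cupy + a GPU):
# create_until_slow(lambda: cudf.DataFrame({"cat": ["dog"]}), threshold_s=1.0)
# create_until_slow(lambda: cupy.zeros(1_000_000), threshold_s=0.2)
```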
time_test_cudf.sh -- bash script that runs bug_cupy_reproducer.py and bug_cudf_reproducer.py four times each to establish a baseline, then sleeps for 20 minutes and runs bug_cudf_reproducer.py until failure. Finally, it runs bug_cupy_reproducer.py and bug_cudf_reproducer.py four times each again to show that the degraded time is consistent across processes, and also that there is a very large initial delay in the first iteration after a process switch. These effects persist until the system is rebooted.
time_test_cupy.sh -- bash script that runs bug_cupy_reproducer.py and bug_cudf_reproducer.py four times each to establish a baseline, then sleeps for 20 minutes and runs bug_cupy_reproducer.py until failure. Finally, it runs bug_cudf_reproducer.py and bug_cupy_reproducer.py four times each again to show that the degraded time is consistent across processes, and also that there is a very large initial delay in the first iteration after a process switch. These effects persist until the system is rebooted. This script illustrates that it doesn't matter whether cupy or cudf is used in the main iteration after the sleep; either one will induce a failure that is consistent across both programs.
time_test_cudf.log -- output from a time_test_cudf.sh run. Note in particular:
time_test_cudf_skip_initial.log -- a second run, with the calls that establish initial baseline times removed, beginning directly with the 20m delay. This was done on a newly rebooted system to demonstrate that it is not the initial calls to the cudf/cupy libraries that trigger the error; it happens shortly after the 20m delay (you know, Heisenberg :) ). The rest of the log is consistent with the first log above.
time_test_cupy.log -- essentially the same scenario, but using cupy as the main iteration instead of cudf to incur the degradation in time to illustrate that the mechanism doesn't matter. Note that the behaviors are consistent: it takes about 30 iterations of cupy array creation to notice the time degradation after a 20 minute sleep, and then the rest of the degraded times for execution and process switches are consistent.
This turned out to be a license issue -- if the vGPU is not correctly licensed, performance degrades after 20 minutes as noted here :)
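For anyone hitting this, the guest's license state can be checked from the output of `nvidia-smi -q`, which on vGPU guest drivers includes a "License Status" field. A minimal sketch of extracting it; the exact field wording can vary by driver version, and the function name here is illustrative:

```python
import re

def vgpu_license_status(smi_q_output: str) -> str:
    """Pull the 'License Status' value out of `nvidia-smi -q` text output.

    The field name matches what NVIDIA vGPU guest drivers print, e.g.
    'License Status : Licensed'; wording may differ across driver versions.
    """
    match = re.search(r"License Status\s*:\s*(.+)", smi_q_output)
    return match.group(1).strip() if match else "unknown"

# Usage on the affected guest:
#   import subprocess
#   out = subprocess.run(["nvidia-smi", "-q"], capture_output=True, text=True).stdout
#   print(vgpu_license_status(out))
```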
Describe the bug The execution time of a simple Python program that imports cudf and creates a dataframe will eventually increase to over 5 minutes when run multiple times on a system using the vGPU driver.
Steps/Code to reproduce bug
Expected behavior Execution time should be minimal, and remain minimal (about 1 or 2 seconds).
Environment overview (please complete the following information)
Environment details Please run and paste the output of the
cudf/print_env.sh
script here, to gather any other relevant environment details.
Additional context The script used to install the vGPU driver was the NGC resource nvaie/vgpu_guest_driver:470.63.01-ubuntu20.04
reproducer.zip