In your first code snippet, CuPy uses its own memory allocator, which by default draws from a memory pool. You can read more about it here, but the key point is that using a memory pool significantly improves performance in this example.
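For illustration, here is a minimal sketch (not from the original thread) of CuPy serving allocations from its default pool:

```python
# Minimal sketch: CuPy's default pooled allocator. Allocations are served
# from cupy.get_default_memory_pool(), so repeated allocations avoid
# expensive cudaMalloc/cudaFree round trips.
import cupy as cp

pool = cp.get_default_memory_pool()

x = cp.random.rand(2_000_000, dtype=cp.float32)
y = cp.argsort(x)

# Memory currently in use vs. memory held (cached) by the pool.
print(pool.used_bytes(), pool.total_bytes())
```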
When you import cudf, CuPy's memory allocator is switched to use the RAPIDS Memory Manager (RMM) instead. This is usually a good decision, as we ideally want both libraries to share the same memory resource so cuDF and CuPy don't compete for memory. However, by default, cuDF does not use a memory pool.
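For reference, the allocator swap performed on import looks roughly like the sketch below; `rmm.rmm_cupy_allocator` is the helper name used by older RMM releases (newer ones expose it as `rmm.allocators.cupy.rmm_cupy_allocator`):

```python
# Sketch of the allocator swap that happens when cuDF is imported
# (older RMM helper name; adjust for your RMM version).
import cupy as cp
import rmm

# Route all CuPy device allocations through RMM instead of CuPy's own pool.
cp.cuda.set_allocator(rmm.rmm_cupy_allocator)

x = cp.random.rand(1_000_000, dtype=cp.float32)  # now allocated via RMM
```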
If you run

```python
import cudf

cudf.set_allocator(pool=True)
```

rather than just importing cudf, things should go fast again. This will create a memory pool sized to about half of your GPU's total memory.
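If you want to size the pool yourself rather than take the roughly-half-of-GPU default, the pool can also be configured through RMM directly; the size below is an illustrative assumption, not a recommendation:

```python
# Alternative: configure the pooled allocator through RMM itself.
# rmm.reinitialize replaces the default memory resource at runtime.
import rmm
import cupy as cp

rmm.reinitialize(
    pool_allocator=True,            # use a pooled memory resource
    initial_pool_size=2 * 1024**3,  # start with a 2 GiB pool (example value)
)

# Optionally route CuPy through the same pool (older RMM helper name).
cp.cuda.set_allocator(rmm.rmm_cupy_allocator)
```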
I'm going to close this issue as answered, but please feel free to re-open if things don't work as expected!
Importing cuDF at any point significantly slows down CuPy operations (even if cuDF is never actually used). I think this is an issue with cuDF rather than CuPy, but let me know if I should post it to the CuPy GitHub instead.
Code to reproduce the error:
Output:
The slowdown is not limited to the function above or to matrix multiplication (e.g. the time to argsort 2M floats goes from 773 us to 2780 us). Its magnitude depends on the executed function and the size of the CuPy arrays involved; I've seen everything from +10% to +1000%.
The same can be observed if I use two separate scripts for profiling (one with only CuPy imported, the other with both cuDF and CuPy imported); the order of the imports doesn't matter.
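Since the original snippet and output were not preserved in this copy, here is a hypothetical standalone timing script along the lines described above (argsort of 2M floats), run once as-is and once with the cudf import enabled:

```python
# Hypothetical reproduction sketch (the original benchmark code is not shown
# here). Run once as-is and once with the `import cudf` line uncommented and
# compare the per-call times; the reported 773 us vs 2780 us figures came
# from a comparison of this kind.
import cupy as cp
# import cudf  # uncomment to reproduce the slowdown

x = cp.random.rand(2_000_000, dtype=cp.float32)
cp.argsort(x)                     # warm-up (kernel compilation, pool growth)
cp.cuda.Device().synchronize()

start, end = cp.cuda.Event(), cp.cuda.Event()
start.record()
for _ in range(100):
    cp.argsort(x)
end.record()
end.synchronize()
print(cp.cuda.get_elapsed_time(start, end) / 100, "ms per argsort")
```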
Environment overview
Environment details