Pitplatsch opened this issue 1 year ago
I am not really an expert on the torch aspects here, but I'll try to answer:
> Is this a known issue?

I wasn't aware of it, and I wonder why it happens. It should only load X and Z, and then Y and Z, into memory, which shouldn't be much.
> I am of the impression that this is due to the many copies of the data loaded onto the GPU for conditional independence testing. Can this be avoided, or are routines necessary to clean the data from the GPUs after conditional independence is calculated?
I would hope that the data is cleaned from the GPU, but I don't know whether it does.
> Or is pre-computing the null-distributions a solution to the problem?
No, the null distribution only pertains to distance correlation, and that is computed on the CPU.
Not sure this helps, but I would welcome any improvements on the torch part. This is the `_get_single_residuals` function in the class.
I am facing the same problem on a machine with 8 GPUs.
""" Number of devices: 8 -- Kernel partition size: 0 Number of devices: 8 -- Kernel partition size: 42625 Number of devices: 8 -- Kernel partition size: 21313 Number of devices: 8 -- Kernel partition size: 10657 Number of devices: 8 -- Kernel partition size: 5329 Number of devices: 8 -- Kernel partition size: 2665 Number of devices: 8 -- Kernel partition size: 1333 Number of devices: 8 -- Kernel partition size: 667 Number of devices: 8 -- Kernel partition size: 334 Number of devices: 8 -- Kernel partition size: 167 Number of devices: 8 -- Kernel partition size: 84 Number of devices: 8 -- Kernel partition size: 42 Number of devices: 8 -- Kernel partition size: 21 Number of devices: 8 -- Kernel partition size: 11 Number of devices: 8 -- Kernel partition size: 6 Number of devices: 8 -- Kernel partition size: 3 torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 27.07 GiB (GPU 0; 79.19 GiB total capacity; 57.58 GiB already allocated; 4.62 GiB free; 57.59 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF"""
I do not understand what I should do. I even tried clearing the cache with `torch.cuda.empty_cache()`, but it doesn't help. Please help.
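As a side note, the `max_split_size_mb` hint from the error message is an allocator setting, not something to call at runtime; it has to be in place before the first CUDA allocation (safest: before importing torch). A minimal sketch; the value 128 is an arbitrary starting point, not a tigramite recommendation:

```python
import os

# Must be set before torch initializes its CUDA caching allocator,
# so place this at the very top of the script, before `import torch`.
# Smaller values reduce fragmentation at some performance cost.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
```

Note this only mitigates fragmentation ("reserved memory >> allocated memory"); it cannot help when the allocation itself (27.07 GiB here) simply exceeds the free VRAM.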
I ran into the same problem. My dataset shape is (20000, 4), and using GPDCtorch() requires 305 GB of memory allocation, which is not usable at all. I run PCMCI with a max lag of 1.
I had the same problem. About 980 GB of memory allocation is required.
I have found a solution, but it is currently missing multi-GPU support (it uses PyTorch Lightning). I will add that to achieve feature parity with the current version and then open a pull request.
I am currently exploring the use of GPDC for causal discovery in tigramite using the pytorch implementation for increased speed for discovery of long time series with many variables and large tau_max. However, I run out of VRAM, even using modern GPUs (in my case: A100 with 80 GB of VRAM). See minimal working (crashing) example below for the technical details.
Is this a known issue? I am of the impression that this is due to the many copies of the data loaded onto the GPU for conditional independence testing. Can this be avoided, or are routines necessary to clean the data from the GPUs after conditional independence is calculated? Or is pre-computing the null-distributions a solution to the problem?
Thank you so much for your assistance and keep up the great developing work on tigramite!
Example
Example of overloading VRAM during causal discovery using GPDCtorch, run on 3650 timesteps of 5 variables with tau_max = 7.
Test run on a single NVIDIA A100 with 80 GB of VRAM.
Process crashes after less than 5 minutes.
Software used
tigramite 5.2.0.4, gpytorch 1.10, pytorch 2.0.1 (py3.10_cuda11.7_cudnn8.5.0_0)
Code
output short
output complete