kordk / torch-ecpg

(GPU accelerated) eCpG mapper
BSD 3-Clause "New" or "Revised" License

mlr: torch.cuda.OutOfMemoryError error with 20k genes & 20k CpGs #30

Closed. rituroy closed this issue 1 year ago.

rituroy commented 1 year ago

pwd /songlab/proj/cbi/torch_ecpg/simulation (py310)

logname="_sim1"
numGene="20000"
numCpG="20000"
numSample="1000"
/usr/bin/Rscript simulateData.R "numGene="$numGene "numCpG="$numCpG "numSample="$numSample
filename="log_mlr_gene"$numGene"_CpG"$numCpG"_sample"$numSample$logname".txt"
/usr/bin/time tecpg run mlr --p-only >"$filename" 2>&1

log_mlr_gene20000_CpG20000_sample1000_sim1.txt

kordk commented 1 year ago

Chunking should address this. Liam put together some guidance for the selection of chunk sizes already. I've asked him to point us to it.

kordk commented 1 year ago

Found it. Liam addressed it here, with some options in the code to do the chunking for you (https://github.com/kordk/torch-ecpg/issues/17#issuecomment-1366118363):

"Run tecpg chunks to get the maximum loci per chunk for a given target torch memory usage (default 80% of total memory)."

liamgd commented 1 year ago

Yes, chunking should solve the problem. If you run into this error again, reduce the number of loci per chunk. tecpg chunks should serve as a good reference, especially for GPU computation, but its suggestion may need to be adjusted slightly.
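For example, the workflow might look like the following (a sketch, not a verified recipe: the -l value is arbitrary, and the flag names are taken from the commands used later in this thread):

tecpg chunks
tecpg run mlr --p-only -l 5000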

rituroy commented 1 year ago

Not able to run mlr with 450k methylation loci, even with chunking. Running out of memory.

tecpg chunks -F 0.05 -s 1000 -m 450000 -g 20000 -c 2 -P True -p True

[INFO] CUDA GPU detected. This device supports CUDA.
[INFO] Target memory not supplied. Inferred target of 12172.7090688 MB of CUDA memory (80% of detected)
[INFO] Estimated loci per chunk for target peak memory usage of 12172709068.800001 bytes:
[INFO] 14407200004 bytes for constants (without region filtration)
[INFO] 86407200004 bytes for constants (with region filtration)
[INFO] Full output, p only, p filtration, region filtration
[INFO] False, True, True, False: Not possible, Peak memory after scalars and E
[INFO] False, True, True, True: Not possible, Peak memory after results concatenation
[INFO] True, True, True, False: Not possible, Peak memory after scalars and E
[INFO] True, True, True, True: Not possible, Peak memory after results concatenation


numGene="20000"; numCpG="450000"; numSample="1000"

tecpg run mlr --p-only --p-thresh 0.05 -l 10

[INFO] CUDA GPU detected. This device supports CUDA.
[INFO] Reading 3 dataframes...
[INFOTIMER] Reading 1/3: C.csv
[INFO] Reading csv file /data/songlab/proj/cbi/torch_ecpg/simulation/data/C.csv with separator ,
[INFOTIMER] Read 1/3 in 0.0033 seconds
[INFOTIMER] Reading 2/3: M.csv
[INFO] Reading csv file /data/songlab/proj/cbi/torch_ecpg/simulation/data/M.csv with separator ,
[INFOTIMER] Read 2/3 in 82.0209 seconds
[INFOTIMER] Reading 3/3: G.csv
[INFO] Reading csv file /data/songlab/proj/cbi/torch_ecpg/simulation/data/G.csv with separator ,
[INFOTIMER] Read 3/3 in 3.8222 seconds
[INFOTIMER] Finished reading 3 dataframes in 85.8467 seconds.
[INFO] Initializing regression variables
[INFO] Use CPU not supplied. Checking if CUDA is available.
[INFO] Using CUDA
[INFO] Running with 996 degrees of freedom
[INFO] Initializing output directory
[INFO] Removing directory /data/songlab/proj/cbi/torch_ecpg/simulation/output...
[INFO] Creating directory /data/songlab/proj/cbi/torch_ecpg/simulation/output...
[INFO] Running regression_full...
Traceback (most recent call last):
  File "/home/ritu/anaconda3/envs/py310/bin/tecpg", line 33, in <module>
    sys.exit(load_entry_point('tecpg', 'console_scripts', 'tecpg')())
  File "/home/ritu/anaconda3/envs/py310/bin/tecpg", line 25, in importlib_load_entry_point
    return next(matches).load()
  File "/home/ritu/anaconda3/envs/py310/lib/python3.10/importlib/metadata/__init__.py", line 171, in load
    module = import_module(match.group('module'))
  File "/home/ritu/anaconda3/envs/py310/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "", line 1050, in _gcd_import
  File "", line 1027, in _find_and_load
  File "", line 1006, in _find_and_load_unlocked
  File "", line 688, in _load_unlocked
  File "", line 883, in exec_module
  File "", line 241, in _call_with_frames_removed
  File "/data/songlab/proj/cbi/torch_ecpg/torch-ecpg/tecpg/main.py", line 9, in <module>
    main()
  File "/data/songlab/proj/cbi/torch_ecpg/torch-ecpg/tecpg/main.py", line 6, in main
    start()
  File "/data/songlab/proj/cbi/torch_ecpg/torch-ecpg/tecpg/cli.py", line 740, in start
    cli(obj={})
  File "/home/ritu/anaconda3/envs/py310/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/ritu/anaconda3/envs/py310/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home/ritu/anaconda3/envs/py310/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/ritu/anaconda3/envs/py310/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/ritu/anaconda3/envs/py310/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ritu/anaconda3/envs/py310/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/ritu/anaconda3/envs/py310/lib/python3.10/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/data/songlab/proj/cbi/torch_ecpg/torch-ecpg/tecpg/cli.py", line 295, in mlr
    output = regression_full(args, logger)
  File "/data/songlab/proj/cbi/torch_ecpg/torch-ecpg/tecpg/regression_full.py", line 161, in regression_full
    XtXi_Xt = XtXi.bmm(Xt)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 6.71 GiB (GPU 0; 14.62 GiB total capacity; 6.74 GiB already allocated; 5.38 GiB free; 8.39 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Command exited with non-zero status 1
93.45user 13.34system 1:37.99elapsed 108%CPU (0avgtext+0avgdata 11347428maxresident)k
0inputs+0outputs (0major+3400470minor)pagefaults 0swaps
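As an aside, the allocator hint at the end of that error message refers to PyTorch's caching-allocator configuration, which can be supplied through an environment variable. A hypothetical invocation is shown below (the 128 MB value is arbitrary, and as the next comment explains, fragmentation is not the root cause here):

PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 /usr/bin/time tecpg run mlr --p-only --p-thresh 0.05 -l 10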

liamgd commented 1 year ago

The memory for the constants exceeds the available CUDA memory, so there is no room for chunking. I am working on an algorithm that chunks the methylation loci as well as the gene expression loci, which would let the constants memory allocation shrink, sacrificing some parallelization for lower memory usage.

MLR works by evaluating constants before the main loop begins; these constants are reused throughout the algorithm so they do not have to be recalculated every iteration. tecpg chunks estimates that the constants will take up 14.4 GB of CUDA memory, which exceeds the target memory usage of 12.2 GB (80% of available memory). With region filtration enabled, the constants would require 86.4 GB. Ideally, the computer would have enough memory to at least store the constants, which would allow chunking with a small chunk size. tecpg run mlr-single might work but is unbearably slow.
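To illustrate the constants-versus-chunking trade-off, here is a minimal PyTorch sketch, not tecpg's actual implementation. It assumes one design matrix per methylation locus (intercept, methylation value, covariates), precomputes the batched (X^T X)^{-1} X^T for a chunk of loci, and reuses it for every gene; the roles of M and G, the shapes, and the function name mlr_chunked are illustrative assumptions. The XtXi_Xt line mirrors the one in the traceback above and is what scales with the chunk size.

import torch

def mlr_chunked(M, G, C, chunk_size, device='cuda'):
    # Hypothetical layouts for illustration only:
    # M: (n_meth, n_samples) methylation, G: (n_genes, n_samples) expression,
    # C: (n_samples, n_covariates) covariates.
    n_meth, n_samples = M.shape
    ones = torch.ones(n_samples, 1, device=device)
    Cd = C.to(device)
    Gt = G.to(device).T                              # (n_samples, n_genes), shared across chunks
    betas = []
    for start in range(0, n_meth, chunk_size):
        m = M[start:start + chunk_size].to(device)   # (c, n_samples) chunk of methylation loci
        c = m.shape[0]
        # One design matrix per locus in the chunk: intercept, methylation value, covariates.
        X = torch.cat(
            [ones.expand(c, -1, -1), m.unsqueeze(2), Cd.expand(c, -1, -1)], dim=2
        )                                            # (c, n_samples, k)
        Xt = X.transpose(1, 2)                       # (c, k, n_samples)
        XtXi = torch.linalg.inv(Xt.bmm(X))           # (c, k, k)
        XtXi_Xt = XtXi.bmm(Xt)                       # batched "constants"; memory grows with c
        betas.append(XtXi_Xt.matmul(Gt))             # (c, k, n_genes) coefficients for every gene
    return torch.cat(betas)

In this layout the per-chunk constants scale with chunk_size times n_samples, which is why reducing the loci per chunk, or chunking the methylation dimension as suggested below, trades parallelism for a lower peak allocation.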

liamgd commented 1 year ago

Methylation chunking should now fix this issue. Try adding -m 225000 as an argument to the tecpg run mlr command. This will chunk the methylation loci into groups of 225000, which should decrease the memory allocated to constants in each chunk.
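Combined with the options used earlier in this thread, the full command might look like the line below (an assumed combination, not a tested invocation):

/usr/bin/time tecpg run mlr --p-only --p-thresh 0.05 -l 10 -m 225000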