lkwq007 / CogView2-low-vram

CogView2 for GPUs with 12/16/24GB vRAM
Apache License 2.0

Still have OOM error #1

Open · KyriaAnnwyn opened this issue 2 years ago

KyriaAnnwyn commented 2 years ago

I have a 3080 with 16GB VRAM on my laptop. I switched to the 12g branch and tried to run:

    SAT_HOME=../CogView2/sharefs/cogview-new python3 cogview2_completion.py --mode inference --fp16 --output-path samples_sat_v0.2_comp --batch-size 16 --max-inference-batch-size 2 --input-source ../CogView2/input_compMy.txt --single-gpu

I get this error:

    ...SwissArmyTransformer/model/transformer.py", line 452, in forward
        hidden_states = hidden_states + position_embeddings
    RuntimeError: CUDA out of memory. Tried to allocate 96.00 MiB (GPU 0; 15.75 GiB total capacity; 14.73 GiB already allocated; 49.81 MiB free; 14.75 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
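
The message itself suggests setting max_split_size_mb via PYTORCH_CUDA_ALLOC_CONF, which can be tried by prefixing the command with the environment variable (the 128 below is only an example value, not a recommendation from this repo):

    PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 SAT_HOME=../CogView2/sharefs/cogview-new python3 cogview2_completion.py ...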

Why could this happen?

KyriaAnnwyn commented 2 years ago

On another device, a V100 with 32GB VRAM, I get

    Working on No. 0 on 0... Killed

when loading the pretrained models.

KyriaAnnwyn commented 2 years ago

And when I try to use multi-GPU (2x V100) I get an indexing error:

    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [96,0,0], thread: [95,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    Traceback (most recent call last):
      File "cogview2_text2image.py", line 249, in <module>
        main(args)
      File "cogview2_text2image.py", line 172, in main
        generate_continually(process, args.input_source)
      File "/home/admin/CogView2/cvenv/lib/python3.8/site-packages/SwissArmyTransformer/generation/utils.py", line 74, in generate_continually
        func(raw_text)
      File "cogview2_text2image.py", line 149, in process
        decoded_img = tokenizer.decode(image_ids=seq[-3600:])
      File "/home/admin/CogView2/cvenv/lib/python3.8/site-packages/icetk/ice_tokenizer.py", line 97, in decode
        return self.image_tokenizer.decode(image_ids, l=int(math.log2(compress_rate))-2)
      File "/home/admin/CogView2/cvenv/lib/python3.8/site-packages/icetk/image_tokenizer.py", line 75, in decode
        out = self.model.single_decode_code(codes, l)
      File "/home/admin/CogView2/cvenv/lib/python3.8/site-packages/icetk/vqvae/vqvae_hierarchical.py", line 92, in single_decode_code
        quant = self.quantize.embed_code(code).permute(0, 3, 1, 2)
      File "/home/admin/CogView2/cvenv/lib/python3.8/site-packages/icetk/vqvae/quantize.py", line 106, in embed_code
        return F.embedding(embed_id, self.embed.weight)
      File "/home/admin/CogView2/cvenv/lib/python3.8/site-packages/torch/nn/functional.py", line 2199, in embedding
        return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
    RuntimeError: CUDA error: device-side assert triggered

lkwq007 commented 2 years ago

@KyriaAnnwyn

> I have a 3080 with 16GB VRAM on my laptop. I switched to the 12g branch and tried to run:
>
>     SAT_HOME=../CogView2/sharefs/cogview-new python3 cogview2_completion.py --mode inference --fp16 --output-path samples_sat_v0.2_comp --batch-size 16 --max-inference-batch-size 2 --input-source ../CogView2/input_compMy.txt --single-gpu

The hack for cogview2_completion.py in the 12g branch is incomplete. If you want to try cogview2_completion, please switch to the main branch. By the way, I am not sure whether cogview2_completion can work on a GPU with less than 20GB of vRAM.

> On another device, a V100 with 32GB VRAM, I get
>
>     Working on No. 0 on 0... Killed

Seems to be killed due to OOM. The peak RAM usage of CogView2 is about 50GB.
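
To confirm an OOM kill, the kernel usually logs it, and RAM headroom can be watched while the checkpoints load (standard Linux tools, nothing CogView2-specific):

    dmesg | grep -i -E "killed process|out of memory"
    watch -n 1 free -h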

> And when I try to use multi-GPU (2x V100) I get an indexing error:

The error is triggered by icetk. It seems that this runtime error might not be related to the hack.
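
In general, an assertion like `srcIndex < srcSelectDimSize` means F.embedding received an index outside the embedding table (here, image token ids falling outside the icetk codebook), and after a device-side assert the Python traceback can point at a later, unrelated op. Rerunning with synchronous kernel launches usually gives a more accurate stack trace (standard CUDA debugging, not specific to this repo):

    CUDA_LAUNCH_BLOCKING=1 SAT_HOME=../CogView2/sharefs/cogview-new python3 cogview2_text2image.py ...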

KyriaAnnwyn commented 2 years ago

@lkwq007 Thank you for your answer!

The OOM error on the 16GB card is also reproduced for text2image:

    SAT_HOME=../CogView2/sharefs/cogview-new python3 cogview2_text2image.py --mode inference --fp16 --input-source ../CogView2/input.txt --output-path samples_sat_v0.2 --batch-size 16 --max-inference-batch-size 2 --single-gpu

The error:

    RuntimeError: CUDA out of memory. Tried to allocate 254.00 MiB (GPU 0; 15.75 GiB total capacity; 14.74 GiB already allocated; 31.81 MiB free; 14.76 GiB reserved in total by PyTorch)

> Seems to be killed due to OOM. The peak RAM usage of CogView2 is about 50GB.

That machine has 96GB of RAM, so it seems like it shouldn't run out of memory.

KyriaAnnwyn commented 2 years ago

@lkwq007 Thank you, one problem is solved!

> Seems to be killed due to OOM. The peak RAM usage of CogView2 is about 50GB.

I checked my memory usage and found that half of it was taken up by shared memory, so there really wasn't enough. When I freed it, the script worked!
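
For anyone hitting the same thing, the "shared" column of free -h and the tmpfs usage under /dev/shm (on typical Linux setups) show how much RAM is held as shared memory:

    free -h
    df -h /dev/shm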

lkwq007 commented 2 years ago

@KyriaAnnwyn

> The OOM error on the 16GB card is also reproduced for text2image:
>
>     SAT_HOME=../CogView2/sharefs/cogview-new python3 cogview2_text2image.py --mode inference --fp16 --input-source ../CogView2/input.txt --output-path samples_sat_v0.2 --batch-size 16 --max-inference-batch-size 2 --single-gpu
>
> The error:
>
>     RuntimeError: CUDA out of memory. Tried to allocate 254.00 MiB (GPU 0; 15.75 GiB total capacity; 14.74 GiB already allocated; 31.81 MiB free; 14.76 GiB reserved in total by PyTorch)

Perhaps --max-inference-batch-size 1 will work. If not, you may need to modify this line of code:

https://github.com/lkwq007/CogView2-low-vram/blob/38cd2cf4e76782ac583afdac4de1c4a9da1e00aa/cogview2_text2image.py#L79

to

    text_model_split = 2 if total_memory <= 20 else 1
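
For context, a minimal sketch of how such a threshold check can be computed in PyTorch; whether the repo derives total_memory exactly this way is an assumption, so treat this as illustration only:

    import torch

    # Detected memory of GPU 0 in GiB. Assumption: the repo's total_memory
    # is measured in GiB; torch.cuda.get_device_properties is standard PyTorch.
    total_memory = torch.cuda.get_device_properties(0).total_memory / 1024**3

    # Split the text model into two chunks on GPUs with <= 20 GiB of vRAM.
    text_model_split = 2 if total_memory <= 20 else 1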

KyriaAnnwyn commented 2 years ago

@lkwq007 Thank you for your support!

> Perhaps --max-inference-batch-size 1 will work.

  • I tried this, but it didn't help. I will try your modification.

BTW, cogview2_completion.py also works on the V100 with --single-gpu; without this flag it produces black images.