RuntimeError: Input, output and indices must be on the current device

priyankaupadhyay090 commented 2 years ago

@huiyegit and @shihaoji, thank you for the nice work. I am using the AttnGAN+CL code and am trying to generate samples and compute the R-precision with: python main.py --cfg cfg/eval_bird.yml --gpu 0

Before running main.py, I set:

WORKERS = 1
GPU_ID = 0

and got this error:

python main.py --cfg cfg/eval_bird.yml --gpu 0
Using config:
{'B_VALIDATION': True, 'CONFIG_NAME': 'attn2', 'CUDA': False, 'DATASET_NAME': 'birds', 'DATA_DIR': 'data/birds', 'GAN': {'B_ATTENTION': True, 'B_DCGAN': False, 'CONDITION_DIM': 100, 'DF_DIM': 64, 'GF_DIM': 32, 'R_NUM': 2, 'Z_DIM': 100}, 'GPU_ID': 0, 'RNN_TYPE': 'LSTM', 'TEXT': {'CAPTIONS_PER_IMAGE': 10, 'EMBEDDING_DIM': 256, 'WORDS_NUM': 25}, 'TRAIN': {'BATCH_SIZE': 10, 'B_NET_D': False, 'DISCRIMINATOR_LR': 0.0002, 'ENCODER_LR': 0.0002, 'FLAG': False, 'GENERATOR_LR': 0.0002, 'MAX_EPOCH': 600, 'NET_E': 'DAMSMencoders/bird/text_encoder200.pth', 'NET_G': 'models/netG_epoch_600.pth', 'RNN_GRAD_CLIP': 0.25, 'SMOOTH': {'GAMMA1': 5.0, 'GAMMA2': 5.0, 'GAMMA3': 10.0, 'LAMBDA': 1.0}, 'SNAPSHOT_INTERVAL': 2000}, 'TREE': {'BASE_SIZE': 64, 'BRANCH_NUM': 3}, 'WORKERS': 1}
seed now is : 100
Total filenames: 11788 001.Black_footed_Albatross/Black_Footed_Albatross_0046_18.jpg
load images pickles
Load filenames from: data/birds/train/filenames.pickle (8855)
loading train images
load images pickles
Load filenames from: data/birds/test/filenames.pickle (2933)
loading test images
Load from: data/birds/captions.pickle
captions file loaded for test 5450 10
generating images for the whole valid dataset
self encoder
/opt/conda/lib/python3.6/site-packages/torch/nn/modules/rnn.py:61: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.5 and num_layers=1
  "num_layers={}".format(dropout, num_layers))
calling text encoder as RNN encoder
Load text encoder from: DAMSMencoders/bird/text_encoder200.pth
/opt/conda/lib/python3.6/site-packages/torchvision/models/inception.py:77: FutureWarning: The default weight initialization of inception_v3 will be changed in future releases of torchvision. If you wish to keep the old behavior (which leads to long initialization times due to scipy/scipy#11299), please set init_weights=True.
  ' due to scipy/scipy#11299), please set init_weights=True.', FutureWarning)
Load pretrained model from https://download.pytorch.org/models/inception_v3_google-1a9a5a14.pth
calling image encoder
Load image encoder from: DAMSMencoders/bird/image_encoder200.pth
/netscratch/pupadhyay/project/T2I_CL/AttnGAN+CL/trainer.py:465: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.
  noise = Variable(torch.FloatTensor(batch_size, nz), volatile=True)
Load G from: models/netG_epoch_600.pth
cnt: 10
word_emb and sent_emb starts
calling RNN encoder forward loop embedding value
Traceback (most recent call last):
  File "main.py", line 193, in <module>
    algo.sampling(split_dir) # sampling() defined in trainer.py file
  File "/netscratch/pupadhyay/project/T2I_CL/AttnGAN+CL/trainer.py", line 504, in sampling
    words_embs, sent_emb = text_encoder(captions, cap_lens, hidden)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/netscratch/pupadhyay/project/T2I_CL/AttnGAN+CL/model.py", line 139, in forward
    emb = self.encoder(captions)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/sparse.py", line 126, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/functional.py", line 1855, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Input, output and indices must be on the current device

I also requested only one GPU for the container to rule out multi-GPU issues, but the error remains the same.

srun --container-image=/netscratch/enroot/dlcc_pytorch_20.10.sqsh --container-workdir=pwd -p V100-16GB,V100-32GB,A100,RTX6000,RTX3090,RTXA6000 --mem=64000M --cpus-per-task=16 --gres=gpu:1 --time=08:00:00 --pty /bin/bash

Is there any way to solve this?

sumorday commented 2 years ago

Hi, I ran into the same problem and solved it. Here is the command I ran:

/home/edward/archive/T2I_CL-main/AttnGAN+CL/code/main.py --cfg /home/edward/archive/T2I_CL-main/AttnGAN+CL/code/cfg/bird_attn2.yml --gpu 0

It is almost the same as the author's command, but make sure a GPU is actually available. I guess you are using Colab, right? Colab provides a GPU, but I ran this on Linux.

If you still need help, you can email me at d18091105071@cityu.mo, and I can share my screen over Zoom or Google Meet and walk you through the steps.

Also, keep in mind that every master's and PhD student struggles with problems like this. Don't worry about it, and don't give up! The author, Hui Ye, has also been very generous in sharing this wonderful paper and code with us. Thanks to him!
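On the original error itself: the message means the embedding weights and the caption indices passed to it are on different devices. The printed config shows 'CUDA': False even though --gpu 0 was passed, so one possible cause (an assumption, not confirmed in this thread) is that the captions stay on the CPU while the text encoder was moved to the GPU. Below is a minimal sketch of how to check and align devices. The helpers `module_device` and `to_module_device` are hypothetical names, and a plain `nn.Embedding` with illustrative sizes stands in for the repository's text encoder.

```python
import torch
import torch.nn as nn


def module_device(module: nn.Module) -> torch.device:
    """Return the device of the module's first parameter."""
    return next(module.parameters()).device


def to_module_device(module: nn.Module, *tensors: torch.Tensor):
    """Move each tensor to the device the module's parameters live on."""
    device = module_device(module)
    return tuple(t.to(device) for t in tensors)


# Stand-in for the text encoder's embedding layer (sizes are illustrative).
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
encoder = nn.Embedding(num_embeddings=50, embedding_dim=8).to(device)

# Batches usually come out of a DataLoader on the CPU.
captions = torch.randint(0, 50, (4, 25))
cap_lens = torch.full((4,), 25, dtype=torch.long)

# Align the inputs with the encoder before the forward pass.
captions, cap_lens = to_module_device(encoder, captions, cap_lens)
with torch.no_grad():
    emb = encoder(captions)

print(captions.device, module_device(encoder), tuple(emb.shape))
```

Printing the two devices right before the failing call (trainer.py, line 504 in the traceback) should show whether they differ. If they do, either move the inputs as above or make the config flag and the --gpu argument agree.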

sumorday commented 2 years ago

Update: I now get a new error when training with pretrain_DAMSM.py:

Traceback (most recent call last):
  File "/home/edward/archive/T2I_CL-main/AttnGAN+CL/code/pretrain_DAMSM.py", line 336, in <module>
    mask = mask_correlated_samples_2(batch_size)
  File "/lustre7/home/edward/archive/T2I_CL-main/AttnGAN+CL/code/masks.py", line 4, in mask_correlated_samples_2
    mask = torch.ones((args.batch_size * 2, args.batch_size * 2), dtype=bool)
AttributeError: 'int' object has no attribute 'batch_size'

sumorday commented 2 years ago

Update: use this masks.py instead: https://github.com/huiyegit/T2I_CL/blob/ff6f82e07907dd1194af48fed24c8ba24487874c/DM-GAN%2BCL/code/masks.py
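For context, the AttributeError in the traceback above comes from calling the mask function with an integer batch size while its body reads `args.batch_size`. A minimal sketch of a version that takes the integer directly follows. The function name `make_pair_mask` is hypothetical, and the masking pattern (diagonal and positive pairs set to False, as in common SimCLR-style code) is an assumption, since the thread only shows the `torch.ones` line.

```python
import torch


def make_pair_mask(batch_size: int) -> torch.Tensor:
    """Boolean (2N, 2N) mask, False on the diagonal and on each positive pair."""
    n = int(batch_size)
    mask = torch.ones((2 * n, 2 * n), dtype=torch.bool)
    mask.fill_diagonal_(0)  # a sample is never its own negative
    for i in range(n):
        # Sample i and sample n + i are assumed to form a positive pair.
        mask[i, n + i] = False
        mask[n + i, i] = False
    return mask


mask = make_pair_mask(4)
print(mask.shape, int(mask.sum()))
```

With this signature, a call such as `mask_correlated_samples_2(batch_size)` in pretrain_DAMSM.py would receive what it expects, which is presumably why switching to the linked masks.py resolves the error.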

priyankaupadhyay090 commented 2 years ago

Thank you
