Closed: phtu-cs closed this issue 3 years ago.
Hey @tph9608, I re-cloned the code from GitHub to check whether I had pushed a bug by mistake, but everything seems to work as expected. Could you share more information about what changes you made to the code before you started the training process?
Thank you for your reply. I modified run.py lines 93 and 99, changing "log_save_interval" and "early_stop_callback" to "log_every_n_steps" and "callbacks", because "log_save_interval" and "early_stop_callback" are unexpected keyword arguments on my machine. This is probably due to a different pytorch_lightning version. I also used CUDA_VISIBLE_DEVICES=0 instead of 1.
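For reference, the edit just renames the Trainer keyword arguments. A minimal sketch, assuming run.py constructs its Trainer roughly like this (the concrete values are illustrative, not the ones from the repository):

```python
from pytorch_lightning import Trainer

# pytorch_lightning 0.9.x (what run.py was written against):
# trainer = Trainer(log_save_interval=100, early_stop_callback=False)

# Newer releases dropped those keywords, so the equivalent call becomes:
trainer = Trainer(
    log_every_n_steps=100,  # replaces log_save_interval
    callbacks=[],           # replaces early_stop_callback (empty list = no early-stopping callback)
)
```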
OK. Everything works fine if I install pytorch_lightning==0.9.0. But if I install pytorch_lightning with pip without pinning the version, it does not work.
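A quick way to confirm which release an environment actually picked up (a trivial check, not part of the repository):

```python
import pytorch_lightning as pl

# Training only behaves as expected on the pinned release.
print(pl.__version__)  # expect "0.9.0" after installing pytorch_lightning==0.9.0
```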
Hi Pradyumna,
Thank you very much for sharing your code. This will definitely inspire a lot of future work. I ran into an issue when running your code and hope to get your help when you have time.
I ran the training command, but it logs the following:
```
/home/phtu/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/data_loading.py:102: UserWarning: The dataloader, train dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument (try 32 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
  rank_zero_warn(
Epoch 0: 100%|████████████| 13/13 [00:00<00:00, 179.52it/s, loss=nan, v_num=110]
Traceback (most recent call last):
  File "run.py", line 103, in <module>
    runner.fit(experiment)
  File "/home/phtu/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 458, in fit
    self._run(model)
  File "/home/phtu/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 756, in _run
    self.dispatch()
  File "/home/phtu/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 797, in dispatch
    self.accelerator.start_training(self)
  File "/home/phtu/.local/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 96, in start_training
    self.training_type_plugin.start_training(trainer)
  File "/home/phtu/.local/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 144, in start_training
    self._results = trainer.run_stage()
  File "/home/phtu/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 807, in run_stage
    return self.run_train()
  File "/home/phtu/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 869, in run_train
    self.train_loop.run_training_epoch()
  File "/home/phtu/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 566, in run_training_epoch
    self.on_train_epoch_end(epoch_output)
  File "/home/phtu/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 606, in on_train_epoch_end
    training_epoch_end_output = model.training_epoch_end(processed_epoch_output)
  File "/home/phtu/Research/Meta/Im2Vec/experiment.py", line 115, in training_epoch_end
    avg_loss = torch.stack([x['loss'] for x in outputs]).mean()
RuntimeError: stack expects a non-empty TensorList
```

It seems the length of the var "outputs" is 0. Do you know a possible reason for this issue? Thank you.
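For context, the failing line is the aggregation inside training_epoch_end in experiment.py. A guarded version would look roughly like this; the class name, metric name, and self.log call are illustrative assumptions, not the repository's actual code:

```python
import torch
import pytorch_lightning as pl

class Experiment(pl.LightningModule):
    def training_epoch_end(self, outputs):
        # On newer pytorch_lightning releases `outputs` can arrive empty here,
        # which is exactly what makes torch.stack raise
        # "stack expects a non-empty TensorList".
        if not outputs:
            return  # skip the epoch-level aggregation instead of crashing
        avg_loss = torch.stack([x['loss'] for x in outputs]).mean()
        self.log('avg_train_loss', avg_loss)  # hypothetical metric name
```

This only avoids the crash; the empty list itself suggests the training_step outputs are not being collected the way the 0.9.0 API did.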