microsoft / LayoutGeneration

Evaluation error while training on the PubLayNet dataset #38

Open liufc2020 opened 10 months ago

liufc2020 commented 10 months ago

Hello, I ran into an error during evaluation while training on the PubLayNet dataset with the following command:

./scripts/publaynet_refinement.sh train ../datasets output_dir basic 1 none

[1/200][1134/1216] Loss: 0.857
[1/200][1184/1216] Loss: 0.856
Evaluate on refinement
0%| | 0/257 [00:04<?, ?it/s]
Traceback (most recent call last):
  File "main.py", line 163, in <module>
    train(args)
  File "main.py", line 81, in train
    trainer(train_fn, evaluate_fn, tasks=train_dataset.tasks,
  File "/home/tjzn/LayoutGeneration/LayoutFormer++/src/trainer/multitask_trainer.py", line 160, in __call__
    eval_step_loss, eval_step_pred = eval_step(self.model, data,
  File "/home/tjzn/LayoutGeneration/LayoutFormer++/src/tasks/task_utils.py", line 277, in __call__
    prediction = self._measure_prediction(model, in_tokenization, tokenizer,
  File "/home/tjzn/LayoutGeneration/LayoutFormer++/src/tasks/task_utils.py", line 199, in _measure_prediction
    output_sequences = model(in_ids, in_padding_mask,
  File "/home/tjzn/miniconda3/envs/layoutformer/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/tjzn/miniconda3/envs/layoutformer/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 169, in forward
    return self.gather(outputs, self.output_device)
  File "/home/tjzn/miniconda3/envs/layoutformer/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 181, in gather
    return gather(outputs, output_device, dim=self.dim)
  File "/home/tjzn/miniconda3/envs/layoutformer/lib/python3.8/site-packages/torch/nn/parallel/scatter_gather.py", line 78, in gather
    res = gather_map(outputs)
  File "/home/tjzn/miniconda3/envs/layoutformer/lib/python3.8/site-packages/torch/nn/parallel/scatter_gather.py", line 69, in gather_map
    return type(out)((k, gather_map([d[k] for d in outputs]))
  File "/home/tjzn/miniconda3/envs/layoutformer/lib/python3.8/site-packages/torch/nn/parallel/scatter_gather.py", line 69, in <genexpr>
    return type(out)((k, gather_map([d[k] for d in outputs]))
  File "/home/tjzn/miniconda3/envs/layoutformer/lib/python3.8/site-packages/torch/nn/parallel/scatter_gather.py", line 63, in gather_map
    return Gather.apply(target_device, dim, outputs)
  File "/home/tjzn/miniconda3/envs/layoutformer/lib/python3.8/site-packages/torch/nn/parallel/_functions.py", line 75, in forward
    return comm.gather(inputs, ctx.dim, ctx.target_device)
  File "/home/tjzn/miniconda3/envs/layoutformer/lib/python3.8/site-packages/torch/nn/parallel/comm.py", line 235, in gather
    return torch._C._gather(tensors, dim, destination)
RuntimeError: Input tensor at index 2 has invalid shape [16, 120], but expected [16, 114]
wandb: - 0.008 MB of 0.008 MB uploaded

Training on the RICO dataset runs without any error. I downloaded the pre-processed dataset from Hugging Face.
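
For anyone reading the traceback: the failure is raised in the gather step of `torch.nn.DataParallel`, which concatenates the per-GPU outputs along the batch dimension and therefore needs every other dimension to match. Here the replica at index 2 apparently produced sequences of length 120 while the first produced 114. The pure-Python sketch below only illustrates that constraint and one possible workaround (padding every replica's output to a fixed length before gathering). It is not code from this repository; `pad_batch`, `gather_batches`, `MAX_LEN`, and the pad id `0` are hypothetical names and values chosen for illustration.

```python
from typing import List

Batch = List[List[int]]  # one row of token ids per sample


def pad_batch(batch: Batch, target_len: int, pad_id: int = 0) -> Batch:
    """Right-pad every row of `batch` to `target_len` with `pad_id`."""
    padded = []
    for row in batch:
        if len(row) > target_len:
            raise ValueError(f"row length {len(row)} exceeds target {target_len}")
        padded.append(row + [pad_id] * (target_len - len(row)))
    return padded


def gather_batches(batches: List[Batch]) -> Batch:
    """Concatenate per-replica batches along the batch axis.

    Mimics the constraint behind the RuntimeError above: the row length
    of every replica must equal the row length of the first replica.
    """
    expected = len(batches[0][0])
    for index, batch in enumerate(batches):
        got = len(batch[0])
        if got != expected:
            raise RuntimeError(
                f"Input at index {index} has invalid shape "
                f"[{len(batch)}, {got}], but expected [{len(batch)}, {expected}]"
            )
    return [row for batch in batches for row in batch]


# Three hypothetical replicas; the third decoded 120 tokens instead of 114.
replica_outputs = [
    [[1] * 114 for _ in range(16)],
    [[1] * 114 for _ in range(16)],
    [[1] * 120 for _ in range(16)],
]

MAX_LEN = 128  # assumed fixed decoding budget, not taken from the repository
padded = [pad_batch(batch, MAX_LEN) for batch in replica_outputs]
gathered = gather_batches(padded)
print(len(gathered), len(gathered[0]))
```

Calling `gather_batches(replica_outputs)` without padding raises a `RuntimeError` for index 2, analogous to the one in the log. With real tensors, the equivalent padding could presumably be done with `torch.nn.functional.pad` inside the model's `forward`, since each replica cannot see the lengths produced by the others. Running evaluation on a single GPU should also sidestep the gather step, though I have not verified either option against this codebase.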