manycore-research / PlankAssembly

[ICCV 2023] PlankAssembly: Robust 3D Reconstruction from Three Orthographic Views with Learnt Shape Programs
https://manycore-research.github.io/PlankAssembly/
GNU Affero General Public License v3.0

AssertionError #2

Closed lw0210 closed 1 year ago

lw0210 commented 1 year ago

Dear authors, thank you very much for your work, but I ran into several issues while reproducing your code.

First, I ran the fourth command (python dataset/prepare_info.py --root path/to/data/root), but there is no root parameter in the code.

Then I ran the training command (python trainer_complete.py fit --config configs/train_complete.yaml). After 19 epochs it fails with "AssertionError: expecting key_padding_mask shape of (16, 1157), but got torch.Size([16, 1199])".

How can I solve this problem?
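For reference, this assertion is raised inside `torch.nn.functional.multi_head_attention_forward`, which requires `key_padding_mask` to have shape `(batch, source_len)` matching the key/value sequence. A minimal sketch that triggers the same error (the shapes are illustrative, taken from the error message, not from the PlankAssembly model):

```python
import torch
import torch.nn as nn

# Cross-attention: key_padding_mask must be (batch, source_len),
# where source_len is the length of the key/value (memory) sequence.
attn = nn.MultiheadAttention(embed_dim=8, num_heads=2, batch_first=True)

query = torch.randn(16, 10, 8)     # (batch, target_len, embed_dim)
memory = torch.randn(16, 1157, 8)  # (batch, source_len, embed_dim)

# Correct mask: length matches memory (1157) -> forward succeeds.
good_mask = torch.zeros(16, 1157, dtype=torch.bool)
out, _ = attn(query, memory, memory, key_padding_mask=good_mask)

# Wrong mask: built for a different length (1199) -> rejected, as an
# AssertionError in PyTorch 1.13 (possibly a different error type in
# other versions).
bad_mask = torch.zeros(16, 1199, dtype=torch.bool)
try:
    attn(query, memory, memory, key_padding_mask=bad_mask)
except (AssertionError, RuntimeError) as e:
    print("mask shape rejected:", type(e).__name__)
```

So the error usually means the `memory` tensor and its padding mask were built from sequences of different lengths.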

bertjiazheng commented 1 year ago

Hi,

Thanks for your interest in our work!

Best, Jia

lw0210 commented 1 year ago

Thank you for your reply. I haven't made any changes to the code so far. I will download and run the code again to see if it works properly. @bertjiazheng

lw0210 commented 1 year ago

@bertjiazheng I still have this error, and I haven't modified any code. When running validation (I set check_val_every_n_epoch to 2), the error was reported as follows. Are the dimensions of gt and pred set differently in the model?

```
internally at /opt/conda/conda-bld/pytorch_1666642975312/work/aten/src/ATen/NestedTensorImpl.cpp:175.)
  output = torch._nested_tensor_from_mask(output, src_key_padding_mask.logical_not(), mask_check=False)
Epoch 1/999 ━━━━━━━━━━━━━━━━━━ 1502/1502 0:11:30 • 0:00:00 2.15it/s loss: 3.73 v_num: 2 train/accuracy: 0.226
Traceback (most recent call last):
  File "trainer_complete.py", line 133, in <module>
    cli = LightningCLI(Trainer)
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/pytorch_lightning/cli.py", line 350, in __init__
    self._run_subcommand(self.subcommand)
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/pytorch_lightning/cli.py", line 626, in _run_subcommand
    fn(**fn_kwargs)
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 696, in fit
    self._call_and_handle_interrupt(
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 648, in _call_and_handle_interrupt
    return self.strategy.launcher.launch(trainer_fn, *args, trainer=self, **kwargs)
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 93, in launch
    return function(*args, **kwargs)
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 735, in _fit_impl
    results = self._run(model, ckpt_path=self.ckpt_path)
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1166, in _run
    results = self._run_stage()
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1252, in _run_stage
    return self._run_train()
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1283, in _run_train
    self.fit_loop.run()
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/pytorch_lightning/loops/loop.py", line 200, in run
    self.advance(*args, **kwargs)
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/pytorch_lightning/loops/fit_loop.py", line 271, in advance
    self._outputs = self.epoch_loop.run(self._data_fetcher)
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/pytorch_lightning/loops/loop.py", line 201, in run
    self.on_advance_end()
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 241, in on_advance_end
    self._run_validation()
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 299, in _run_validation
    self.val_loop.run()
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/pytorch_lightning/loops/loop.py", line 200, in run
    self.advance(*args, **kwargs)
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 155, in advance
    dl_outputs = self.epoch_loop.run(self._data_fetcher, dl_max_batches, kwargs)
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/pytorch_lightning/loops/loop.py", line 200, in run
    self.advance(*args, **kwargs)
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 143, in advance
    output = self._evaluation_step(**kwargs)
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 240, in _evaluation_step
    output = self.trainer._call_strategy_hook(hook_name, *kwargs.values())
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1704, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/pytorch_lightning/strategies/ddp.py", line 358, in validation_step
    return self.model(*args, **kwargs)
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1040, in forward
    output = self._run_ddp_forward(*inputs, **kwargs)
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1000, in _run_ddp_forward
    return module_to_run(*inputs[0], **kwargs[0])
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/pytorch_lightning/overrides/base.py", line 90, in forward
    return self.module.validation_step(*inputs, **kwargs)
  File "trainer_complete.py", line 74, in validation_step
    outputs = self.model(batch)
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/liwei/lw/PlankAssembly/plankassembly/models.py", line 329, in forward
    outputs = self.eval_step(batch)
  File "/home/liwei/lw/PlankAssembly/plankassembly/models.py", line 293, in eval_step
    hiddens = self.decoder(output_embeds, memory, tgt_mask=tgt_mask,
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/torch/nn/modules/transformer.py", line 333, in forward
    output = mod(output, memory, tgt_mask=tgt_mask,
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/torch/nn/modules/transformer.py", line 652, in forward
    x = self.norm2(x + self._mha_block(x, memory, memory_mask, memory_key_padding_mask))
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/torch/nn/modules/transformer.py", line 669, in _mha_block
    x = self.multihead_attn(x, mem, mem,
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/torch/nn/modules/activation.py", line 1167, in forward
    attn_output, attn_output_weights = F.multi_head_attention_forward(
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/torch/nn/functional.py", line 5130, in multi_head_attention_forward
    assert key_padding_mask.shape == (bsz, src_len),
AssertionError: expecting key_padding_mask shape of (16, 1157), but got torch.Size([16, 1199])
```

bertjiazheng commented 1 year ago

Have you successfully set up the virtual environment?

lw0210 commented 1 year ago

Yes, the code trains normally until the 20th epoch, when it starts reporting errors.

bertjiazheng commented 1 year ago

Ok, I will look into it and get back to you with an update.

lw0210 commented 1 year ago

Ok, thanks!


bertjiazheng commented 1 year ago

Hi @lw0210

I have successfully trained the model from scratch without any issues. I suspect you are using the wrong version of the PyTorch package. Please double-check your virtual environment.

Given the error log you provided, the shapes of the memory and input_mask tensors are mismatched here. You can first check the sizes of these two tensors before calling the decoder.

Best, Jia
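The suggested check can be sketched as a small helper (a sketch only: the names `memory` and `memory_key_padding_mask` and the seq-first layout follow the default `nn.TransformerDecoder` convention, not necessarily the exact PlankAssembly code):

```python
import torch

def check_decoder_inputs(memory, memory_key_padding_mask):
    """Verify the padding mask matches the memory before calling the decoder.

    Assumes the default (seq-first) nn.TransformerDecoder layout:
      memory:                  (source_len, batch, embed_dim)
      memory_key_padding_mask: (batch, source_len)
    """
    src_len, bsz = memory.shape[0], memory.shape[1]
    if memory_key_padding_mask.shape != (bsz, src_len):
        raise ValueError(
            f"expecting key_padding_mask shape of ({bsz}, {src_len}), "
            f"but got {tuple(memory_key_padding_mask.shape)}"
        )

# Example with the shapes from the error log (embed_dim kept small):
memory = torch.zeros(1157, 16, 8)            # (source_len, batch, embed_dim)
good_mask = torch.zeros(16, 1157, dtype=torch.bool)
check_decoder_inputs(memory, good_mask)      # passes silently

bad_mask = torch.zeros(16, 1199, dtype=torch.bool)
try:
    check_decoder_inputs(memory, bad_mask)   # mismatched mask is rejected
except ValueError as e:
    print(e)
```

Calling such a check right before `self.decoder(...)` pinpoints whether the mask or the memory carries the unexpected length.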

lw0210 commented 1 year ago

Thanks! I am using CUDA 11.7 + PyTorch 1.13. I will check the sizes of these two tensors before calling the decoder.

bertjiazheng commented 1 year ago

I suggest trying PyTorch 1.10.0, which we used in our experiments. You can follow the setup process to install it.

lw0210 commented 1 year ago

Okay, thanks! I will try reinstalling with CUDA 11.3!

bertjiazheng commented 1 year ago

This issue has been closed due to inactivity. Please feel free to reopen it if you still have any questions.

lw0210 commented 1 year ago

Thank you, and sorry for the late reply. I have resolved the problem by following your setup process for installation.