Closed · lw0210 closed this issue 1 year ago
Hi,

Thanks for your interest in our work!

The `root` parameter should be `data_path` (i.e., `python dataset/prepare_info.py --data_path path/to/data/root`). I have updated the README.md file to fix this typo.

Best,
Jia
Thank you for your reply. I haven't made any changes to the code so far. I will download and run the code again to see if it works properly. @bertjiazheng
@bertjiazheng I still have this error, and I haven't modified any code. When running validation (I set `check_val_every_n_epoch` to 2), the error below was reported. Are the dimensions of gt and pred set differently in the model?
```
internally at /opt/conda/conda-bld/pytorch_1666642975312/work/aten/src/ATen/NestedTensorImpl.cpp:175.)
  output = torch._nested_tensor_from_mask(output, src_key_padding_mask.logical_not(), mask_check=False)
Epoch 1/999 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1502/1502 0:11:30 • 0:00:00 2.15it/s loss: 3.73 v_num: 2 train/accuracy: 0.226
Traceback (most recent call last):
  File "trainer_complete.py", line 133, in <module>
    cli = LightningCLI(Trainer)
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/pytorch_lightning/cli.py", line 350, in __init__
    self._run_subcommand(self.subcommand)
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/pytorch_lightning/cli.py", line 626, in _run_subcommand
    fn(**fn_kwargs)
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 696, in fit
    self._call_and_handle_interrupt(
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 648, in _call_and_handle_interrupt
    return self.strategy.launcher.launch(trainer_fn, *args, trainer=self, **kwargs)
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 93, in launch
    return function(*args, **kwargs)
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 735, in _fit_impl
    results = self._run(model, ckpt_path=self.ckpt_path)
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1166, in _run
    results = self._run_stage()
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1252, in _run_stage
    return self._run_train()
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1283, in _run_train
    self.fit_loop.run()
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/pytorch_lightning/loops/loop.py", line 200, in run
    self.advance(*args, **kwargs)
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/pytorch_lightning/loops/fit_loop.py", line 271, in advance
    self._outputs = self.epoch_loop.run(self._data_fetcher)
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/pytorch_lightning/loops/loop.py", line 201, in run
    self.on_advance_end()
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 241, in on_advance_end
    self._run_validation()
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 299, in _run_validation
    self.val_loop.run()
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/pytorch_lightning/loops/loop.py", line 200, in run
    self.advance(*args, **kwargs)
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 155, in advance
    dl_outputs = self.epoch_loop.run(self._data_fetcher, dl_max_batches, kwargs)
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/pytorch_lightning/loops/loop.py", line 200, in run
    self.advance(*args, **kwargs)
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 143, in advance
    output = self._evaluation_step(**kwargs)
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 240, in _evaluation_step
    output = self.trainer._call_strategy_hook(hook_name, *kwargs.values())
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1704, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/pytorch_lightning/strategies/ddp.py", line 358, in validation_step
    return self.model(*args, **kwargs)
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1040, in forward
    output = self._run_ddp_forward(*inputs, **kwargs)
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1000, in _run_ddp_forward
    return module_to_run(*inputs[0], **kwargs[0])
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/pytorch_lightning/overrides/base.py", line 90, in forward
    return self.module.validation_step(*inputs, **kwargs)
  File "trainer_complete.py", line 74, in validation_step
    outputs = self.model(batch)
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/liwei/lw/PlankAssembly/plankassembly/models.py", line 329, in forward
    outputs = self.eval_step(batch)
  File "/home/liwei/lw/PlankAssembly/plankassembly/models.py", line 293, in eval_step
    hiddens = self.decoder(output_embeds, memory, tgt_mask=tgt_mask,
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/torch/nn/modules/transformer.py", line 333, in forward
    output = mod(output, memory, tgt_mask=tgt_mask,
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/torch/nn/modules/transformer.py", line 652, in forward
    x = self.norm2(x + self._mha_block(x, memory, memory_mask, memory_key_padding_mask))
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/torch/nn/modules/transformer.py", line 669, in _mha_block
    x = self.multihead_attn(x, mem, mem,
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/torch/nn/modules/activation.py", line 1167, in forward
    attn_output, attn_output_weights = F.multi_head_attention_forward(
  File "/home/liwei/anaconda3/envs/plankassembly/lib/python3.8/site-packages/torch/nn/functional.py", line 5130, in multi_head_attention_forward
    assert key_padding_mask.shape == (bsz, src_len), \
AssertionError: expecting key_padding_mask shape of (16, 1157), but got torch.Size([16, 1199])
```
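For context, the failing assertion is PyTorch's check that `key_padding_mask` has shape `(batch, src_len)`, where `src_len` is the sequence length of the key/value ("memory") tensor fed to cross-attention. A minimal sketch that reproduces the same error with the sizes from the log above (the module and sizes here are hypothetical, not the repo's actual model):

```python
import torch
import torch.nn as nn

# nn.MultiheadAttention requires key_padding_mask.shape == (batch, src_len);
# a mask built for a different sequence length trips the assertion seen above.
mha = nn.MultiheadAttention(embed_dim=8, num_heads=2, batch_first=True)

query  = torch.randn(16, 10, 8)      # (batch, tgt_len, embed_dim)
memory = torch.randn(16, 1157, 8)    # (batch, src_len, embed_dim)

good_mask = torch.zeros(16, 1157, dtype=torch.bool)  # matches src_len
bad_mask  = torch.zeros(16, 1199, dtype=torch.bool)  # wrong src_len

out, _ = mha(query, memory, memory, key_padding_mask=good_mask)  # works
# mha(query, memory, memory, key_padding_mask=bad_mask)          # AssertionError
```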
Have you successfully installed the virtual environment?
Yes, the code is trainable until around the 20th epoch, when it starts reporting errors.
Ok, I will look into it and get back to you with an update.
Ok, thanks!
Hi @lw0210,

I have successfully trained the model from scratch without any issues. I suspect you are using the wrong version of the PyTorch package. Please double-check your virtual environment.

Given the error log you provided, the shapes of the `memory` and `input_mask` tensors are mismatched here. You can first check the sizes of these two tensors before calling the decoder.

Best,
Jia
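A minimal debugging sketch along those lines (the names `memory` and `input_mask` are taken from the discussion and the traceback's `eval_step`; adjust them to the actual code, which may use a different tensor layout):

```python
# Just before the self.decoder(...) call in eval_step: the mask used as
# memory_key_padding_mask must be (batch, src_len), where src_len is the
# sequence dimension of memory. This assumes the default batch_first=False
# layout, i.e. memory is (src_len, batch, d_model); swap indices otherwise.
print("memory:", tuple(memory.shape))          # (src_len, batch, d_model)
print("input_mask:", tuple(input_mask.shape))  # (batch, src_len)
assert memory.shape[0] == input_mask.shape[1], \
    "memory sequence length must match the key padding mask width"
```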
Thanks! I am using CUDA 11.7 + PyTorch 1.13. I will check the sizes of these two tensors before calling the decoder.
I suggest trying PyTorch 1.10.0, which we used in our experiments. You can follow the setup process to install it.
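A quick way to confirm the environment matches the intended setup (a minimal sketch; the expected values follow the versions discussed in this thread):

```python
import torch

print(torch.__version__)          # expect '1.10.0'
print(torch.version.cuda)         # expect '11.3'
print(torch.cuda.is_available())  # should be True on a working install
```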
Okay, thanks! I will try reinstalling with CUDA 11.3!
This issue has been closed due to inactivity. Please feel free to reopen it if you still have any questions.
Thank you, and I'm sorry for the late reply. I have resolved the problem by following your setup process.
Dear authors, thank you very much for your work, but I found several issues when reproducing your code.

Firstly, I ran the fourth command (`python dataset/prepare_info.py --root path/to/data/root`), but there is no `root` parameter in the code.

Then I ran the training command (`python trainer_complete.py fit --config configs/train_complete.yaml`); it ran for 19 epochs and then reported "AssertionError: expecting key_padding_mask shape of (16, 1157), but got torch.Size([16, 1199])".

How can I solve this problem?