memray / OpenNMT-kpg-release

Keyphrase Generation
MIT License

How to use the BART model in OpenNMT #49

Closed Niyx52094 closed 2 years ago

Niyx52094 commented 2 years ago

Hi, I'm trying to use the BART model in the model package. Is there a .yml config for BART that I can use? Currently I just changed transformer-one2one-kp20k.yml to BART, but an error shows up after I change the generator in BART to a CopyGenerator (the original generator in BART is a Sequential, but I want to use copy_attn, so I changed the generator to a CopyGenerator):


  File "/code/OpenNMT/train.py", line 10, in <module>
    main()
  File "/code/OpenNMT/onmt/bin/train.py", line 209, in main
    train(opt)
  File "/code/OpenNMT/onmt/bin/train.py", line 195, in train
    train_process(opt, device_id=0)
  File "/code/OpenNMT/onmt/train_single.py", line 114, in main
    trainer.train(
  File "/code/OpenNMT/onmt/trainer.py", line 271, in train
    res = self._gradient_accumulation(
  File "/code/OpenNMT/onmt/trainer.py", line 467, in _gradient_accumulation
    if self.model.decoder.state is not None:
  File "/anaconda3/envs/copyrnn/lib/python3.8/site-packages/torch/nn/modules/module.py", line 778, in __getattr__
    raise ModuleAttributeError("'{}' object has no attribute '{}'".format(
torch.nn.modules.module.ModuleAttributeError: 'BARTDecoder' object has no attribute 'state'```
memray commented 2 years ago

Try changing the code at L446 of code/OpenNMT/onmt/trainer.py as follows:

```
if hasattr(self.model.decoder, 'state') and self.model.decoder.state is not None:
    self.model.decoder.detach_state()
```
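The hasattr check simply skips the state handling for decoders that don't define a state attribute at all (which is what the traceback shows for BARTDecoder), while decoders that do keep a state still get detach_state() called as before.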
Niyx52094 commented 2 years ago

Thanks very much for your reply! I'm also curious why the BART model uses the Sequential generator by default instead of the CopyGenerator. Does this mean no copy mechanism is used when I use the BART model? And may I know the reason for that? Thank you!

memray commented 2 years ago

You're right, BART has no copy mechanism natively and I didn't add one during training. I actually tried adding a copy loss during training, but it didn't help (some attention heads already act similarly to copy attention). So here I just keep BART as is.
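
For readers unfamiliar with the copy mechanism being discussed: it mixes the decoder's vocabulary distribution with a distribution over source tokens derived from attention, so the model can reproduce source words directly. Below is a minimal, illustrative sketch of that idea in plain PyTorch; it is not the code in this repo, and the names CopyHead, dec_hidden, cross_attn, and src_ids are made up for the example.

```
import torch
import torch.nn as nn
import torch.nn.functional as F


class CopyHead(nn.Module):
    """Pointer-generator style copy head (illustrative sketch only)."""

    def __init__(self, hidden_size, vocab_size):
        super().__init__()
        self.vocab_proj = nn.Linear(hidden_size, vocab_size)  # "generate" scores
        self.copy_gate = nn.Linear(hidden_size, 1)             # p(copy) gate

    def forward(self, dec_hidden, cross_attn, src_ids):
        # dec_hidden: (batch, tgt_len, hidden)   decoder outputs
        # cross_attn: (batch, tgt_len, src_len)  attention over source tokens (rows sum to 1)
        # src_ids:    (batch, src_len)           source token ids in the target vocab
        vocab_dist = F.softmax(self.vocab_proj(dec_hidden), dim=-1)
        p_copy = torch.sigmoid(self.copy_gate(dec_hidden))  # (batch, tgt_len, 1)

        # Scatter attention mass onto the vocab positions of the source tokens.
        copy_dist = torch.zeros_like(vocab_dist)
        index = src_ids.unsqueeze(1).expand(-1, dec_hidden.size(1), -1)
        copy_dist.scatter_add_(-1, index, cross_attn)

        # Mixture of generating from the vocab and copying from the source.
        return (1.0 - p_copy) * vocab_dist + p_copy * copy_dist


# Toy usage: random inputs, just to show the shapes.
head = CopyHead(hidden_size=16, vocab_size=100)
dec_hidden = torch.randn(2, 5, 16)
cross_attn = torch.softmax(torch.randn(2, 5, 7), dim=-1)
src_ids = torch.randint(0, 100, (2, 7))
probs = head(dec_hidden, cross_attn, src_ids)  # (2, 5, 100), each row sums to 1
```

OpenNMT-py's CopyGenerator follows roughly the same idea, with additional bookkeeping for the extended source vocabulary.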