haruhi-sudo / ERPGN

Faithful Abstractive Summarization via Fact-aware Consistency-constrained Transformer

Can you provide your original dataset and the processing file for extracting triples from OPENIE? #2

Closed: acq586419 closed this issue 6 months ago

acq586419 commented 6 months ago

Can you provide your original dataset and the processing script for extracting triples with OpenIE? I would like to learn more. Thank you very much.

haruhi-sudo commented 6 months ago

Hello. We have provided the original files of the Xsum dataset, as well as the triples extracted using OpenIE. Please refer to the "Preprocess Datasets" section in the ReadMe, which includes the relevant external links.

acq586419 commented 6 months ago

> Hello. We have provided the original files of the Xsum dataset, as well as the triples extracted using OpenIE. Please refer to the "Preprocess Datasets" section in the ReadMe, which includes the relevant external links.

Thank you very much for your reply. However, when generating the summary, it turns out that the xsum-bin folder at the link you provided is missing the test.source file. Could you provide it? I would also like to learn how to extract the triples with OpenIE; would it be convenient to share the Python files for that? Thank you very much!

haruhi-sudo commented 6 months ago

We have now uploaded the original test.source file to the Google Drive folder.

Please refer to https://github.com/philipperemy/stanford-openie-python for the usage of OpenIE.
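For reference, a minimal sketch of triple extraction with that wrapper might look like the following. It is illustrative only: the function name and example sentence are placeholders, and this is not necessarily the exact preprocessing script used for the paper.

```python
# Minimal sketch of triple extraction with the stanford-openie Python wrapper
# (pip install stanford-openie). Requires Java; CoreNLP is downloaded on first use.
# Illustrative only -- not the repository's actual preprocessing script.
from openie import StanfordOpenIE

def extract_triples(text):
    """Return a list of (subject, relation, object) triples for one document."""
    with StanfordOpenIE() as client:
        return [
            (t["subject"], t["relation"], t["object"])
            for t in client.annotate(text)
        ]

if __name__ == "__main__":
    doc = "Barack Obama was born in Hawaii. He served as the 44th president."
    for triple in extract_triples(doc):
        print(triple)
```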

haruhi-sudo commented 6 months ago

Also, IMPORTANT: we do not provide a trained model. bart_rc.pt contains only the weights of the pre-trained model, which you need to fine-tune yourself.

Our model is built on fairseq. If you want to learn more, you can refer to the documentation: https://fairseq.readthedocs.io/en/latest/?badge=latest.
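As a quick sanity check that bart_rc.pt really is just a checkpoint of weights (not a fully fine-tuned model), it can be inspected with plain PyTorch. The key layout assumed below ("model", etc.) follows the common fairseq checkpoint convention and is an assumption, not a statement about this particular file.

```python
# Sanity-check sketch: peek inside the bart_rc.pt checkpoint with plain PyTorch.
# The "model" key assumed here follows the usual fairseq checkpoint layout.
import torch

ckpt = torch.load("bart_rc.pt", map_location="cpu")
print(type(ckpt))

# fairseq-style checkpoints typically keep the parameter tensors under "model".
if isinstance(ckpt, dict) and "model" in ckpt:
    state = ckpt["model"]
    print(f"{len(state)} parameter tensors, e.g. {next(iter(state))}")
```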

Building your own model with fairseq can be challenging, and you might need to read some parts of the fairseq source code. For educational purposes, it is generally advisable to use HuggingFace's Transformers library to construct your model, as it tends to be more user-friendly.
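For example, a minimal abstractive-summarization baseline with HuggingFace Transformers might look like this. facebook/bart-large-xsum is a public checkpoint used purely as a stand-in here; it is not the model from this repository, and the generation settings simply mirror the validation arguments mentioned later in this thread.

```python
# Minimal HuggingFace Transformers baseline for abstractive summarization.
# facebook/bart-large-xsum is a public checkpoint used as a stand-in only.
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-xsum")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-xsum")

article = "The full source document to be summarized goes here."
inputs = tokenizer(article, return_tensors="pt", truncation=True, max_length=1024)

summary_ids = model.generate(
    inputs["input_ids"],
    num_beams=6,
    max_length=60,
    min_length=10,
    no_repeat_ngram_size=3,
)
print(tokenizer.batch_decode(summary_ids, skip_special_tokens=True)[0])
```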

Feel free to ask me any questions.

acq586419 commented 6 months ago

@haruhi-sudo Thank you very much! When training reached the validation step, the following error occurred. I tried multiple modifications but still couldn't solve it. Do you have any suggestions for how to fix this? Thank you.

-- Process 1 terminated with the following error:
Traceback (most recent call last):
  File "/home/deep/.conda/envs/torch3.7/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/home/deep/.conda/envs/torch3.7/lib/python3.7/site-packages/fairseq/distributed/utils.py", line 328, in distributed_main
    main(cfg, **kwargs)
  File "/home/deep/PRGEN/train.py", line 181, in main
    valid_losses, should_stop = train(cfg, trainer, task, epoch_itr)
  File "/home/deep/.conda/envs/torch3.7/lib/python3.7/contextlib.py", line 74, in inner
    return func(*args, **kwds)
  File "/home/deep/PRGEN/train.py", line 314, in train
    cfg, trainer, task, epoch_itr, valid_subsets, end_of_epoch,
  File "/home/deep/PRGEN/train.py", line 404, in validate_and_save
    valid_losses = validate(cfg, trainer, task, epoch_itr, valid_subsets)
  File "/home/deep/PRGEN/train.py", line 476, in validate
    trainer.valid_step(sample)
  File "/home/deep/.conda/envs/torch3.7/lib/python3.7/contextlib.py", line 74, in inner
    return func(*args, **kwds)
  File "/home/deep/PRGEN/trainer.py", line 1037, in valid_step
    sample, self.model, self.criterion, **extra_kwargs
  File "/home/deep/PRGEN/src/task/faithful_summary_task.py", line 282, in valid_step
    metrics = self._inference_with_bleu(self.sequence_generator, sample, model)
  File "/home/deep/PRGEN/src/task/faithful_summary_task.py", line 332, in _inference_with_bleu
    gen_out = self.inference_step(generator, [model], sample, prefix_tokens=None)
  File "/home/deep/.conda/envs/torch3.7/lib/python3.7/site-packages/fairseq/tasks/fairseq_task.py", line 541, in inference_step
    models, sample, prefix_tokens=prefix_tokens, constraints=constraints
  File "/home/deep/.conda/envs/torch3.7/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/deep/.conda/envs/torch3.7/lib/python3.7/site-packages/fairseq/sequence_generator.py", line 204, in generate
    return self._generate(sample, **kwargs)
  File "/home/deep/.conda/envs/torch3.7/lib/python3.7/site-packages/fairseq/sequence_generator.py", line 470, in _generate
    assert step < max_len, f"{step} < {max_len}"
AssertionError: 60 < 60

haruhi-sudo commented 6 months ago

This is strange, because I don't run into this problem. My guess is that our fairseq versions are different.

You can try changing this line to default='{"beam":6, "lenpen":1.0, "max_len_b":60, "min_len":10, "no_repeat_ngram_size":3, "match_source_len":1}' and see if that solves the problem.
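For context, in stock fairseq this kind of generation-argument JSON is usually declared as a task option (similar to the translation task's --eval-bleu-args), parsed, and handed to the sequence generator used at validation time, which derives the max_len bound behind the "60 < 60" assertion from it. The snippet below only illustrates that pattern; the argument name --eval-gen-args and the helper functions are hypothetical and not the actual code in faithful_summary_task.py.

```python
# Illustration of the common fairseq pattern for validation-time generation
# arguments. --eval-gen-args and these helpers are hypothetical, not taken
# from faithful_summary_task.py.
import json
from argparse import Namespace

def add_args(parser):
    parser.add_argument(
        "--eval-gen-args",
        type=str,
        default='{"beam":6, "lenpen":1.0, "max_len_b":60, "min_len":10, '
                '"no_repeat_ngram_size":3, "match_source_len":1}',
    )

def build_generator_args(args):
    # The JSON string is parsed into a Namespace and passed to
    # task.build_generator(models, gen_args); the generator's maximum output
    # length is then derived from settings such as max_len_b / match_source_len.
    gen_args = json.loads(getattr(args, "eval_gen_args", "{}") or "{}")
    return Namespace(**gen_args)
```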