Bugs preventing reproducing results

felix-schneider commented 2 years ago

The code is broken in a number of places, making it impossible to reproduce results:

There is no way to specify args.mode in run.py other than editing it into the script
- passing --mode train to run.py does not work, because that argument will be stored in args.train.mode, not args.mode.
- The same is true for args.checkpoint_dir for inference.
In gen_summary/inference.py:38, you assign to self.bart.cfg.dataset.batch_size_valid = bsz. I'm not sure what this is supposed to do, but the bart object comes from fairseq and has no cfg member.
in stage 2, run.py does not actually use the output from stage 1 and will just create the same input/output as stage 1 again
- the others are one-line fixes but for this one I am not sure how to fix

For others who are also trying to reproduce:

The required fairseq version is actually 0.10.0, there is no 1.10.0
- You also need to copy the examples folder somewhere and add its parent folder to PYTHONPATH. You must move it out of the main fairseq dir, or the fairseq logging module will override the Python builtin.
Since AnyROUGE does not provide any installation instructions: Clone the library and add its root folder to the PYTHONPATH.
Then clone https://github.com/pltrdy/rouge into the ThirdParty folder inside AnyRouge. If you put it somewhere else or pip install it, you will need to edit the scripts to change imports.

Jinhyeong-Lim commented 2 years ago

in stage 2, run.py does not actually use the output from stage 1 and will just create the same input/output as stage 1 again

the others are one-line fixes but for this one I am not sure how to fix

hi I reproduced almost everything and modified the code to get similar performance.

I think you should modify the data preprocessing part code. And the hyperparameter should be modified according to the details written in the paper.

WeixiangYAN commented 2 years ago

Hello, can you share your modified code? thanks.

xianglous commented 2 years ago

in stage 2, run.py does not actually use the output from stage 1 and will just create the same input/output as stage 1 again

I think the at line 100 of run.py, it is getting the output data from stage 1 with args.cur_stage still being 1 at that time:

source_path = os.path.join(args.train.output_path, f"stage_{args.cur_stage}")
cur_source = load_split_aslist(source_path, suffix='hypo')

But the other mentioned problems do exist and also at line 87 and 123 or run.py, the path to the checkpoint should be something like:

os.path.join(training_args.checkpoint_dir, f"stage_{args.cur_stage}/trainer_output")

if we want to use the checkpoints after training without any file/folder movements.

CAH9487 commented 2 years ago

I only achieved around 70% paper rouge score on AMI dataset. I used fairseq 0.12.2 or 0.10.0 + comment out codes about self.bart.cfg.xxxxx in inference.py Is there something except @felix-schneider & @xianglous said should be modified?

heatherzheng commented 1 year ago

Hi @ CAH9487, I would like to know if you produced the result. do we need to change anything about the configure file? Thanks!

psunlpgroup / Summ-N

Bugs preventing reproducing results #5