Closed FYYFU closed 3 years ago
Hi, the results in Table 4 (and all results from Section 6 onwards) are reported using the dae_w_syn model. Using that should help you reproduce the numbers from the paper.
Thanks for your reply! I get the result in Table 4 using the dae_w_syn model. In Table 3, the model trained with hallucinations (AD + S + H) outperforms AD + S on every test set except the AD test set. So could you tell me why you chose AD + S?
And if I want to use this tool to compare the hallucination and factuality of two summaries, should I use the AD + S + H model?
Sorry for the delayed response. The AD + S model was chosen based on our expectation of the data being evaluated. The full AD + S + H model is useful if you expect that the (input, generation) pairs will be unrelated in terms of the events and actors being discussed.
For paraphrasing/ (one sentence input, summary) tasks (like the ones in the paper), I would suggest using the AD + S model.
For (document, summary) pairs, I would suggest using the AD + S + H model. Note that the maximum input length allowed by this model is 128, so you may have to segment the document into sentences to be able to use this. Alternatively, you can use models from our follow-up work that operate at the document level: https://arxiv.org/pdf/2104.04302.pdf
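As a rough illustration of the segmentation suggested above, here is a minimal sketch that splits a document into sentences and pairs each one with the summary. It is not part of the released code: the function names `split_sentences` and `make_pairs` are hypothetical, the regex splitter is a naive stand-in for a proper sentence tokenizer, and counting whitespace-separated words only approximates whatever tokenization the 128 limit actually refers to.

```python
import re

MAX_LEN = 128  # limit quoted above; the unit it is measured in is an assumption here


def split_sentences(document):
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", document.strip())
    return [p for p in parts if p]


def make_pairs(document, summary, max_len=MAX_LEN):
    """Pair each source sentence with the summary, skipping over-long sentences."""
    pairs = []
    for sentence in split_sentences(document):
        # Whitespace word count is only a rough proxy for the model's tokenizer.
        if len(sentence.split()) <= max_len:
            pairs.append((sentence, summary))
    return pairs
```

Each resulting (sentence, summary) pair could then be written out in whatever input format the evaluation script expects; how to aggregate the per-sentence scores into a document-level judgment is a separate choice that this sketch does not address.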
Thanks for your reply! I will read this paper carefully.
I downloaded the dae_w_syn_hallu model and used it to run the command you provided. The result is:
That seems a little different from what you reported in the original paper (Table 4, DAE: 83.9). The Stanford CoreNLP version is 3.9.1.
I also used version 4.1.0, and got the result:
Is there something I should be aware of?