jiangjiechen / VENCE

Resources for our AAAI 2023 paper: "Converge to the Truth: Factual Error Correction via Iterative Constrained Editing".
Apache License 2.0

How to interpret the lines inside t5_train.json #4

Closed: MichaelCaohn closed this issue 12 months ago

MichaelCaohn commented 1 year ago

Hi authors,

Thank you very much for the great work. I am very interested in it. I have looked into the t5_train.json file downloaded from the Google Drive link you shared.

However, I am having trouble interpreting its content. For example, consider this entry:

{
        "input": "substituted entity : [evidence] : title : Alcatraz Island context : Alcatraz Island is located in San Francisco Bay , 1.25 mi offshore from San Francisco , California , United States . The small island was developed with facilities for a lighthouse , a military fortification , a military prison ( 1868 ) , and a federal prison from 1934 until # # # title : Alcatraz Federal Penitentiary context : The Alcatraz Federal Penitentiary or United States Penitentiary , Alcatraz Island ( often just referred to as Alcatraz ) was a maximum high - security federal prison on Alcatraz Island , 1.25 mi off the coast of San Francisco , California , which operated from 1934 to 1963 . The main prison [claim] : [MASK] is the location of Alcatraz Federal Penitentiary on Alcatraz Island .",
        "output": "San Francisco"
    },

If we treat the content after "[evidence] :" as the evidence and the content after "[claim] :" as the claim (with [MASK] replaced by the "output" value), the result would be:

evidence:

title : Alcatraz Island context : Alcatraz Island is located in San Francisco Bay , 1.25 mi offshore from San Francisco , California , United States . The small island was developed with facilities for a lighthouse , a military fortification , a military prison ( 1868 ) , and a federal prison from 1934 until # # # title : Alcatraz Federal Penitentiary context : The Alcatraz Federal Penitentiary or United States Penitentiary , Alcatraz Island ( often just referred to as Alcatraz ) was a maximum high - security federal prison on Alcatraz Island , 1.25 mi off the coast of San Francisco , California , which operated from 1934 to 1963 . The main prison 

claim:

San Francisco is the location of Alcatraz Federal Penitentiary on Alcatraz Island .

I think the claim is correct. However, my interpretation seems to have a problem with the evidence: extracted this way, it ends in an incomplete sentence.

Could you explain how to extract the evidence and the claim for this example?

Thank you in advance.

Best regards, Michael

airaer1998 commented 1 year ago

Thank you for your interest in our work.

I think your interpretation is correct. The evidence in this task does indeed end in an incomplete sentence, which is a result of how the data is processed for this task. The goal is to provide enough context for the model to make the correct inference despite the incomplete sentence.
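For readers who want to apply this interpretation programmatically, here is a minimal sketch in Python. It is not the authors' preprocessing code: the markers "[evidence] :", "[claim] :", "# # #", "title :" and "context :" are inferred only from the single entry quoted in this issue, the function name `parse_example` is hypothetical, and the sample record is an abridged copy of that entry.

```python
import re

# Markers inferred from the single t5_train.json entry quoted in this issue
# (assumption: other entries follow the same layout).
EVIDENCE_TAG = "[evidence] :"
CLAIM_TAG = "[claim] :"
PASSAGE_SEP = "# # #"


def parse_example(example):
    """Split one record into evidence passages, the masked claim, and the filled claim."""
    text = example["input"]
    if EVIDENCE_TAG not in text or CLAIM_TAG not in text:
        raise ValueError("record does not follow the assumed layout")

    # Text before "[evidence] :" is the task prefix, e.g. "substituted entity :".
    prefix, _, rest = text.partition(EVIDENCE_TAG)
    # The claim is whatever follows the last "[claim] :" marker.
    evidence_raw, _, masked_claim = rest.rpartition(CLAIM_TAG)

    passages = []
    for chunk in evidence_raw.split(PASSAGE_SEP):
        match = re.match(r"\s*title :\s*(.*?)\s*context :\s*(.*)", chunk, flags=re.S)
        if match:
            # The last context may stop mid-sentence; it is kept as-is.
            passages.append({"title": match.group(1), "context": match.group(2).strip()})

    masked_claim = masked_claim.strip()
    return {
        "task": prefix.strip().rstrip(":").strip(),
        "evidence": passages,
        "masked_claim": masked_claim,
        # Fill the first [MASK] with the target from the "output" field.
        "claim": masked_claim.replace("[MASK]", example["output"], 1),
    }


# Abridged version of the entry quoted above (contexts shortened for brevity).
example = {
    "input": "substituted entity : [evidence] : title : Alcatraz Island context : "
             "Alcatraz Island is located in San Francisco Bay . # # # "
             "title : Alcatraz Federal Penitentiary context : The Alcatraz Federal "
             "Penitentiary was a maximum high - security federal prison . The main prison "
             "[claim] : [MASK] is the location of Alcatraz Federal Penitentiary on Alcatraz Island .",
    "output": "San Francisco",
}

parsed = parse_example(example)
print(parsed["claim"])
# San Francisco is the location of Alcatraz Federal Penitentiary on Alcatraz Island .
```

The truncated tail of the last passage ("The main prison") is deliberately left in the context rather than trimmed, since the model is trained on the evidence in that form.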

If you have further questions about the data, I recommend referring to the retrieval algorithm in the paper "Evidence-based Factual Error Correction", which better explains how the evidence and claims are generated and used.

Best Regards