biaoyanf opened this issue 7 months ago
Here is my code for computing the IAA:
import pandas as pd

data_path = "./annotations/billsum_annotations.csv"
data = pd.read_csv(data_path)

ann1_nf = data["label_type_ann1"] == "non_factual"
ann2_nf = data["label_type_ann2"] == "non_factual"

# Summary level: number of summaries each annotator marked non_factual
n_ann1 = data[ann1_nf]["summary_uuid"].nunique()
n_ann2 = data[ann2_nf]["summary_uuid"].nunique()
n_both = data[ann1_nf & ann2_nf]["summary_uuid"].nunique()
print("summary level")
print(n_ann1, n_ann2)
print("-----")
print("agreement:")
# Jaccard-style overlap: |both| / (|ann1| + |ann2| - |both|)
print(n_both / (n_ann1 + n_ann2 - n_both))
print()

# Sentence level: rows where both vs. at least one annotator said non_factual
print("sentence level")
print((ann1_nf & ann2_nf).sum())
print((ann1_nf | ann2_nf).sum())
print((ann1_nf & ann2_nf).sum() / (ann1_nf | ann2_nf).sum())
Hi, @sanjanaramprasad
As shown above, I was trying to verify the data and reproduce the IAA reported in the paper (Table 3). However, I cannot get such a high agreement from the annotations provided here.
In either case, summary level (1) or sentence level (2), I cannot obtain the same agreement for Billsum and PubMed (e.g., 0.93 for PubMed at the sentence level).
Can you shed some light on how you calculated the IAA? Thanks.
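In case it helps pinpoint the discrepancy: my snippet above computes raw overlap (Jaccard-style). If the paper instead reports a chance-corrected statistic, the numbers would differ. Below is a minimal sketch of Cohen's kappa over the same sentence-level labels, reusing data from the snippet above; whether this matches your metric is only a guess on my part.

from sklearn.metrics import cohen_kappa_score

# Sketch only: binarize each annotator's sentence label (1 = non_factual, 0 = anything else)
# and compute Cohen's kappa. Assumes -- possibly wrongly -- that this is the paper's IAA metric.
y1 = (data["label_type_ann1"] == "non_factual").astype(int)
y2 = (data["label_type_ann2"] == "non_factual").astype(int)
print("Cohen's kappa (sentence level):", cohen_kappa_score(y1, y2))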