tingofurro / summac

Codebase, data and models for the SummaC paper in TACL
https://arxiv.org/abs/2111.09525
Apache License 2.0

Range of numbers summac have #10

EngSalem closed this issue 1 year ago

EngSalem commented 1 year ago

Hi, would two identical sentences have a summac_conv score near 1.0? I am having trouble interpreting the output scores.

tingofurro commented 1 year ago

Hey @EngSalem ,

Good question. The expected behavior is indeed that the model returns a high value, close to 1.0. However, since the sentences are fed through a model, the exact score will vary with the input.

I tried the following code, which tested both SummaC-ZS and SummaC-Conv:

from summac.model_summac import SummaCZS, SummaCConv

model_zs = SummaCZS(granularity="sentence", model_name="vitc", device="cuda")  # use device="cpu" if no GPU is available
model_conv = SummaCConv(models=["vitc"], bins="percentile", granularity="sentence", nli_labels="e", device="cpu", start_file="default", agg="mean")

document = "EngSalem states that the model should give a score of close to 1.0 for a pair of identical sentences."
summary1 = "EngSalem states that the model should give a score of close to 1.0 for a pair of identical sentences."
score_zs1 = model_zs.score([document], [summary1])
print("SummacZS Score: %.2f" % score_zs1["scores"][0])
score_conv1 = model_conv.score([document], [summary1])
print("SummacConv Score: %.2f" % score_conv1["scores"][0])

And got the following:

SummacZS Score: 0.99
SummacConv Score: 0.87

Both models gave very high scores, though on this example SummaC-ZS gave the stronger positive score.
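Since the two models produce scores on somewhat different scales (0.99 vs. 0.87 here for an identical pair), a common way to interpret raw scores like these is to calibrate a decision threshold on a small labeled validation set. The sketch below is a hypothetical helper, not part of the summac library: it scans candidate thresholds over example scores and picks the one maximizing balanced accuracy. The score and label values are made up for illustration.

```python
def best_threshold(scores, labels):
    """Return the cutoff (from the observed scores) that maximizes
    balanced accuracy. labels: 1 = consistent, 0 = inconsistent."""
    best_t, best_acc = 0.5, -1.0
    for t in sorted(set(scores)):
        preds = [1 if s >= t else 0 for s in scores]
        tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
        tn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 0)
        pos = sum(labels)
        neg = len(labels) - pos
        sensitivity = tp / pos if pos else 0.0
        specificity = tn / neg if neg else 0.0
        acc = (sensitivity + specificity) / 2
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

# Made-up SummaC-style scores: high for consistent pairs, low otherwise.
scores = [0.99, 0.87, 0.31, 0.12]
labels = [1, 1, 0, 0]
print(best_threshold(scores, labels))  # -> 0.87
```

With a threshold picked this way, both model outputs above would be classified as consistent, which matches the intuition for identical sentence pairs.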

I hope this helps! Philippe