Improving Claim Detection

isi-vista / cdse-covid

Claim detection & semantic extraction (Covid-19 domain)


Improving Claim Detection #94

Open joecummings opened 2 years ago

joecummings commented 2 years ago

Initial Brainstorm Joe & Liz (Dec 1):

Utilizing strictly semantic similarity (PhraseBERT or SentenceBERT w/ interesting chunking)
Frame as LM paraphrasing task - if a sentence could be "paraphrased" into one of the provided topics, then it might be a claim.
Better AMR (seems like a losing battle, but worth spending a little bit of time trying to see if there's a better AMR parser out there)
Graph + SS approach - Encode an AMR parse as an actual graph and use graph similarity metrics plus some token semantic similarity to determine if a chunk is similar to one of the claim templates.

Dec 2

Claimbuster + AMR/FrameNet to find claim span
Add custom covid claim frames to FrameNet and use existing software to match Frames in the wild

Other random (not so great) ideas:

If a sentence has a VBD and numbers, consider it a claim. Would certainly have a higher recall.
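A minimal sketch of that heuristic, assuming sentences arrive already tokenized and tagged with Penn Treebank POS tags by some upstream tagger. The function name `is_candidate_claim` and the digit regex are illustrative assumptions, not part of the existing codebase.

```python
import re

# Matches any token containing a digit ("12", "3.5", "40%").
# Spelled-out numbers are only caught if the tagger labels them CD.
NUMBER_RE = re.compile(r"\d")


def is_candidate_claim(tagged_tokens):
    """Return True if the sentence has a past-tense verb (VBD) and a number.

    tagged_tokens: list of (token, POS tag) pairs for one sentence.
    """
    has_vbd = any(tag == "VBD" for _, tag in tagged_tokens)
    has_number = any(
        tag == "CD" or NUMBER_RE.search(tok) is not None
        for tok, tag in tagged_tokens
    )
    return has_vbd and has_number
```

For example, a sentence tagged as `[("Cases", "NNS"), ("rose", "VBD"), ("20", "CD"), ("%", "NN")]` passes, while one with a VBD but no number does not.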

joecummings commented 2 years ago

1. Utilizing semantic similarity

Hypothesis:

Taking the cosine similarity between an encoded sentence and each encoded template should tell us which (if any) of the templates it is most similar to.
Reasoning:
This has worked fairly well for many of our other purposes. Templates and claims should be much more semantically similar than other text spans.
Loose implementation:
1. Break sentences into clauses.
2. Encode clauses with either SentenceBERT or PhraseBERT
3. Encode provided claim templates with whichever model was chosen for 2
4. Do a pairwise cosine similarity measurement between all clauses and all templates using SBERT's utils
5. Using the ~50 positive claims we already have, see what the cosine similarity is between those clauses and the correct template
6. Average the cosine similarity scores for these pairs and have that be our threshold for determining if a clause is a claim.
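Steps 4 to 6 could be sketched roughly as follows. This assumes the clauses and templates have already been encoded into plain lists of floats by whichever model is chosen in step 2; the function names are hypothetical, and the hand-written `cosine` stands in for the SBERT utility so the sketch has no dependencies.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def best_template(clause_vec, template_vecs):
    """Return (index, score) of the template most similar to the clause."""
    scores = [cosine(clause_vec, t) for t in template_vecs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


def mean_threshold(gold_pairs):
    """Step 6: average similarity over (clause_vec, correct_template_vec) pairs."""
    sims = [cosine(c, t) for c, t in gold_pairs]
    return sum(sims) / len(sims)


def is_claim(clause_vec, template_vecs, threshold):
    """A clause counts as a claim if its best template score reaches the threshold."""
    _, score = best_template(clause_vec, template_vecs)
    return score >= threshold
```

One thing to keep in mind: because the threshold is the mean of the gold scores, any gold pair scoring below that mean falls under it, so it may be worth comparing against a lower cut-off as well.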
Measuring:
How many of the ~50 positive claims can this method find?
How many extra claims does this method find?
How many bad examples does this method find?
How long does it take to do this entire process?
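The measuring questions could be answered with a small harness along these lines. It is a sketch only: it assumes each clause has some hashable identifier, and the names `evaluate` and `timed` are made up for illustration.

```python
import time


def evaluate(predicted, gold):
    """Compare predicted claim identifiers against the known positive claims.

    predicted, gold: sets of clause identifiers (any hashable values).
    """
    found = predicted & gold   # known positives the method recovered
    extra = predicted - gold   # everything else the method flagged
    return {
        "gold_found": len(found),
        "gold_recall": len(found) / len(gold) if gold else 0.0,
        "extra": len(extra),
    }


def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed seconds) for the runtime question."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start
```

Since only the ~50 positive claims are labeled, the `extra` bucket mixes genuinely new claims with bad examples, so separating the two would presumably still need a manual pass.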

joecummings commented 2 years ago

4. Graph + SS

Hypotheses:

AMR graphs closely enough resemble standard graphs that we can use graph similarity metrics.
Graph similarity will sometimes not be enough, so we can also rely on encoding noun phrases to determine similarity.
Reasoning:
Smatch and other AMR comparison metrics don't seem well suited for determining similarity. Although variations in sentence structure can produce slightly different AMR graphs, the differences will be smaller than those of a completely different AMR parse. Therefore, standard graph metrics might be a good way to capture these small changes.
Loose implementation:
1. Parse every sentence into an AMR graph.
2. Convert AMR graphs into NetworkX graphs.
3. Parse every template into an AMR graph.
4. Convert template graphs into NetworkX graphs.
5. Calculate a similarity score between every AMR graph and template graph.
  - See lestat-alignment for some choices of similarity scores
6. Do all steps in Utilizing semantic similarity for getting a SS score.
7. Combine graph sim scores and semantic sim scores.
Measuring:
How many of the ~50 positive claims can this method find?
How many extra claims does this method find?
How many bad examples does this method find?
How long does it take to do this entire process?
Is the graph method or the semantic sim method more important?
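Steps 5 and 7 might look something like the sketch below. To stay dependency-free it represents each AMR graph as a set of (source concept, role, target concept) triples instead of a NetworkX graph, and it uses Jaccard overlap of edges as a stand-in similarity score; neither choice comes from the thread, and the weighted sum is only one possible way to combine the two scores.

```python
def edge_jaccard(edges_a, edges_b):
    """Jaccard overlap between two graphs given as iterables of edge triples."""
    a, b = set(edges_a), set(edges_b)
    if not a and not b:
        return 1.0  # two empty graphs are treated as identical
    return len(a & b) / len(a | b)


def combined_score(graph_sim, semantic_sim, graph_weight=0.5):
    """Weighted sum of the graph score and the semantic similarity score."""
    return graph_weight * graph_sim + (1 - graph_weight) * semantic_sim


# Toy example: the two graphs share one of three distinct edges.
sentence_graph = {("cause-01", ":ARG0", "virus"), ("cause-01", ":ARG1", "disease")}
template_graph = {("cause-01", ":ARG0", "virus"), ("cause-01", ":ARG1", "cure")}
graph_sim = edge_jaccard(sentence_graph, template_graph)
```

Sweeping `graph_weight` from 0 to 1 and re-running the evaluation would be one way to get at the last measuring question.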

joecummings commented 2 years ago

5. Claimbuster + AMR/FrameNet

Hypotheses:

Claimbuster is good enough to detect claim sentences, but not claim spans.
The majority of claim clauses follow the patterns we identified through FrameNet.
Reasoning:
In a spot check of UIUC's output (using Claimbuster), it was fairly good at detecting sentences that contained claims, but it was pretty bad at detecting which part of the sentence contained the claim. In our experiments for determining the "claimer", we actually built a tool that can detect which part of an AMR graph represents a claim.
Loose implementation:
1. Run Claimbuster over every sentence in the corpora.
2. Parse every sentence that Claimbuster returns as a probable claim into an AMR graph.
3. Extract the subgraph in the AMR graph for the verbs that we already identified as likely to be referring to a claim.
4. Reconstruct the claim span from the AMR subgraph.
Measuring:
How many of the ~50 positive claims can this method find?
How many extra claims does this method find?
How many bad examples does this method find?
How long does it take to do this entire process?
What is the split between finding good sentences vs finding good clauses?
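Step 3 could be sketched as a reachability search over the AMR graph. The representation here (a variable-to-concept dict plus a list of edge triples) and the contents of `CLAIM_CONCEPTS` are assumptions for illustration; the thread does not list which verbs were identified, and the existing claimer tool may work differently.

```python
from collections import deque

# Illustrative placeholder; the real list of claim verbs is not given in the thread.
CLAIM_CONCEPTS = {"say-01", "claim-01", "report-01"}


def claim_subgraphs(instances, edges, claim_concepts=CLAIM_CONCEPTS):
    """Find the subgraph rooted at each claim-verb node.

    instances: dict mapping AMR variable -> concept, e.g. {"s": "say-01"}
    edges: list of (source variable, role, target variable) triples
    Returns a dict mapping each claim-verb variable to the set of variables
    reachable from it (the root itself included).
    """
    children = {}
    for src, _role, tgt in edges:
        children.setdefault(src, []).append(tgt)

    result = {}
    for var, concept in instances.items():
        if concept not in claim_concepts:
            continue
        seen = {var}
        queue = deque([var])
        while queue:  # breadth-first walk over outgoing edges
            node = queue.popleft()
            for child in children.get(node, []):
                if child not in seen:
                    seen.add(child)
                    queue.append(child)
        result[var] = seen
    return result
```

Step 4 (turning the subgraph back into a text span) would additionally need node-to-token alignments from the parser, which this sketch does not cover.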

joecummings commented 2 years ago

6. Modeling COVID-19 Claims as Frames

Hypotheses:

Existing technology to find Frames is good.
We can create viable "Frames" for COVID-related claims accurately and without too much effort.
Reasoning:
This proposal is based on the paper at https://ranger.uta.edu/~cli/pubs/2019/modelingclaims-cj19-arslan.pdf, in which the authors added custom fact-claim Frames to FrameNet and then used existing software to find those Frames in new documents.
Loose implementation:
1. Manually examine found claims in each topic/subtopic for syntactic similarities (like the templates).
2. Define Frames according to FrameNet.
3. Utilize Open-Sesame to match Frames over new documents.
Measuring:
How many of the ~50 positive claims can this method find?
How many extra claims does this method find?
How many bad examples does this method find?
How long does it take to do this entire process?
Cons:
Definitely the most manual of all the proposals. Doesn't scale well with new domains.

elizlee commented 2 years ago

2. Paraphrasing Task

Hypothesis:

We can create paraphrases from claim sentences that are more easily comparable to claim topics.

Reasoning:

Given that semantic similarity has already worked fairly well, comparing paraphrased versions of each clause would in theory yield more accurate similarity scores.

Loose implementation:

Break sentences into clauses
Generate paraphrases for each clause using a model like the Hugging Face one here: https://huggingface.co/Vamsi/T5_Paraphrase_Paws
Measure semantic similarity using cosine over spaCy vectors
Look at ~50 claims we already have and compare cosine similarity between those clauses and the correct template
Average the cosine similarity scores for these pairs and have that be our threshold for determining if a clause is a claim.

Measuring:

How many of the ~50 positive claims can this method find?
How many extra claims does this method find?
How many bad examples does this method find?
How long does it take to do this entire process?

Cons:

From early testing of the paraphrase generator mentioned, it doesn't actually appear to do much paraphrasing, and if this trend persists, it might as well be plan 1.
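One rough way to check whether that trend persists is to measure how often the generator returns something close to its input. The sketch below uses whitespace-token Jaccard overlap; the function names and the 0.8 cut-off are arbitrary assumptions rather than anything we have tested.

```python
def token_overlap(original, paraphrase):
    """Jaccard overlap between the lowercased whitespace tokens of two strings."""
    a = set(original.lower().split())
    b = set(paraphrase.lower().split())
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_copy_rate(pairs, cutoff=0.8):
    """Fraction of (original, paraphrase) pairs whose overlap is at least cutoff."""
    if not pairs:
        return 0.0
    copies = sum(1 for orig, para in pairs if token_overlap(orig, para) >= cutoff)
    return copies / len(pairs)
```

If `near_copy_rate` stays high over a sample of clauses, the paraphrase step is adding little on top of plan 1.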

elizlee commented 2 years ago

3. Better AMR Parser

Hypothesis:

There exists a better-performing AMR parser than IBM's Transition AMR Parser that produces output which better matches the claim subgraphs with claim topic template graphs.

Reasoning:

A system that yields more accurate AMR graphs may yield more accurate semantic similarity scores between the claim graph and a template graph.

Loose implementation:

Replace Transition AMR Parser with a higher-performing parser and run it on all sentences including the claim topic templates
Run AMR graph semantic similarity metric on each subgraph against each claim template
Look at ~50 claims we already have and compare the graph similarity between those subgraphs and the correct template
Average the similarity scores for these pairs and have that be our threshold for determining if a clause is a claim.

Measuring:

How many of the ~50 positive claims can this method find?
How many extra claims does this method find?
How many bad examples does this method find?
How long does it take to do this entire process?
How does this compare with the results from the Transition AMR Parser?

Cons:

IIRC, we selected the Transition AMR Parser because it was one of the leading ones out there. Although there are others with slightly better Smatch, I doubt that replacing our current parser with one of those would provide much of an improvement, and time would be better spent trying an alternate method.

Side note:

https://github.com/SapienzaNLP/spring at least appears to have a simpler setup process (although it would still need its own environment). If time permits, perhaps it's worth trying out just to see if we can make installation less flaky.

elizlee commented 2 years ago

7. AMR-to-Text Semantic Similarity

Hypothesis:

An AMR-to-Text generator will accurately produce simpler sentences from AMR graphs.
Those simpler sentences will yield more accurate semantic similarity scores against the claim topics.

Reasoning:

The idea is similar to plan 2, but the way we get our "paraphrase" comes directly from AMR graphs.

Loose implementation:

Parse sentences into AMR
For each subgraph of each sentence graph, generate a new sentence using something like https://github.com/SapienzaNLP/spring
Measure semantic similarity using cosine over spaCy vectors
Look at ~50 claims we already have and compare cosine similarity between those clauses and the correct template
Average the cosine similarity scores for these pairs and have that be our threshold for determining if a clause is a claim.

Measuring:

How many of the ~50 positive claims can this method find?
How many extra claims does this method find?
How many bad examples does this method find?
How long does it take to do this entire process?

Cons:

Compared to 2 and 3, this involves the most steps to get something comparable to the claim topics.