isi-vista / cdse-covid

Claim detection & semantic extraction (Covid-19 domain)

Improving Claim Detection #94

Open joecummings opened 2 years ago

joecummings commented 2 years ago

Initial Brainstorm Joe & Liz (Dec 1):

  1. Utilizing strictly semantic similarity (PhraseBERT or SentenceBERT w/ interesting chunking)
  2. Frame as an LM paraphrasing task: if a sentence could be "paraphrased" into one of the provided topics, then it might be a claim.
  3. Better AMR (seems like a losing battle, but worth spending a little time checking whether there's a better AMR parser out there)
  4. Graph + SS approach: encode an AMR parse as an actual graph and use graph similarity metrics plus some token-level semantic similarity to determine whether a chunk is similar to one of the claim templates.

Dec 2

  5. ClaimBuster + AMR/FrameNet to find the claim span
  6. Add custom COVID claim frames to FrameNet and use existing software to match frames in the wild

Other random (not so great) ideas:

joecummings commented 2 years ago

1. Utilizing semantic similarity

Hypothesis:

joecummings commented 2 years ago

4. Graph + SS

Hypotheses:

joecummings commented 2 years ago

5. ClaimBuster + AMR/FrameNet

Hypotheses:

joecummings commented 2 years ago

6. Modeling COVID-19 Claims as Frames

Hypotheses:

elizlee commented 2 years ago

2. Paraphrasing Task

Hypothesis:

Reasoning:

Loose implementation:

  1. Break sentences into clauses
  2. Generate paraphrases for each clause using a model like the Hugging Face one here: https://huggingface.co/Vamsi/T5_Paraphrase_Paws
  3. Measure semantic similarity between each paraphrase and each claim template using cosine over spaCy vectors
  4. Take the ~50 claims we already have and compute the cosine similarity between each claim clause and its correct template
  5. Average the cosine similarity scores for these pairs and use that mean as our threshold for deciding whether a clause is a claim.
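
A rough sketch of steps 3-5, not a real implementation. It assumes the clauses and templates have already been turned into vectors; `toy_embed` and `VOCAB` are made-up stand-ins (with spaCy, the vector would come from something like `nlp(text).vector`), and the example strings in the comments are invented, not from our data.

```python
import math
from statistics import mean
from typing import Callable, Iterable, List, Sequence, Tuple

Vector = Sequence[float]
Embedder = Callable[[str], Vector]

# Hypothetical toy vocabulary, for illustration only.
VOCAB = ["masks", "prevent", "covid", "spread", "vaccines", "work"]


def toy_embed(text: str) -> List[float]:
    """Bag-of-words counts over VOCAB; a stand-in for real word vectors."""
    tokens = text.lower().split()
    return [float(tokens.count(word)) for word in VOCAB]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def calibrate_threshold(pairs: Iterable[Tuple[str, str]], embed: Embedder) -> float:
    """Steps 4-5: mean similarity over known (claim clause, correct template) pairs."""
    return mean(cosine(embed(clause), embed(template)) for clause, template in pairs)


def is_claim(clause: str, templates: Sequence[str], embed: Embedder, threshold: float) -> bool:
    """A clause counts as a claim if its best template match reaches the threshold."""
    best = max(cosine(embed(clause), embed(template)) for template in templates)
    return best >= threshold
```

Something like `calibrate_threshold(known_pairs, toy_embed)` would be run once over the ~50 annotated pairs, and `is_claim` would then be applied to each new clause or paraphrase. Using the mean as the cutoff means roughly half of the known claims would fall below it, so a lower percentile might be worth trying too.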

Measuring:

Cons:

elizlee commented 2 years ago

3. Better AMR Parser

Hypothesis:

Reasoning:

Loose implementation:

  1. Replace the Transition AMR Parser with a higher-performing parser and run it on all sentences, including the claim topic templates
  2. Run an AMR graph semantic similarity metric on each subgraph against each claim template
  3. Take the ~50 claims we already have and compute the graph similarity between each claim subgraph and its correct template
  4. Average the similarity scores for these pairs and use that mean as our threshold for deciding whether a clause is a claim.
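
To make step 2 concrete, here is a deliberately simplified graph similarity sketch. It is not Smatch or any published AMR metric: it assumes each graph has been flattened into `(source concept, relation, target concept)` triples, so no variable alignment is needed, and scores the overlap with F1. The function names and the triple representation are assumptions for illustration.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]  # (source concept, relation, target concept)


def triple_f1(candidate: Set[Triple], template: Set[Triple]) -> float:
    """F1 over exactly matching triples; 0.0 if either graph is empty."""
    if not candidate or not template:
        return 0.0
    overlap = len(candidate & template)
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate)
    recall = overlap / len(template)
    return 2 * precision * recall / (precision + recall)


def best_template(candidate: Set[Triple], templates: Dict[str, Set[Triple]]) -> Tuple[str, float]:
    """Return the (template name, score) pair with the highest triple F1."""
    scored = ((name, triple_f1(candidate, graph)) for name, graph in templates.items())
    return max(scored, key=lambda item: item[1])
```

Exact concept matching is brittle (e.g. `disease` vs. `infection` scores zero on that triple), which is where a token-level semantic similarity term could be mixed in.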

Measuring:

Cons:

Side note:

elizlee commented 2 years ago

7. AMR-to-Text Semantic Similarity

Hypothesis:

Reasoning:

Loose implementation:

  1. Parse sentences into AMR
  2. For each subgraph of each sentence graph, generate a new sentence using something like https://github.com/SapienzaNLP/spring
  3. Measure semantic similarity between each generated sentence and each claim template using cosine over spaCy vectors
  4. Take the ~50 claims we already have and compute the cosine similarity between each claim clause and its correct template
  5. Average the cosine similarity scores for these pairs and use that mean as our threshold for deciding whether a clause is a claim.
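
One way to read "each subgraph" in step 2 is "the subgraph rooted at each node", i.e. everything reachable from that node. The sketch below enumerates those under that assumption. It works on `(source, relation, target)` triples with concept labels as node names (a real AMR parse uses variables), and it stops short of the generation step: each returned subgraph would still need to be serialized and passed to an AMR-to-text model.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (source node, relation, target node)


def rooted_subgraphs(triples: Set[Triple]) -> Dict[str, Set[Triple]]:
    """Map each node with outgoing edges to the triples reachable from it."""
    outgoing: Dict[str, List[Triple]] = defaultdict(list)
    for triple in triples:
        outgoing[triple[0]].append(triple)

    result: Dict[str, Set[Triple]] = {}
    for root in list(outgoing):
        seen = {root}          # guards against re-entrant (cyclic) graphs
        stack = [root]
        collected: Set[Triple] = set()
        while stack:
            node = stack.pop()
            for triple in outgoing.get(node, []):
                collected.add(triple)
                target = triple[2]
                if target not in seen:
                    seen.add(target)
                    stack.append(target)
        result[root] = collected
    return result
```

Leaf nodes are skipped because a single concept with no edges is unlikely to generate a useful sentence. The number of candidates grows with the number of internal nodes, so generation cost per sentence would scale the same way.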

Measuring:

Cons: