Open katesanders9 opened 1 year ago
FULL TRANSCRIPT:
(House:)before Mistress Ilsa's rude interruption.
- That was probably a good thing, given...
- Start him on antibiotics and blood thinners.
(Cameron:)You still think Chase is wrong?
(House:)No, he's probably right.
(Chase:)Then we should schedule him
for vascular surgery.
(Chase:)Go into the carotids,
find the aneurysm, repair it.
(Cameron:)We put him on blood thinners,
he might bleed out.
(House:)But if Foreman's right about
it being bacterial endocarditis and we...
(Foreman:)I think Chase is right.
(House:)Okay. If Foreman used to be right
about it being blood clots...
(House:)and we take the surgery route,
we'll probably kill the guy.
(House:)So start him on blood thinners, and if he has
another stroke, we'll schedule the surgery.
(Foreman:)- Yeah?
- You Dr. House?
I have a feeling I'm gonna
regret this, but yes.
(Myron:)Ramona said I should come to you
for my refill.
Uh, you know, the...
The wood's beginning to droop.
(Myron:)She said you'd probably
want to talk to me first.
(House:)She lied.
(House:)Here you go.
Thanks.
(House:)Always happy to make people happy.
Good-bye.
Uh, I don't know how much more
of this I can take.
(House:)- Much more of what?
HYPOTHESIS:
Foreman rescinds his diagnosis.
EVIDENCE:
(Chase:)Then we should schedule him
for vascular surgery.
(Chase:)Go into the carotids,
find the aneurysm, repair it.
(Cameron:)We put him on blood thinners,
he might bleed out.
(House:)But if Foreman's right about
it being bacterial endocarditis and we...
(Foreman:)I think Chase is right.
(House:)Okay. If Foreman used to be right
about it being blood clots...
(House:)and we take the surgery route,
we'll probably kill the guy.
Retrieval via CrossEncoder trained on MSMarco; H1 generation via ChatGPT
No retrieved dialogue lines.
Via CrossEncoder trained on SNLI and MultiNLI
Skip.
Via ChatGPT (used slightly different prompting this time)
H1: Foreman initially diagnosed the patient's condition as bacterial endocarditis.
H2: Foreman rescinds his diagnosis and agrees with Chase that they should schedule the patient for vascular surgery.
Retrieval via CrossEncoder trained on MSMarco; H1 generation via ChatGPT
H0: Foreman initially diagnosed the patient's condition as bacterial endocarditis.
Retrieved dialogue:
"(House:)But if Foreman's right about\nit being bacterial endocarditis and we..."
SCORE: 3.59
H1: Foreman suggested that the patient might have bacterial endocarditis.
H0: Foreman rescinds his diagnosis and agrees with Chase that they should schedule the patient for vascular surgery.
Retrieved dialogue:
(Foreman:)I think Chase is right.
SCORE: 3.60
H1: Foreman agrees with Chase.
Via CrossEncoder trained on SNLI and MultiNLI
H0: Foreman initially diagnosed the patient's condition as bacterial endocarditis.
H1: Foreman suggested that the patient might have bacterial endocarditis.
LABEL: ENTAILMENT <-- But only backwards entailment (H0 entails H1, not the other way around).
H0: Foreman rescinds his diagnosis and agrees with Chase that they should schedule the patient for vascular surgery.
H1: Foreman agrees with Chase.
LABEL: ENTAILMENT <-- But only backwards entailment (H0 entails H1, not the other way around).
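One way to make the "backwards entailment" caveat explicit is to query the NLI model in both directions and record which one holds. The sketch below is only an illustration: `entailment_direction` and the `entails(premise, hypothesis)` callable are hypothetical names, and the callable stands in for whatever wraps the SNLI/MultiNLI cross encoder.

```python
from typing import Callable


def entailment_direction(h0: str, h1: str,
                         entails: Callable[[str, str], bool]) -> str:
    """Report which way entailment holds between H0 and H1.

    `entails(premise, hypothesis)` is an assumed wrapper around the NLI
    cross encoder; it is not part of the original notes.
    """
    forward = entails(h1, h0)   # H1 (grounded in dialogue) supports H0
    backward = entails(h0, h1)  # H0 implies H1 (the "backwards" case)
    if forward and backward:
        return "both"
    if forward:
        return "forward"
    if backward:
        return "backward"
    return "none"
```

With a check like this, a pair such as "Foreman rescinds his diagnosis and agrees with Chase..." / "Foreman agrees with Chase." would be tagged `backward` rather than silently accepted as a leaf.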
H0: Foreman rescinds his diagnosis.
> H1: Foreman initially diagnosed the patient's condition as bacterial endocarditis.
> D1: (House:)But if Foreman's right about\nit being bacterial endocarditis and we...
> H2: Foreman rescinds his diagnosis and agrees with Chase that they should schedule the patient for vascular surgery.
> D2: (Foreman:)I think Chase is right.
Generally, retrieval and entailment checks are going pretty well at this point. The cross encoders identify relevant dialogue and do NLI classifications on *parsed* dialogue-hypothesis pairs successfully.
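The retrieval step can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: `retrieve` and `score_fn` are hypothetical names, `score_fn(hypothesis, chunk)` stands in for the MSMarco-trained cross encoder, and the defaults `k=4` and `min_score=0.0` only mirror the "top 4" and ">0" figures mentioned in these notes.

```python
from typing import Callable, List, Tuple


def retrieve(hypothesis: str,
             chunks: List[str],
             score_fn: Callable[[str, str], float],
             k: int = 4,
             min_score: float = 0.0) -> List[Tuple[float, str]]:
    """Return up to k (score, chunk) pairs with score > min_score, best first.

    `score_fn` is an assumed wrapper around a relevance cross encoder.
    """
    scored = [(score_fn(hypothesis, chunk), chunk) for chunk in chunks]
    kept = [(s, c) for s, c in scored if s > min_score]
    kept.sort(key=lambda sc: sc[0], reverse=True)
    return kept[:k]
```

Keeping the scorer as a parameter makes it easy to swap models or to test the selection logic with a stub.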
The primary area that currently needs work is hypothesis branching. The two proofs needed different branching prompts, and the prompts produced for the second proof were too specific.
Write five statements that are true given the provided dialogue. Write your answer in JSON format, i.e. {'1': '<answer here>', '2': '<answer here>', ...} and nothing else.
DIALOGUE: ...
STATEMENTS:
3. Pass (H0, H1) pairs into a cross encoder trained on SNLI and MultiNLI
4. Return (D1, H1) pairs that receive entailment judgments from the cross encoder as viable leaves
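Steps 3 and 4 above could look something like this. It is a sketch under assumptions: `viable_leaves` and `nli_fn` are hypothetical names, `nli_fn` stands in for the SNLI/MultiNLI cross encoder and is assumed to return a label string, and the pair is passed in the `(H0, H1)` order the steps describe.

```python
from typing import Callable, List, Tuple


def viable_leaves(h0: str,
                  candidates: List[Tuple[str, str]],
                  nli_fn: Callable[[str, str], str]) -> List[Tuple[str, str]]:
    """Keep the (D1, H1) pairs whose (H0, H1) pair is labelled ENTAILMENT.

    `candidates` holds (D1, H1) pairs: a dialogue chunk and a statement
    generated from it. `nli_fn(h0, h1)` is an assumed classifier wrapper.
    """
    leaves = []
    for d1, h1 in candidates:
        if nli_fn(h0, h1) == "ENTAILMENT":
            leaves.append((d1, h1))
    return leaves
```

An empty result would then be the signal to branch the hypothesis instead of closing the proof.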
## Examples
**Note: Based on this example, it would be good to include an automatic "X said Y" H1 candidate in addition to the GPT-generated ones.**
HYPOTHESIS: `"Mike said Can we not talk about that now?"`
RETRIEVED DIALOGUE:
MIKE: Can we not talk about that now?
H1 CANDIDATES:
1: 'Mike expressed a desire to avoid discussing a certain topic at the present moment.', 2: 'There is a specific topic that Mike wants to avoid discussing.', 3: 'The conversation is being redirected away from a particular subject.', 4: 'Mike is not ready or willing to talk about something.', 5: 'The current timing is not suitable for discussing the mentioned topic.'
FILTERED CANDIDATES:
'Mike expressed a desire to avoid discussing a certain topic at the present moment.'
PROOF COMPLETE
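A possible way to produce the automatic "X said Y" candidate from a single dialogue line is sketched below. The function name and regular expression are assumptions; they cover the two speaker formats seen in these notes, `(House:)...` and `MIKE: ...`, and return `None` for lines without a speaker tag.

```python
import re
from typing import Optional

# Matches "(Speaker:)text" or "SPEAKER: text"; the pattern is an assumption
# based on the transcript formats shown in this issue.
_LINE = re.compile(
    r"^\s*(?:\((?P<paren>[^:()]+):\)|(?P<plain>[A-Za-z .']+):)\s*(?P<text>.+)$",
    re.S,
)


def x_said_y(line: str) -> Optional[str]:
    """Build an 'X said Y' statement from one speaker-tagged dialogue line."""
    m = _LINE.match(line)
    if not m:
        return None  # no speaker tag, so no attribution is possible
    speaker = (m.group("paren") or m.group("plain")).strip()
    if speaker.isupper():
        speaker = speaker.title()  # "MIKE" -> "Mike"
    text = " ".join(m.group("text").split())  # collapse line breaks
    return f"{speaker} said {text}"
```

Note that this handles one line at a time; a chunk with several speakers would need to be split first, and a line that happens to contain a colon early on could be misread as speaker-tagged.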
***
**Note: Based on the importance of knowing who is in conversations, I'm thinking of using `l=2` dialogue lines per chunk.**
**Note 2: Switch to http://www.livesinabox.com/friends/season7/718towja.htm for Friends transcripts**
HYPOTHESIS: `"That guy at the counter is checking out Monica."`
RETRIEVED DIALOGUE:
PHOEBE: That guy at the counter is totally checking you out! MONICA: Really?
H1 CANDIDATES:
'1': 'Phoebe commented on someone at the counter checking Monica out.', '2': 'Monica was surprised by Phoebe's observation.', '3': 'There is a guy standing at the counter.', '4': 'Monica was unsure about the truth of Phoebe's statement.', '5': 'Phoebe and Monica are having a conversation.'
FILTERED CANDIDATES:
'Phoebe commented on someone at the counter checking Monica out.'
PROOF COMPLETE
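The `l=2` chunking idea from the note above could be implemented as a sliding window over dialogue lines. This is a sketch: `chunk_dialogue` is a hypothetical name, and the stride of one line is an assumption (it matches the overlapping chunks that appear in later examples in this log).

```python
from typing import List


def chunk_dialogue(lines: List[str], l: int = 2) -> List[str]:
    """Group dialogue lines into overlapping chunks of l consecutive lines."""
    if not lines:
        return []
    if len(lines) < l:
        return ["\n".join(lines)]  # transcript shorter than one window
    return ["\n".join(lines[i:i + l]) for i in range(len(lines) - l + 1)]
```

With `l=2`, each chunk carries a neighbouring line, so lines such as "Really?" keep some context about who is in the conversation.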
Works for D=1
Write five hypotheses that a human would infer from the dialogue. Write your answer in JSON format, i.e. {'1': '<hypothesis 1>', '2': '<hypothesis 2>', ...} and nothing else.
DIALOGUE:
(Mrs. Rice:) Claire...
(Meredith:) You've had surgery recently.
...
(Claire:) Don't tell my parents.
(Meredith:) You did this in Mexico so your parents wouldn't know?
ChatGPT
{
'1': 'Claire had surgery recently.',
'2': 'Mrs. Rice knows about Claire's recent surgery.',
'3': 'Claire had surgery in Mexico.',
'4': 'Claire's parents are not aware of her surgery.',
'5': 'Claire wanted to keep her surgery in Mexico a secret from her parents.'
}
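Replies like the one above are not valid JSON: they use single quotes and contain unescaped apostrophes ("Claire's"), so `json.loads` rejects them. A tolerant parser along these lines might help; `parse_candidates` and its regular expression are assumptions, not the code actually used.

```python
import json
import re
from typing import List

# A numbered key such as '1': or "2": or 3: followed by an opening quote.
_KEY = re.compile(r"""['"]?(\d+)['"]?\s*:\s*(['"])""")


def parse_candidates(raw: str) -> List[str]:
    """Extract the numbered statements from a JSON-like ChatGPT reply."""
    try:
        parsed = json.loads(raw)
        if isinstance(parsed, dict):
            items = sorted(parsed.items(), key=lambda kv: int(kv[0]))
            return [str(v) for _, v in items]
    except ValueError:
        pass  # not strict JSON; fall back to the regex scan below
    matches = list(_KEY.finditer(raw))
    out = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(raw)
        value = raw[m.end():end].rstrip().rstrip("}").rstrip().rstrip(",").rstrip()
        if value and value[-1] in "'\"":
            value = value[:-1]  # drop the closing quote
        out.append(value)
    return out
```

The fallback treats everything between one numbered key and the next as the value, so apostrophes inside a statement survive intact.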
Write five hypotheses that a human would infer from the dialogue. Write your answer in JSON format, i.e. {'1': '<hypothesis 1>', '2': '<hypothesis 2>', ...} and nothing else.
DIALOGUE:
(House:) But if Foreman's right about\nit being bacterial endocarditis and we...
...
(Foreman:) I think Chase is right.
ChatGPT
{
"1": "Foreman and House are discussing a medical case.",
"2": "Chase has a differing opinion from Foreman and House.",
"3": "The medical condition they are discussing is suspected to be bacterial endocarditis.",
"4": "Foreman's opinion aligns with Chase's on the medical case.",
"5": "House is expressing uncertainty about Foreman's diagnosis."
}
Write five hypotheses that a human would infer from the dialogue. Write your answer in JSON format, i.e. {'1': '<hypothesis 1>', '2': '<hypothesis 2>', ...} and nothing else.
DIALOGUE:
(House:) But if Foreman's right about\nit being bacterial endocarditis and we...
...
(Foreman:) I think Chase is right.
...
(House:) Okay. If Foreman used to be right about it being blood clots...
ChatGPT
{
'1': 'Foreman used to disagree with Chase about the diagnosis being bacterial endocarditis.',
'2': 'Foreman now agrees with Chase about the diagnosis.',
'3': 'Foreman and Chase had a previous disagreement about the nature of the illness.',
'4': 'House is considering Foreman's previous accuracy in diagnosing blood clots.',
'5': 'The group is discussing different potential diagnoses for an unknown medical condition.'
}
Works for many hypotheses, but not for all.
Break down the following sentence into two, single-clause sentences: "Claire had surgery in Mexico."
1. Claire had surgery.
2. The surgery took place in Mexico.
Break down the following sentence into two, single-clause sentences: "The headmaster suggests Castle and Beckett talk to Donny's friends when they're inquiring about Donny."
1. The headmaster suggests Castle and Beckett talk to Donny's friends.
2. Castle and Beckett are inquiring about Donny.
Failure mode:
Break down the following sentence into two, single-clause sentences: "Foreman rescinds his diagnosis."
1. Foreman rescinds.
2. His diagnosis.
Asking ChatGPT beforehand whether the sentence can be broken down at all does return correct judgments for these sentences.
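The "ask first, then split" idea could be wired up roughly like this. It is a sketch under assumptions: `branch` and `llm` are hypothetical names, `llm(prompt)` stands in for a ChatGPT call, and the wording of the gate prompt is invented here (only the split prompt comes from the notes above).

```python
import re
from typing import Callable, List

# Gate prompt wording is an assumption; the notes only say the question is asked.
CAN_SPLIT_PROMPT = (
    "Can the following sentence be broken down into two, single-clause "
    'sentences? Answer Yes or No: "{sentence}"'
)
# Split prompt as used in the examples above.
SPLIT_PROMPT = (
    "Break down the following sentence into two, single-clause sentences: "
    '"{sentence}"'
)


def branch(hypothesis: str, llm: Callable[[str], str]) -> List[str]:
    """Split a hypothesis into two clauses only if the model says it can be."""
    verdict = llm(CAN_SPLIT_PROMPT.format(sentence=hypothesis)).strip().lower()
    if not verdict.startswith("yes"):
        return [hypothesis]  # atomic sentence: keep it whole
    reply = llm(SPLIT_PROMPT.format(sentence=hypothesis))
    # Strip list numbering such as "1. " from each returned line.
    parts = [re.sub(r"^\s*\d+\.\s*", "", ln).strip() for ln in reply.splitlines()]
    parts = [p for p in parts if p]
    return parts if len(parts) == 2 else [hypothesis]
```

Falling back to the unsplit hypothesis avoids fragments like "Foreman rescinds." / "His diagnosis." entering the proof tree.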
FULL TRANSCRIPT:
(Castle:)Only, you moved recently.
(Mrs Kendall:)Yes, a few months ago.
I was a partner at Lehman. And when it
went under, well, we took a big hit.
(Castle:)That can be pretty tough on a teenager.
(Mrs Kendall:)You get used to a certain way of living,
but it was no one's fault.
(Castle:)Do you think Donny was depressed?
No, Donny was strong.
But to lose everything you have...
(Mr Kendall:)We were just getting by.
(Beckett:)I'm very sorry for your loss.
(Beckett:)How did you know they moved?
(Castle:)Either the Kendalls really like art,
or that Chinese screen
(Castle:)was meant
for a much bigger apartment.
(Castle:)I wonder how they could
still afford Redding.
Do you know the school?
(Castle:)Oh, I've been kicked
out of all of New York's
(Castle:)finer educational institutions
at least once.
(Castle:)The irony is,
now that I'm rich and famous,
they all claim me as alum
and want money.
(Beckett:)It is just so rough being you.
(Castle:)My cross to bear.
(Headmaster:)When the Kendalls
said they couldn't afford tuition,
(Headmaster:)we put Donny on scholarship.
(Headmaster:)The family had been very generous
in the past,
(Headmaster:)and Donny was one of our brightest.
We thought he'd do great things.
(Castle:)Any idea what he would have been
doing at Central Park at night?
(Headmaster:)The truth is, all these kids are icebergs.
We only see the tip.
(Headmaster:)If you want to know the rest,
you should probably talk to his friends.
(Headmaster:)Amanda, Romy, Brandon, Spencer
and Max.
(Headmaster:)It's strange seeing them without Donny.
(Beckett:)Thank you.
Thanks.
HYPOTHESIS:
The headmaster suggests Castle and Beckett talk to Donny's friends when they're inquiring about Donny.
EVIDENCE:
(Headmaster:)and Donny was one of our brightest.
We thought he'd do great things.
(Castle:)Any idea what he would have been
doing at Central Park at night?
(Headmaster:)The truth is, all these kids are icebergs.
We only see the tip.
(Headmaster:)If you want to know the rest,
you should probably talk to his friends.
(Headmaster:)Amanda, Romy, Brandon, Spencer
and Max.
Retrieval via CrossEncoder trained on MSMarco; H1 generation via ChatGPT. Sampled the top 4 scores; 7 were > 0.
(Headmaster:)The family had been very generous in the past,
(Headmaster:)and Donny was one of our brightest. We thought he'd do great things.
'1': 'The family has been generous to the school in the past.',
'2': 'Donny was considered one of the brightest students at the school.',
'3': 'The school had high expectations for Donny's future success.',
'4': 'Donny was expected to achieve great things.',
'5': 'The headmaster expressed positive sentiments about Donny.'
(Headmaster:) and Donny was one of our brightest. We thought he'd do great things.
(Castle:) Any idea what he would have been doing at Central Park at night?
'1': 'Donny was considered as one of the brightest students by the Headmaster.',
'2': 'The Headmaster had high expectations for Donny, believing he would achieve great things.',
'3': 'Donny is no longer around or has disappeared as indicated by the Castle's question.',
'4': 'Central Park is a location where Donny was found to be at night.',
'5': 'There is uncertainty about what Donny would have been doing at Central Park at night.'
(Headmaster:) Amanda, Romy, Brandon, Spencer and Max.
(Headmaster:) It's strange seeing them without Donny.
'1': 'The Headmaster knows at least five people: Amanda, Romy, Brandon, Spencer, and Max.',
'2': 'The person named Donny is not present in the group.',
'3': 'The group consists of five individuals.',
'4': 'Amanda, Romy, Brandon, Spencer, and Max are present at the moment.',
'5': 'The Headmaster finds it unusual that Donny is not with the group.'
(Headmaster:) It's strange seeing them without Donny.
(Beckett:) Thank you.
'1': 'The Headmaster mentioned someone named Donny.',
'2': 'Beckett expressed gratitude in response to the Headmaster's statement.',
'3': 'Donny is absent from the current situation.',
'4': 'The Headmaster and Beckett are having a conversation.',
'5': 'There is a sense of unfamiliarity in the situation due to Donny's absence.'
Via CrossEncoder trained on SNLI and MultiNLI
Using a threshold of -0.1 for the entailment (E) classification.
H0: The headmaster suggests Castle and Beckett talk to Donny's friends when they're inquiring about Donny.
FILTERED DIALOGUE: NONE
Via ChatGPT
1. The headmaster suggests Castle and Beckett talk to Donny's friends.
2. Castle and Beckett are inquiring about Donny.
Retrieval via CrossEncoder trained on MSMarco; H1 generation via ChatGPT
H0: The headmaster suggests Castle and Beckett talk to Donny's friends.
Retrieved dialogue:
(Headmaster:) Amanda, Romy, Brandon, Spencer and Max.
(Headmaster:) It's strange seeing them without Donny.
Same H1s.
(Headmaster:) It's strange seeing them without Donny.
(Beckett:) Thank you.
Same H1s.
H0: Castle and Beckett are inquiring about Donny.
Retrieved dialogue:
(Mrs Kendall:)You get used to a certain way of living, but it was no one's fault.
(Castle:)Do you think Donny was depressed?
'1': 'Mrs. Kendall lived a certain way of living, and she believes it was not anyone's fault.',
'2': 'Castle asked Mrs. Kendall if she thinks Donny was depressed.',
'3': 'Mrs. Kendall didn't blame anyone for the certain way of living they got used to.',
'4': 'Castle inquired about Donny's mental state, suggesting the possibility of depression.',
'5': 'The dialogue implies that Donny might have been dealing with depression.'
(Castle:)Do you think Donny was depressed?
No, Donny was strong. But to lose everything you have...
'1': 'Castle expressed concern about Donny's mental state.',
'2': 'The speaker believes Donny was not depressed.',
'3': 'Castle mentioned that Donny was strong.',
'4': 'Donny lost everything he had.',
'5': 'The speaker acknowledged the difficulty of losing everything.'
(Headmaster:) and Donny was one of our brightest. We thought he'd do great things.
(Castle:) Any idea what he would have been doing at Central Park at night?
Same H1s.
(Headmaster:) It's strange seeing them without Donny.
(Beckett:) Thank you.
Same H1s.
Via CrossEncoder trained on SNLI and MultiNLI
H0: Castle and Beckett are inquiring about Donny.
FILTERED H1s:
"Castle asked Mrs. Kendall if she thinks Donny was depressed."
"Castle expressed concern about Donny's mental state."
H0: The headmaster suggests Castle and Beckett talk to Donny's friends.
FILTERED H1s: None.
Extra dialogue:
(Headmaster:) If you want to know the rest, you should probably talk to his friends.
(Headmaster:) Amanda, Romy, Brandon, Spencer and Max.
H1s:
'1': 'The headmaster mentioned someone named Amanda.',
'2': 'The headmaster mentioned someone named Romy.',
'3': 'The headmaster mentioned someone named Brandon.',
'4': 'The headmaster mentioned someone named Spencer.',
'5': 'The headmaster mentioned someone named Max.'
Added the automatic "X said Y" candidate: "The Headmaster said if you want to know the rest, you should probably talk to his friends."
FILTERED H1s: Still none
Based on the previous example,
HYPOTHESIS: Ted felt happy when Robin asked Ted to be best man.
EVIDENCE:
(Robin:)Ted, will you be my best man?
(Robin:)Scherbatsky...
(Ted:)I'm gonna crush it.
(Barney:)Hi, there.
Taking top 4
'(Ted:)Big mistake, but go on.\n(Robin:)I could really use someone like you.'
"(Ted:)Are you asking what I\nthink you're asking?\n(Robin:)Ted, will you be my best man?"
'(Robin:)Ted, will you be my best man?\n(Robin:)Scherbatsky...'
(Robin:)When we all needed it,\nyou got us back on the right path.\n(Ted:)It's what I do.
None
1. Ted felt happy.
2. Robin asked Ted to be best man.
H0: Ted felt happy.
(Robin:)When we all needed it,\nyou got us back on the right path.\n(Ted:)It's what I do.
(Ted:)Big mistake, but go on.\n(Robin:)I could really use someone like you.
(Robin:)Ted, will you be my best man?\n(Robin:)Scherbatsky...
(Ted:)I'm gonna crush it.\n(Barney:)Hi, there.
H0: Robin asked Ted to be best man.
"(Ted:)It's what I do.\n(Robin:)If I ever get married\nand you're not the guy I'm marrying..."
'(Ted:)Big mistake, but go on.\n(Robin:)I could really use someone like you.'
"(Ted:)Are you asking what I\nthink you're asking?\n(Robin:)Ted, will you be my best man?"
'(Robin:)Ted, will you be my best man?\n(Robin:)Scherbatsky...'
H0: Ted felt happy.
H1: Ted appears to be determined and enthusiastic.
H0: Robin asked Ted to be best man.
H1: Robin asked Ted to be best man.
H0: Ted felt happy when Robin asked Ted to be best man.
> H1: Robin asked Ted to be best man.
> D1: (Ted:)Are you asking what I think you're asking?
(Robin:)Ted, will you be my best man?
> H2: Ted appears to be determined and enthusiastic.
> D2: (Ted:)I'm gonna crush it.
(Barney:)Hi, there.
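A small data structure for the finished proofs, printed in the same `>`-prefixed layout as above, might look like this. The class and field names are hypothetical, and the structure only covers the single-level proofs shown in this log.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Leaf:
    hypothesis: str  # H_i: statement derived from the dialogue
    dialogue: str    # D_i: the dialogue chunk that supports it


@dataclass
class Proof:
    root: str                                   # H0: the hypothesis to prove
    leaves: List[Leaf] = field(default_factory=list)

    def render(self) -> str:
        """Format the proof as 'H0' followed by '> Hi' / '> Di' lines."""
        lines = [f"H0: {self.root}"]
        for i, leaf in enumerate(self.leaves, start=1):
            lines.append(f"> H{i}: {leaf.hypothesis}")
            lines.append(f"> D{i}: {leaf.dialogue}")
        return "\n".join(lines)
```

Deeper proofs would need nested nodes rather than a flat list of leaves; that extension is left out here.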
Dialogue only
FULL TRANSCRIPT:
HYPOTHESIS:
EVIDENCE:
Step 1: Retrieved dialogue
Retrieval via CrossEncoder trained on MSMarco; H1 generation via ChatGPT
SCORE: 0.61
H1: Claire recently had surgery.
SCORE: 0.56
H1: Claire did something in Mexico.
Step 2: Filter dialogue
Via CrossEncoder trained on SNLI and MultiNLI
H0: Claire had surgery in Mexico.
H1: Claire recently had surgery.
LABEL: NEITHER
H0: Claire had surgery in Mexico.
H1: Claire did something in Mexico.
LABEL: NEITHER
Step 3: Branching
Via ChatGPT
Step 4: Dialogue retrieval
Retrieval via CrossEncoder trained on MSMarco; H1 generation via ChatGPT
H0: Claire went to Mexico.
Retrieved dialogue:
SCORE: 3.05
H1: Claire did something in Mexico.
H0: Claire underwent surgery.
Retrieved dialogue:
SCORE: 4.00
H1: Claire recently had surgery.
Step 5: Filtering
Via CrossEncoder trained on SNLI and MultiNLI
H0: Claire went to Mexico.
H1: Claire did something in Mexico.
LABEL: ENTAILMENT
H0: Claire underwent surgery.
H1: Claire recently had surgery.
LABEL: ENTAILMENT
Full generated proof