paulfioravanti / plover-q-and-a

Plover plugin for formatting Legal Q&A (Question and Answer) output.
https://pypi.org/project/plover-q-and-a/
GNU General Public License v3.0

Add way to individually change the byline question #1

Closed hellochap closed 1 month ago

hellochap commented 2 months ago

basically I wanted the normal questions to have a new line before, but the question after the byline to not.

the problem was that the byline automatically added a question to the end of its output, which meant I couldn't format the byline question differently from a normal question. So I had to delete the _formatted_question(data) part of the byline construction and replace it with a string of what I wanted the different question format to be.

should be easily fixable heh, probably just a new parameter in the byline part for just its question.

hellochap commented 2 months ago

or I just had an idea, maybe make it so that the question is part of the marker and then delete the _formatted_question(data) part altogether. That would solve it in a neat way I think

paulfioravanti commented 2 months ago

Based off of our conversation in the Plover Discord, I think it's worth re-framing this issue as trying to answer the question of:

"Can this plugin be configured to output Q&A in the format that would seem to be used in the Tasmanian legal system, as shown in this document? If so, what does the config look like? If not, what would need to change within the plugin?"

Since the only frame of reference I used in the creation of this plugin was the way that Platinum Steno (PS) does Q&A, there are potentially functionality gaps where configuration can be made more granular, or perhaps areas that could require a re-think in the architecture of how each of the Q&A parts are assembled (and then decisions made on whether it's worth implementing them or not...).

First, it's probably worth looking into what's different between the PS Q&A format (an example of which is here), and the format used in the document referenced above (which I'll call "Tasmanian-style" for now), because after doing a cursory glance, it's not just about the bylines. Here's what I noticed:

NOTE: "⇥···" below signifies a tab character

Initial question on same line as byline

PS's bylines add the initial question indented underneath the byline, while Tasmanian-style shows the initial question on the same line, like the following example on page 28:

MS BENNETT:⇥···Q.⇥···Kim, Barry, thank you for being with us
today. Kim, you've made a statement to the Commission, is
that right?

An interesting point about the questioner byline is that it only ever seems to appear once, at the beginning of Q&A: a questioner is never "re-checked in" with a new byline to signify returning to the original line of questioning after a third party (e.g. another lawyer, the court judge, etc.) says something, like PS does.

I guess this has something to do with the statement just above it(?):

<EXAMINATION BY MS BENNETT:

Anyway, this is probably something that can be handled by making extra configuration options available, but I'll have to look into how to do that without breaking current functionality.

No Q or A initial indentation

From page 28 again, following on from the last example, it seems that as opposed to PS, where questions and answers are tab-indented under a byline, Tasmanian-style has them at the same indent level as bylines, which is to say no indentation at all:

MS BENNETT:⇥···Q.⇥···Kim, Barry, thank you for being with us
today. Kim, you've made a statement to the Commission, is
that right?
A.⇥···That's correct.

Further sets of Q and A are also not indented:

Q.⇥···And that statement is true and correct, isn't it?
A.⇥···That's correct.

This is something I think can be handled with the current configuration options.

Newlines between Q&A "sets" and other Named Speakers

From page 28 again, we can see that as opposed to PS, where Q&A back and forth just occurs on different lines, it would seem that Tasmanian-style breaks up Q&A "sets" with extra newlines:

MS BENNETT:⇥···Q.⇥···Kim, Barry, thank you for being with us
today. Kim, you've made a statement to the Commission, is
that right?
A.⇥···That's correct.

Q.⇥···And that statement is true and correct, isn't it?
A.⇥···That's correct.

Another example of this with other named speakers outside of Q&A is on page 29:

Q.⇥···I'd like to now talk to you about Paula's life as it
developed. Barry, just to pause, you came into Paula's
life a little later, can you tell us about that?

BARRY:⇥···Yes. About years ago Paula come into my life
the same time as Kim, yeah, so there's a lot of background
there.

MS BENNETT:⇥···Commissioners, you will find behind
Confidential Exhibit 1 some photos of Paula which you can
peruse in your own time and I won't display publicly. But
we just pause to acknowledge Paula and the young woman she
was.

The extra newlines would seem to occur regardless of how the questioner's or answerer's turn ends, be it a question, statement, or interruption, as can be seen in the following examples:

Newline after answerer interruption

From page 30:

Q.⇥···What sort of contact were they having?
A.⇥···Well, it was only one contact that I knew about where
he made an arrangement to pick up Paula from work and he
took her off to a secluded spot and --

Q.⇥···Just to pause there. She told you that night that she
would be meeting friends in town; is that right?
A.⇥···That's right, yes.

Newline after answerer asks a question

From page 32:

Q.⇥···Your daughter heard from AB-1 again shortly after that
incident on the Sunday night; is that right?
A.⇥···I'm not sure that was - it could have been - was it?

BARRY:⇥···Phone calls?

KIM:⇥···Yeah, there were some phone calls, he was trying to
call her, and then when he couldn't reach her by phone he
sent a letter to the house.

I have a feeling this is something that will need to be solved with extra granular configuration options. Where I thought that the following options could be generalised...

"question_end": "?",
"statement_end": ".",
"interrupt": " --"

...I have a feeling that in order to accommodate Tasmanian-style, these options will have to be repeated in each of the "formatting" objects for all of "speaker", "question", and "answer". Maybe like the following (but not sure yet...):

"answer": {
  "marker": "A.",
  "formatting": {
    "pre": "",
    "post": "\t",
    "question_end": "?\n\n",
    "statement_end": ".\n\n",
    "interrupt": " --\n\n"
  }
}
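
One way to read the idea above: each section's "formatting" object may override the sentence-ending strings, and anything it leaves out falls back to the top-level value. The following is a minimal Python sketch of that lookup. The helper name `resolve_end`, the `DEFAULT_ENDS` table, and the fallback order are all assumptions made for illustration, not the plugin's actual code.

```python
from typing import Any

# Top-level defaults, mirroring the generalised options quoted above.
DEFAULT_ENDS = {"question_end": "?", "statement_end": ".", "interrupt": " --"}


def resolve_end(config: dict[str, Any], section: str, kind: str) -> str:
    """Return the ending string for `kind`, preferring a per-section override.

    Hypothetical helper: looks in config[section]["formatting"] first, then
    at the top level of the config, then in DEFAULT_ENDS.
    """
    formatting = config.get(section, {}).get("formatting", {})
    if kind in formatting:
        return formatting[kind]
    return config.get(kind, DEFAULT_ENDS[kind])


config = {
    "answer": {
        "marker": "A.",
        "formatting": {"pre": "", "post": "\t", "statement_end": ".\n\n"},
    },
    "question_end": "?",
}

# The answer section overrides statement_end, so it gets the extra newlines.
answer_statement_end = resolve_end(config, "answer", "statement_end")
# The question section has no override, so the top-level value is used.
question_end = resolve_end(config, "question", "question_end")
```

With a fallback like this, existing PS-style configs that only set the top-level options would presumably keep working unchanged.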

Answer bylines after named speakers

This would seem to be a concept that is not present in PS, but occurs in Tasmanian-style, where regardless of notices like the following on page 28...

[All Q&A are answered by Kim unless indicated as Barry]

...whenever something from a named third party speaker is recorded, and the next speaker "set" is back to Q&A, it seems that the answerer is "re-checked in" by name. An example of this is on page 32:

Q.⇥···Take us back then to the time you've taken Paula -
I'll just pause there. Barry, is there anything you wanted to add at this
stage?

BARRY:⇥···No, that's okay, thank you.

Q.⇥···Just returning then to Paula being taken to LGH;
obviously that's a frightening experience for any family.
Were you reassured to see somebody that you knew?
KIM:⇥···A.⇥···Yes, I was, yes, because Jim was always a very
friendly, outgoing, caring person and he just had that way
about him that made you feel that you could trust him and
that he was going to look after your child, so yeah.

And another on page 39, even if the named speaker is the same as the answerer:

Q.⇥···Kim, I want to just pause there, and Barry, is there
anything you'd like to add to what we've heard?

BARRY:⇥···No, all I can remember, that Jim and Paula's
relationship just suddenly stopped for no reason. We were
aware of the reason, Paula told us, but we both can't
recall that particular reason why, so we don't know.

KIM:⇥···That's where Paula was a very private person.

Q.⇥···Paula then went to and there was a <redacted>
terrible accident there, are you able to tell us?
KIM:⇥···A.⇥···Sadly, Paula, <redacted> and passed away, yes.

Support for something like this I'd have to think about, as it would require not just formatting, but probably another set of metas like, perhaps:

"{:Q_AND_A:BYLINE:ANSWERER:FOLLOWING_QUESTION}"
"{:Q_AND_A:BYLINE:ANSWERER:FOLLOWING_STATEMENT}"
"{:Q_AND_A:BYLINE:ANSWERER:INTERRUPTING}"

And then figuring out some appropriate dictionary entries for them that gel with the rest of the Q&A dictionary outlines.
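
To make the idea a bit more concrete, here is a rough Python sketch of how the argument of one of those proposed metas might be turned into output. Everything in it is hypothetical: the function name `answerer_byline`, the `_ENDINGS` table, and the hard-coded "A." marker and tab separators are assumptions chosen to reproduce the "KIM:⇥···A.⇥···" shape from the examples above, not a description of how the plugin works.

```python
# Hypothetical mapping from the proposed meta suffix to the string that closes
# the previous speaker's turn.
_ENDINGS = {
    "FOLLOWING_QUESTION": "?",
    "FOLLOWING_STATEMENT": ".",
    "INTERRUPTING": " --",
}


def answerer_byline(arg: str, answerer_name: str) -> str:
    """Build output for an argument like "BYLINE:ANSWERER:FOLLOWING_QUESTION".

    `arg` is assumed to be the text after "Q_AND_A:" in the meta.
    """
    parts = arg.upper().split(":")
    if len(parts) != 3 or parts[:2] != ["BYLINE", "ANSWERER"]:
        raise ValueError(f"Unsupported byline argument: {arg!r}")
    try:
        ending = _ENDINGS[parts[2]]
    except KeyError:
        raise ValueError(f"Unknown ending: {parts[2]!r}") from None
    # End the previous turn, start a new line, then emit "NAME:<tab>A.<tab>".
    return f"{ending}\n{answerer_name.upper()}:\tA.\t"


output = answerer_byline("BYLINE:ANSWERER:FOLLOWING_QUESTION", "Kim")
```

Here `output` is the question mark that ends the questioner's turn, a newline, and then the named answer marker, which matches the page 32 example where the answer after Barry's interjection starts with Kim's name.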


Anyway, I'll update this issue as things come to mind. Thanks for opening these floodgates, @hellochap! :D

hellochap commented 1 month ago

Ah very nice, this is a good breakdown. I feel that a lot of these problems can actually be solved with the current settings. Currently I have it so that, before every question, there is a new line, and before every answer, there is no new line. And this seems to solve the grouped question and answer problem.

The main two I see here are the answer byline, which I have only just noticed now, and the lack of indentation for bylines, which I feel could be solved by simply making the byline's question part of the byline text, instead of being formatted afterwards.

These are my current settings:

{

  "speaker": {
    "plaintiff_1": "MR. STPHAO",
    "defense_1": "MR. EUFPLT",
    "plaintiff_2": "MR. SKWRAO",
    "defense_2": "MR. EURBGS",
    "court": "THE COURT",
    "witness": "THE WITNESS",
    "videographer": "THE VIDEOGRAPHER",
    "court_reporter": "THE COURT REPORTER",
    "clerk": "THE CLERK",
    "bailiff": "THE BAILIFF",
    "formatting": {
      "pre": "\n",
      "post": ":  ",
      "upcase": true
    }
  },
  "question": {
    "marker": "Q.",
    "formatting": {
      "pre": "\n",
      "post": "\t"
    }
  },
  "answer": {
    "marker": "A.",
    "formatting": {
      "pre": "",
      "post": "\t"
    }
  },
  "byline": {
    "marker": "",
    "formatting": {
      "pre": "\n",
      "post": ":\t"
    }
  },
  "question_end": "?",
  "statement_end": ".",
  "interrupt": " --",
  "yield": "\n",
  "sentence_space": " ",
  "set_name_prompt": "[Set {speaker_type} ({current_speaker_name}) =>] "
}
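
To illustrate why the "pre" values above group each question with its answer, here is a small Python sketch that applies just the "question", "answer", and "yield" settings. It is a simplified model, not the plugin's parser: the helpers `marker_text` and `render` are made up for this example, and it assumes the "yield" string is emitted before every new turn.

```python
config = {
    "question": {"marker": "Q.", "formatting": {"pre": "\n", "post": "\t"}},
    "answer": {"marker": "A.", "formatting": {"pre": "", "post": "\t"}},
    "yield": "\n",
}


def marker_text(section: dict) -> str:
    """Wrap a section's marker in its "pre" and "post" formatting."""
    formatting = section.get("formatting", {})
    return formatting.get("pre", "") + section["marker"] + formatting.get("post", "")


def render(turns: list[tuple[str, str]]) -> str:
    """Join (section, text) turns; each turn is preceded by the yield string."""
    return "".join(
        config["yield"] + marker_text(config[section]) + text
        for section, text in turns
    )


transcript = render([
    ("question", "And that statement is true and correct, isn't it?"),
    ("answer", "That's correct."),
    ("question", "What sort of contact were they having?"),
    ("answer", "Well, it was only one contact."),
])
```

Under that assumption, every "Q." ends up preceded by a blank line (yield plus the question's "pre" newline), while every "A." sits directly under its question, which is the Tasmanian-style grouping.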

And this is what I altered in the parser (deleted the question formatting and replaced with a simple string):

    return lambda speaker_name: (
        byline_formatting.get("pre", _BYLINE_PRE_FORMATTING)
        + cast(str, byline.get("marker", _BYLINE_MARKER))
        + speaker_name
        + byline_formatting.get("post", _BYLINE_POST_FORMATTING)
        + "Q.\t"
    )
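
Instead of hard-coding the string, the same effect could perhaps come from config. The sketch below is a self-contained Python version of the lambda above that reads the byline's question from a hypothetical "question" key in the "byline" object and falls back to the normally formatted question when the key is absent. The key name, the `build_byline` function, and the default constant values are assumptions for illustration only.

```python
from typing import Any, Callable

# Placeholder defaults; the plugin's real default values are not shown in this thread.
_BYLINE_MARKER = ""
_BYLINE_PRE_FORMATTING = "\n"
_BYLINE_POST_FORMATTING = ":\t"


def build_byline(
    config: dict[str, Any], formatted_question: str
) -> Callable[[str], str]:
    """Return a function that formats a byline for a given speaker name."""
    byline = config.get("byline", {})
    formatting = byline.get("formatting", {})
    # Hypothetical "question" key: lets the byline's question differ from a
    # normal question; without it, the normal formatted question is used.
    question = byline.get("question", formatted_question)
    return lambda speaker_name: (
        formatting.get("pre", _BYLINE_PRE_FORMATTING)
        + str(byline.get("marker", _BYLINE_MARKER))
        + speaker_name
        + formatting.get("post", _BYLINE_POST_FORMATTING)
        + question
    )


config = {
    "byline": {
        "marker": "",
        "formatting": {"pre": "\n", "post": ":\t"},
        "question": "Q.\t",
    }
}
format_byline = build_byline(config, formatted_question="\nQ.\t")
byline_text = format_byline("MS BENNETT")
```

Here `byline_text` puts "Q." on the same line as the speaker name, while a config without the "question" key would keep the current behaviour of appending the normal question format.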

All of this seems to work for everything except for answer bylines, which I hadn't noticed yet.

also maybe there is a need for multiple witnesses? I don't really know how that works normally though.

paulfioravanti commented 1 month ago

Okay, I think I've managed to address all the issues raised here in v0.4.0. I wasn't expecting such a massive amount of changes and refactoring, but some things that look easy can turn out to be deceptively complex 😄

@hellochap, if you upgrade, just be aware that there are some breaking changes to the config and outline commands, so you'll need to update your own config and dictionaries. Check out the tag link above for more details. If you do upgrade and find any bugs related to anything in this thread, feel free to re-open this issue.