Tagging nouns in a Pro-drop Language

zme1 / toscana

A repository to house research and web development for the Lega Toscana project, led by professor Lina Insana (Spring 2018) and professor Lorraine Denman (Fall 2018), and with consultation from members of the DH Advanced Praxis group at the University of Pittsburgh at Greensburg.

http://toscana.newtfire.org


Tagging nouns in a Pro-drop Language #1

zme1 closed this issue 6 years ago

zme1 commented 6 years ago

Italian is a pro-drop language; that is, subject nouns or pronouns can be omitted from a sentence because their syntactic qualities (person, number) are expressed in the conjugated verb in the sentence.

In my transcripts, for example, a sentence of interest reads:

Alle ore 930 p.m Presidente M Frediani dichiarò la Seduta Aperta Ordinò Lappello Nominate il Curatore Sam Paganucci

"At 9:30pm President M Frediani called the meeting to order. He asked curator Sam Paganucci to take roll"

Here, the word of interest is "ordinò", which translates to "he ordered/asked" in reference to the president. I want to mark up this interaction, but to do so I need to account for both people involved. Since the subject of the sentence is expressed only in the verb form, I was thinking about wrapping the verb in an <rs> element and attaching a @ref attribute to it, as I would with a <persName> element.

Any thoughts or advice on this?

zme1 commented 6 years ago

Note: there are also instances in which members are referenced using only their role names (e.g. Il Presidente). Could this situation be treated in a similar way to the one above?

RJP43 commented 6 years ago

Hey @zme1 --

Do you care in particular about the actual part of speech of the word, or is what matters most to you linking the different people and capturing the linking word?

zme1 commented 6 years ago

My only interest in an interaction like this one is to flag the whole interaction between members and to label the respective role of each participant in it. For example:

<seg type="question"><persName ref="#ze" role="asker">Zac Enick</persName> asked a question, and <persName ref="#rp" role="answerer">Becca Parker</persName> answered it.</seg>

In the example above from my text, there is no noun to attach the <persName> element to. So I was wondering whether the <rs> element could wrap "ordinò", which translates to "he asked" (in reference to President Frediani), as a way to represent the first participant in that sample interaction, since he is the implied subject of the sentence.

RJP43 commented 6 years ago

@zme1 I like your thought process of using the rs (referencing string) element and tying all of the parts of the interaction together with a linking attribute like @ref, but I am wondering how you will grab these later when you need to extract your data. Sure, you could work with the sibling axes and grab all of the interactions that are immediate siblings of certain people, but I am wondering whether it makes more sense to put the emphasis on the interaction word and on the unit (sentence or phrase) that contains the people and the interaction word.

Could you mark up your example sentence the way you are envisioning it, with the <rs>, <persName>, and @ref? Then let's see if we can write the XSLT or XQuery that would pull the necessary data points to make a network graph. Basically, I am suggesting we work backward to be sure that however you encode the document, you will be able to pull the data points: source node, target node, interaction, and any attributes on those three. (For the interaction, it might be interesting to see all the interactions that are of a certain part of speech, or the number of interactions between the same people.)

RJP43 commented 6 years ago

Oh haha, we posted at the same time... hold on, my comment may now be outdated...

RJP43 commented 6 years ago

@zme1 Okay, yeah, I am glad you are grouping the parts in a parent element; that was one idea I was hinting at in my comment. I am now wondering what attributes you will use on the <rs> to link the people/nouns, or whether you are just relying on them being in proximity?

zme1 commented 6 years ago

Yeah, I can certainly do that! If I were to code the above sentence how I am proposing, it would look like this:

<seg type="delegation" subtype="task"><rs ref="#mf" role="delegator">Ordinò</rs> Lappello Nominate il Curatore <persName ref="#sp" role="delegated">Sam Paganucci</persName></seg>
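To check that this encoding yields the data points discussed earlier in the thread (source node, target node, interaction), here is a minimal extraction sketch. It uses Python's standard-library `xml.etree.ElementTree` as a stand-in for the XSLT or XQuery mentioned above. The helper name `extract_edge` and the omission of the TEI namespace are simplifying assumptions, not part of the project.

```python
import xml.etree.ElementTree as ET

# The encoded sentence from the comment above, without a TEI namespace
# (an assumption made to keep the sketch short).
SEG = (
    '<seg type="delegation" subtype="task">'
    '<rs ref="#mf" role="delegator">Ordinò</rs> Lappello Nominate il Curatore '
    '<persName ref="#sp" role="delegated">Sam Paganucci</persName></seg>'
)

def extract_edge(seg):
    """Return one network edge from a <seg> element (hypothetical helper)."""
    # Index every direct child that carries @role, whether <rs> or <persName>.
    by_role = {el.get("role"): el for el in seg if el.get("role")}
    return {
        "source": by_role["delegator"].get("ref"),
        "target": by_role["delegated"].get("ref"),
        "interaction": seg.get("type"),
        "subtype": seg.get("subtype"),
        # The text of the <rs> is the verb standing in for the dropped subject.
        "verb": by_role["delegator"].text,
    }

edge = extract_edge(ET.fromstring(SEG))
print(edge)
```

Because the lookup goes through @role rather than element names, the same function would treat an `<rs>` wrapping a verb and a `<persName>` wrapping a name identically, which is the point of the proposed encoding.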

As far as attributes are concerned, I would ultimately like to associate each interaction with the meeting in which it occurred, so I can see how interactions developed or evolved over time, but I haven't figured out a system for that yet. Each individual meeting is nested in a <TEI> element inside a <teiCorpus>, so could I use the ancestor axis when transforming to my plain text files to pull the meeting date from the <teiHeader> of the ancestor <TEI> element?
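A rough sketch of pairing each interaction with its meeting date follows. In XSLT or XQuery the ancestor axis would be the natural tool here; `xml.etree.ElementTree` has no ancestor axis, so this Python stand-in walks down from each `<TEI>` instead, which produces the same pairing. The header layout, the `<date when="...">` location, and the date value are all placeholders invented for illustration, since the thread does not show the real header.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Toy corpus: header structure and date are placeholders, not project data.
CORPUS = """
<teiCorpus xmlns="http://www.tei-c.org/ns/1.0">
  <TEI>
    <teiHeader><fileDesc><sourceDesc><bibl>
      <date when="1900-01-01"/>
    </bibl></sourceDesc></fileDesc></teiHeader>
    <text><body><p>
      <seg type="delegation" subtype="task"><rs ref="#mf" role="delegator">Ordinò</rs> Lappello Nominate il Curatore <persName ref="#sp" role="delegated">Sam Paganucci</persName></seg>
    </p></body></text>
  </TEI>
</teiCorpus>
""".strip()

def interactions_with_dates(corpus):
    """List (meeting date, interaction type) pairs (hypothetical helper)."""
    rows = []
    # One <TEI> per meeting: read its date once, then collect its <seg>s.
    for tei in corpus.findall("tei:TEI", TEI_NS):
        header = tei.find("tei:teiHeader", TEI_NS)
        date = header.find(".//tei:date", TEI_NS) if header is not None else None
        when = date.get("when") if date is not None else None
        for seg in tei.iterfind(".//tei:seg", TEI_NS):
            rows.append((when, seg.get("type")))
    return rows

rows = interactions_with_dates(ET.fromstring(CORPUS))
print(rows)
```

Sorting the resulting rows by date would give the over-time view of interactions described above.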

RJP43 commented 6 years ago

@zme1 Ahhhh okay! I see. Yes, that makes sense, and yeah, you can just pull the date from the header. I think your encoding is logical and I can see how you will get your data points. The only other question I have: right now the action is directed from one person to another, so do you want to capture that order/direction? You might want to consider an attribute that does that instead of relying on placement within <seg>, unless you can always be sure the actor is first and the receiver is second. Or is that what you are relying on @role for? If so, could you imagine ending up with an excessive number of role values, or is it always going to be delegator and delegated?

zme1 commented 6 years ago

That is a good point. I didn't anticipate generating too many attribute values for @role, but since every interaction is either directional or non-directional, I could conceivably have just 3 values, right? One to indicate all parties in a non-directional interaction, and 2 to indicate the source and target in a directional one? The specific @type value on each <seg> element should then be enough to distinguish one directional (or non-directional) interaction from another.
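A minimal sketch of how three role values could drive edge extraction, again in Python as a stand-in for XSLT/XQuery. The value names `source`, `target`, and `participant`, the `discussion` type, and the second sample sentence are hypothetical; the thread does not settle on names, or on whether @role is the right attribute.

```python
import xml.etree.ElementTree as ET
from itertools import combinations

def edges_from_seg(seg):
    """Build graph edges from one <seg> using three assumed role values."""
    refs = {"source": [], "target": [], "participant": []}
    for el in seg.iter():
        role = el.get("role")
        if role in refs:
            refs[role].append(el.get("ref"))
    kind = seg.get("type")
    # Directional interaction: every source points at every target.
    directed = [(s, t, kind, "directed")
                for s in refs["source"] for t in refs["target"]]
    # Non-directional interaction: link each pair of participants once.
    undirected = [(a, b, kind, "undirected")
                  for a, b in combinations(refs["participant"], 2)]
    return directed + undirected

directional = ET.fromstring(
    '<seg type="delegation"><rs ref="#mf" role="source">Ordinò</rs> Lappello '
    '<persName ref="#sp" role="target">Sam Paganucci</persName></seg>'
)
# Invented sentence, used only to exercise the non-directional case.
non_directional = ET.fromstring(
    '<seg type="discussion"><persName ref="#mf" role="participant">M Frediani'
    '</persName> and <persName ref="#sp" role="participant">Sam Paganucci'
    '</persName> conferred.</seg>'
)

directed_edges = edges_from_seg(directional)
undirected_edges = edges_from_seg(non_directional)
print(directed_edges)
print(undirected_edges)
```

With this split, direction comes from the role value and the kind of interaction comes from @type on the `<seg>`, so neither depends on word order inside the segment.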

RJP43 commented 6 years ago

@zme1 Exactly! I am wondering which attribute would be best for that. I am not sure @role is the best choice, but I am sure @ebeshero can provide some insight on whether that is "tag abuse" or not.

Nice work! I really like this encoding setup. Hope I helped instead of introducing too many questions/confusion.

zme1 commented 6 years ago

No, you definitely helped me to better understand how to work with this data!!! I'll look into different potential attributes to substitute for @role if possible. I also think that, while I am technically labelling the interaction between a person making a proposal and another person verbally supporting that proposal as a non-directional interaction, I'll preserve the distinction between who does what in those instances because I think it'll be an important distinction to make.

Thank you very much @RJP43 !