zme1 / toscana

A repository to house research and web development for the Lega Toscana project, led by professor Lina Insana (Spring 2018) and professor Lorraine Denman (Fall 2018), and with consultation from members of the DH Advanced Praxis group at the University of Pittsburgh at Greensburg.
http://toscana.newtfire.org

Tagging Convention Brainstorm #49

Closed zme1 closed 5 years ago

zme1 commented 6 years ago

These past couple days, I've been doing a bit of reading on the nature of anglicisms in Italian, and I've used the reading to help preemptively inform my tagging conventions for the project. So far, this is what I've come up with...

Overview

English and Italian can interact in a host of different ways, each different scenario illuminating a unique relationship between the speaker and the languages. There are a handful of primary phenomena that will likely be tracked throughout the volume, and they include, but are not limited to:

Tagging Conventions

With the above textual characteristics in mind, I've tentatively developed an attribute listing to help evoke the extent of the interaction between Italian and English in the volume.

Analysis

In addition to performing raw analyses of the linguistic information I'm tracking with the above attributes and values, I also have a handful of questions that may come to light, including:

  1. Will a lower tendency to adapt the English orthography point to an individual with a higher proficiency in English? Will the opposite indicate a stronger preference for Italian?
  2. With gender tracking: do the anglicisms in the volume tend to be gendered as masculine or feminine? Does this indicate a preference for standard or dialectal Italian use?
    • This question may come in handy if/when I analyze the extent of potential dialectal spelling within the minutes. Some words are systematically misspelled, but the consistency of these misspellings leads me to believe they may be the result of dialectal influence.
  3. Can the sum of these analyses produce a general conceptualization of the Lega's attitude towards English, based upon the degree to which they adapt English terms and spellings to their loan words?

Additionally, I have a couple of questions that may loosely tie this semester's research to that of last semester, but those connections will arise organically if they are, in fact, there.

Conclusion

This is just an attempt to write as many of my thoughts down as I can in one sitting to be sure that I have a working foundation for this semester's project. This is certainly subject to substantial revision, but I wanted to write down as many ideas as I had before I dove into the minutes. @ebeshero Consider this my first major check-in!

P.S. Forgive me for any misspellings or confusing explanations. Feel free to shoot me any questions or comments you may have... Onward to the markup!

ebeshero commented 6 years ago

Hi @zme1 and apologies for the long delay! The Tokyo trip was entirely distracting and the jet lag on the other side has been a little more overwhelming than I'd imagined--I've lost some time! But now that I can coherently review this, I do have some questions about your markup ideas.

1) @function="sub" Are there any other values for this optional attribute? I'd suggest making it "subst" to be perfectly clear what it's for (since "sub" could mean other things potentially).

2) I wonder if there's a way to simplify the attribute markup generally--it seems you have a number of attributes that derive meaning from the presence of another attribute. But perhaps one attribute might serve where you have two? I'm not sure of this...but here is what I think: You have this trio, and I wonder if you can reduce it:

@preserve='yes/no' (optional)
This attribute will be used to determine whether or not the English-exclusive orthography of a word was preserved in the anglicism, or if it was replaced by more suitably Italian characters.
@eng='yes/no' (to be used in conjunction with @preserve)
This attribute, when used in conjunction with the @preserve attribute, will discern whether the word replaced English-exclusive letters ('yes') or whether it was simply a matter of making the word more Italian (as is the case when "absenteeism" is italianized to "assenteismo").
@char='[one of the un-Italian letters]' (only when @preserve='yes' and @eng='yes')
This attribute value will identify the English-exclusive character that was preserved in the anglicism. I'd like to see if there are certain letters within the bounds of the volume that are disproportionately retained.

I wonder whether you could just use @char by itself for all of this? If @char only ever contains "un-Italian" letters as you say, shouldn't its presence be enough to indicate that "English-exclusive orthography" is present, and that it is in English? The three attributes together seem a little much, as if to say, English orthography is here, and is English, and is this...when really all you may need is just to isolate the characters. What do you think? I might be missing something here...
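
Just to make the contrast concrete, here's a rough sketch of the two options. The word and the attribute values are made up purely for illustration, and these are your project-specific attributes rather than anything from the TEI itself:

```xml
<!-- hypothetical anglicism marked up with the proposed three-attribute scheme -->
<w preserve="yes" eng="yes" char="k">weekend</w>

<!-- the simplification I'm suggesting: the presence of @char alone signals
     that an English-exclusive character was retained -->
<w char="k">weekend</w>
```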

ebeshero commented 6 years ago

@zme1 A simpler version of this question: If @preserve="yes", isn't the value of @eng always going to be "yes" as well?

djbpitt commented 6 years ago

Where feature meanings or uses are interdependent, would a feature structure be appropriate?


ebeshero commented 6 years ago

@djbpitt What @zme1 proposes here is for inline markup, so I imagine that feature structures might apply in a stand-off way perhaps to catalog the features he is finding. I don't think I'd want to advise that he cast the Lega records into feature structures, but perhaps I don't understand how they're properly used in linguistics.

ebeshero commented 6 years ago

Take a look at the Feature Structures chapter (Ch. 18) here: http://www.tei-c.org/release/doc/tei-p5-doc/en/html/FS.html

I've used it for tabulating and cataloging relationships in a stand-off way (since feature structures would not be consistent with the Lega inline markup). If you imagine using this, perhaps you could collect all your anglicisms in the <w> elements first with minimal attributes there, and extract them with distinct values to form the basis of a feature structures document. See if it's useful for you first--I don't know that you necessarily need the extra work if there are more efficient ways to analyze your anglicisms in their immediate contexts in the minutes (and perhaps the surrounding context may be important and the minutes themselves sufficiently tabular to suit your needs). If, however, the feature structures are useful on their own as the basis of a tabulated chart, perhaps you do want to experiment there.
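
To give a rough idea of the shape of these things, a single anglicism might be catalogued with something like the following. The feature names and values here are only placeholders, not a recommendation for your actual categories:

```xml
<!-- one anglicism described as a feature structure, kept apart from the
     inline <w> markup; feature names and values are placeholders -->
<fs type="anglicism">
  <f name="preserve"><binary value="true"/></f>
  <f name="italianSuffix"><binary value="false"/></f>
  <f name="gender"><symbol value="masculine"/></f>
</fs>
```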

zme1 commented 6 years ago

The ultimate aim of the markup as it operates right now (although I must admit I have not implemented much of it yet this semester) is to describe each of the anglicisms according to the possible characteristics it may contain, including:

The extent to which the Italians in the Lega adapt the anglicisms they use will hopefully fall on a spectrum, ranging from "directly applying an English term with no modification whatsoever" to "heavily modifying an English term to comply with Italian grammar." While I spent time exploring the TEI inventory to investigate whether or not I could use TEI markup, I really only looked at attributes and attribute classes (up until this point I assumed I would just use the w element). I'm now looking at the fs and f elements in the TEI, and I think they might be able to work, right?

The issue to address from here, though, is that I need to formulate markup that would fully address these potential combinations, i.e.

(I highly doubt that any of the loanwords we find in the volume will adapt un-Italian or ungrammatical letters without also adding an Italian suffix, so I omitted those from the list of potential combinations)

I think that the fs and/or f elements may be able to express these different combinations, and I think that this is what they are designated for (although I'm writing this response as I'm reading through the TEI Guidelines, so I may be incorrect).

The issue seems to be, though, that the fs and f elements don't appear to be typically used as inline markup, as @ebeshero said. Among the TEI examples on the site, I don't find any that seem to address the context I'm working in. Am I correct in saying that you want me to consider creating a feature library, either in my TEI file or as another document, and using pointers to identify the type of anglicism? I'm not certain whether I'm on the right track, overcomplicating this, or just misinterpreting what's been said.

ebeshero commented 6 years ago

@zme1 I think you're right about how you can use feature structures. As the TEI chapter describes it, the idea is to construct a "feature library" or a "feature-value library". I was characterizing the use of it as "stand-off" before, but maybe I should clarify that: it might be "stand-off" in the way a personography or placeography is supplemental to an edition. So, perhaps your Toscana edition could contain inline markup of words that point out to particular forms you'd store in a feature library for de-referencing, e.g. <w ref="#anglicism-type1">.

This is different from what you were angling to do with just inline markup, where all the categorical features you describe are currently being defined on each <w> element. Because you envision basically five combinations (as you've outlined here), writing these basic combinations up in a separate feature library seems manageable. In a feature structure file, you could define the five combinations each as a distinct feature structure, perhaps with its patterns defined as values, and point to one of the five combinations in a single attribute on your <w> elements as you're collecting anglicisms in the Lega documents. The TEI feature structures module gives you some pretty handy options for doing this. Later on perhaps you could write some XSLT to pull all the anglicisms of each feature type from the Lega files and patch them into the feature library as specific values to exemplify each of your five possible forms.
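
Very roughly sketched (the ids and feature names below are invented, so treat this as a shape rather than a prescription), the library and the pointing might look something like this:

```xml
<!-- feature library, e.g. in a separate file or a back-matter section;
     the ids and feature names are placeholders -->
<fvLib n="anglicism combinations">
  <fs xml:id="angl-unmodified" type="anglicism">
    <f name="preserve"><binary value="true"/></f>
    <f name="italianSuffix"><binary value="false"/></f>
  </fs>
  <fs xml:id="angl-suffixed" type="anglicism">
    <f name="preserve"><binary value="false"/></f>
    <f name="italianSuffix"><binary value="true"/></f>
  </fs>
  <!-- ...one <fs> for each of the five combinations... -->
</fvLib>

<!-- in the minutes: a single pointer per word; @ana is one TEI option
     for associating an element with an analysis such as an <fs> -->
<w ana="#angl-suffixed">assenteismo</w>
```

(I've used @ana in this sketch rather than @ref; either kind of pointer could work depending on how you customize your ODD.)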

What do you both think? (@djbpitt and @zme1 ) I've played with feature structures before when I've needed something additional for analysis that basically wouldn't easily fit in my inline markup--but I haven't really used it for linguistic analysis of word forms so I'll defer to David who may have more insights on this.

zme1 commented 6 years ago

@ebeshero I think, depending on how extensive the anglicisms actually appear to be in the volume, that may be a great way to approach it. Even if there aren't as many anglicisms as I originally thought (which may be the case, since I did a first pass of a year of minutes relatively quickly), it may be preferable to remain entirely in the TEI namespace. Rather than link to another file, do you think I'd be able to generate a feature library in the teiHeader element with an fsdDecl element?
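
Something along these lines, maybe (I'm just adapting the pattern from the FS chapter, so the feature names and value ranges below are placeholders):

```xml
<teiHeader>
  <encodingDesc>
    <!-- feature system declaration kept inside the header rather than
         in a separate file; names and descriptions are placeholders -->
    <fsdDecl>
      <fsDecl type="anglicism">
        <fsDescr>English loanwords in the Lega minutes</fsDescr>
        <fDecl name="preserve">
          <fDescr>whether English-exclusive orthography is retained</fDescr>
          <vRange>
            <vAlt>
              <binary value="true"/>
              <binary value="false"/>
            </vAlt>
          </vRange>
        </fDecl>
        <!-- ...further fDecl elements for the other features... -->
      </fsDecl>
    </fsdDecl>
  </encodingDesc>
</teiHeader>
```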

zme1 commented 6 years ago

@ebeshero I am going to temporarily close this issue until I can say I'm happy with my tagging conventions, and I'll post an Issue in a few minutes with updates on my ODD.