These past couple days, I've been doing a bit of reading on the nature of anglicisms in Italian, and I've used the reading to help preemptively inform my tagging conventions for the project. So far, this is what I've come up with...

Overview

English and Italian can interact in a host of different ways, each different scenario illuminating a unique relationship between the speaker and the languages. There are a handful of primary phenomena that will likely be tracked throughout the volume, and they include, but are not limited to:

Adaptation or preservation of English orthography within the volume
- The Italian language excludes several of the letters of the English alphabet from its own (e.g. j, k, w, x, and y). One point of interrogation in this research will focus on whether or not anglicisms in the volume preserve these categorically un-Italian characters (e.g. with the loan word "cocktail").
- Sometimes, anglicisms are "italianized" in instances where characters are omitted and/or replaced by other ones because they either:
  1. Are one of the un-Italian letters of the alphabet
  2. Fall into an un-Italian sequence of letters (e.g. absenteeism is italianized from "absenteismo" to "assenteismo" because "ab" is not a grammatical sequence of characters)
Loan words that are either italianized by adding a final vowel to an otherwise entirely English word, or loan words that are not italianized at all.
- With the former type of loan word, the only orthographic change applied to a loan word is the addition of a final vowel (and, when applicable, a doubled preceding consonant). The word is not necessarily adapted to appear more Italian, but the addition of a final vowel does nevertheless italianize it to an extent (e.g. dollar is italianized into "dollaro").
- With the latter type, the word is a carbon-copy of its derivative English counterpart, with no italianization of any sort applied to the word.
Anglicisms that include an Italian phoneme, but with English orthography
- While there is substantial overlap in the phonetic inventories between Italian and English, the same sounds are represented by different letters. With some words in the volume, an English orthography is used to represent an Italian phoneme (e.g. the English term chairman is italianized to "charman." The initial "ch" sound is present in both phonetic inventories, but Italian never uses the "ch" spelling for it).

Tagging Conventions

With the above textual characteristics in mind, I've tentatively developed an attribute listing to help evoke the extent of the interaction between Italian and English in the volume.

New element: w
- This is the only element I am adding to my ODD for the time being; it will likely satisfy most, if not all, of my tagging needs as far as elements are concerned.
Customized attribute(s) from the TEI All schema
- @function='sub' (optional)
  - This attribute will be employed when an English word is used to substitute an equally suitable Italian word. If my document analysis proves that all of the anglicisms in the volume are used in preference to an Italian term, rather than to describe something that was not represented in the Italian language, then I will remove this attribute. If there are far fewer anglicisms that express an inexplicable concept in Italian, then I'll simply inverse the attribute value and attach it to those anglicisms instead.
Customized attributes
- @gender='m/f' (optional)
  - This attribute will be used on nouns and noun phrases to track the gender of the anglicism. The tendency with those who speak Standard Italian is to attach a masculin gender to any noun, unless it refers to a woman (or prototypically feminine object/idea). I want to see if the anglicisms in this volume support that tendency (which would indicate that the men thought more naturally in the Standard language) or if they contradict it (which could possibly indicate that their dialectal language may have taken preference -- I need to learn more about the Luccan dialect to determine this).
- @preserve='yes/no' (optional)
  - This attribute will be used to determine whether or not the English-exclusive orthography of a word was preserved in the anglicism, or if it was replaced by more suitably Italian characters.
- @eng='yes/no' (to be used in conjunction with @preserve)
  - This attribute, when used in conjunction with the @preserve attribute, will discern the words that replaced English-exclusive letters ('yes'), or whether it simply a matter of making the word more Italian (as is the case when "absenteeism" is italianized to "assenteismo")
- @char='[one of the un-Italian letters]' (only when @preserve='yes' and @eng='yes')
  - This attribute value will identify the English-exclusive character that was preserved in the anglicism. I'd like to see if there are certain letters within the bounds of the volume that are disproportionately retained.
- @add='ending/none' (required)
  - This attribute will track the words that are italianized by the simple addition of a final vowel to the word. Every anglicism will either receive a final vowel or not receive a final vowel, so this might be the only mandatory attribute. Although, the others will still be used in abundance.

Analysis

In addition to performing raw analyses of the linguistic information I'm tracking with the above attributes and values, I also have a handful of questions that may come to light, including:

Will a lower tendency to adapt the English orthography allude to an individual with a higher proficiency in English? Will the opposite indicate a stronger preference for Italian?
With gender tracking: do the anglicisms in the volume tend to be gendered as masculine or feminine? Does this indicate a preference for standard or dialectal Italian use?
- This question may come in handy if/when I analyze the extent of potential dialectal spelling within the minutes. Some words are systematically misspelled, but the consistency of these misspellings leads me to believe it may be the result of a dialectal influence.
Can the sum of these analyses produce a general conceptualization of the Lega's attitude towards English, based upon the degree to which they adapt English terms and spellings to their loan words?

Additionally, I have a couple of questions that may loosely tie this semester's research to that of last semester, but those connections will arise organically if they are, in fact, there.

Conclusion

This is just an attempt to write as many of my thoughts down as I can in one sitting to be sure that I have a working foundation for this semester's project. This is certainly subject to substantial revision, but I wanted to write down as many ideas as I had before I dove into the minutes. @ebeshero Consider this my first major check-in!

PS. Forgive me for any misspellings or confusing explanations. Feel free to shoot any questions or comments you may have.... Onward to the markup!

Hi @zme1 and apologies for the long delay! The Tokyo trip was entirely distracting and the jet lag on the other side has been a little more overwhelming than I'd imagined--I've lost some time! But now that I can coherently review this, I do have some questions about your markup ideas.

1) @function="sub" Are there any other values for this optional attribute? I'd suggest making it "subst" to be perfectly clear what it's for (since "sub" could mean other things potentially).

2) I wonder if there's a way to simplify the attribute markup generally--it seems you have a number of attributes that derive meaning from the presence of another attribute. But perhaps one attribute might serve where you have two? I'm not sure of this...but here is what I think: You have this trio, and I wonder if you can reduce it:

@preserve='yes/no' (optional)
This attribute will be used to determine whether or not the English-exclusive orthography of a word was preserved in the anglicism, or if it was replaced by more suitably Italian characters.
@eng='yes/no' (to be used in conjunction with @preserve)
This attribute, when used in conjunction with the @preserve attribute, will discern the words that replaced English-exclusive letters ('yes'), or whether it simply a matter of making the word more Italian (as is the case when "absenteeism" is italianized to "assenteismo")
@char='[one of the un-Italian letters]' (only when @preserve='yes' and @eng='yes')
This attribute value will identify the English-exclusive character that was preserved in the anglicism. I'd like to see if there are certain letters within the bounds of the volume that are disproportionately retained.

I wonder whether you could just use @char by itself for all of this? If @char only ever contains "un-Italian" letters as you say, shouldn't its presence be enough to indicate that "English-exclusive orthography" is present, and that it is in English? The three attributes together seem a little much, as if to say, English orthography is here, and is English, and is this...when really all you may need is just to isolate the characters. What do you think? I might be missing something here...

@zme1 A simpler version of this question: If @preserve="yes", isn't the value of @eng always going to be "yes" as well?

Where feature meanings or uses are interdependent, would feature structure be appropriate?

On Sep 21, 2018, at 6:46 AM, Elisa Beshero-Bondar notifications@github.com wrote:

@zme1 A simpler version of this question: If @preserve="yes", isn't the value of @eng always going to be "yes" as well?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or mute the thread.

@djbpitt What @zme1 proposes here is for inline markup, so I imagine that feature structures might apply in a stand-off way perhaps to catalog the features he is finding. I don't think I'd want to advise that he cast the Lega records into feature structures, but perhaps I don't understand how they're properly used in linguistics.

Take a look at the Feature Structures chapter (Ch. 18) here: http://www.tei-c.org/release/doc/tei-p5-doc/en/html/FS.html I've used it for tabulating and cataloging relationships in a stand-off way (since feature structures would not be consistent with the Lega inline markup). If you imagine using this, perhaps you could collect all your anglicisms in the <w> elements first with minimal attributes there, and extract them with distinct values to form the basis of a feature structures document. See if it's useful for you first--I don't know that you necessarily need the extra work if there are more efficient ways to analyze your anglicisms in their immediate contexts in the minutes (and perhaps the surrounding context may be important and the minutes themselves sufficiently tabular to suit your needs). If, however, the feature structures are useful on their own as the basis of a tabulated chart, perhaps you do want to experiment there.

The ultimate aim of the markup as it operates right now (although I must admit I have not implemented very much of it quite yet at this point in the semester) is to describe each of the anglicisms according to a possible array of characteristics it may contain, including:

Nominative genders
Substitution of English-exclusive letters or phoneme/s that are not grammatical in Italian
- Specification on the nature of the substitution listed above
Addition of italianized endings

The extent to which the Italians in the Lega adapt the anglicisms they use will hopefully fall on a spectrum, ranging from "directly applying an English term with no modification whatsoever" to "heavily modifying an English term to comply with Italian grammar." While I spent time exploring the TEI inventory to investigate whether or not I could use TEI markup, I really only looked at attributes and attribute classes (up until this point I assumed I would just use the w element). I'm now looking at the fs and f elements in the TEI, and I think they might be able to work, right?

The issue to address from here, though, is that I need to formulate markup that would fully and completely address these potential combinations, i.e.

Words that are not adapted at all
Words that are adapted only by the addition of an Italian suffix
Words that are adapted by the addition of an Italian suffix and a change to ungrammatical letter combinations
Words that are adapted by the addition of an Italian suffix and a change to un-Italian letters
The gender of any noun loanwords (can apply to all the above choices)

(I highly doubt that any of the loanwords we find in the volume will adapt un-Italian or ungrammatical letters without also adding an Italian suffix, so I omitted those from the list of potential combinations)

I think that the fs and/or f elements may be able to express these different combinations, and I think that this is what they are designated for (although I'm writing this response as I'm reading through the TEI Guidelines, so I may be incorrect).

The issue seems to be, though, that the fs and f elements don't seem like they are typically used as inline markup, which @ebeshero said. Among the TEI examples on the site, I don't find any that seem like they address the current context I'm working in. Am I correct in saying that you want me to consider creating a feature library, either in my TEI file or as another document, and using pointers to identify the type of anglicism? I'm not certain if I'm on the right track, overcomplicating this, or just misinterpreting what's been said.

@zme1 I think you're right about how you can use feature structures. As the TEI chapter describes it, the idea is to construct a "feature library" or a "feature-value library". I was characterizing the use of it as "stand-off" before, but maybe I should clarify that: it might be "stand-off" in the way a personography or placeography is supplemental to an edition. So, perhaps your Toscana edition could contain inline markup of words that point out to particular forms you'd store a feature library for de-referencing, e.g. <w ref="#anglicism-type1">.

This is a different than what you were angling to do with just inline markup, where all the categorical features you describe are currently being defined on each <w> element . Because you envision basically five combinations (as you've outlined here), writing these basic combinations up in a separate feature library seems manageable. In a feature structure file, you could define the five combinations each as a distinct feature, perhaps with its patterns defined as values, and point to one of the five combinations in a single attribute on your <w> elements as you're collecting anglicisms in the Lega documents. The TEI feature structures module gives you some pretty handy options for doing this. Later on perhaps you could write some XSLT to pull all the anglicisms of each feature type from the Lega files and patch them into the feature library as specific values to exemplify each of your five possible forms.

What do you both think? (@djbpitt and @zme1 ) I've played with feature structures before when I've needed something additional for analysis that basically wouldn't easily fit in my inline markup--but I haven't really used it for linguistic analysis of word forms so I'll defer to David who may have more insights on this.

@ebeshero I think, depending on how extensive the anglicisms actually appear to be in the volume, that may be a great way to approach it. Even if there aren't as many anglicisms as I originally thought (which may be the case, since I did a first pass of a year of minutes relatively quickly), it may be preferable to remain entirely in the TEI namespace. Rather than link to another file, do you think I'd be able to generate a feature library in the teiHeader element with a fsdDecl element?

@ebeshero I am going to temporarily close this issue until I can say I'm happy with my tagging conventions. I am going to post an Issue in a few minutes with updates on my ODD.

zme1 / toscana

Tagging Convention Brainstorm #49

Overview

Tagging Conventions

Analysis

Conclusion