sjw82 / Midrash

Our Final Project: a historical analysis of Midrashic Text
http://midrash.obdurodon.org/

Schema Clarifications 2/20 #22

Open sjw82 opened 5 years ago

sjw82 commented 5 years ago
  1. start = midrash
  2. midrash = element midrash {verse+, midrashim+}
  3. verse = element verse {versenum+ & versetext+}
  4. versenum = element versenum {text}
  5. versetext = element versetext {text}
  6. midrashim = element midrashim {source, p+}
  7. source = element source {mixed {midtext, edition?, midpass}}
  8. midtext = element midtext {text}
  9. edition = element edition {text}
  10. midpass = element midpass {text}
    • This refers to the passage number associated with the specific paragraph within the midrash
  11. p = element p {content*}
  12. content = mixed {(quote|ref|paren|litdev|function|quotation|hebrew|translation)*}
    • I'm just adding to this as I add things to the xml
  13. quote = element quote {text, content}
  14. ref = element ref {(rabbi|bibverse|item|character) & content}
    • I've been using ref to tag both external and internal references, that is, references to the work of other Rabbis as well as to characters within the Torah
  15. rabbi = attribute rabbi {text}
  16. bibverse = attribute bibverse {text}
    • This I've been using to tag internal references to other verses as 'Genesis 14:10', but I'm unsure what to put in the attribute to make it as searchable as possible.
  17. item = attribute item {text}
    • This is the least well-defined criterion; I've only used it for soul, and my thinking was that it is an item that recurs in rabbinic literature and would be worth examining later but didn't require its own tag
  18. character = attribute character {text}
  19. paren = element paren {content}
  20. litdev = element litdev {(simile|metaphor|wordplay) & content}
  21. simile = attribute simile {text}
  22. metaphor = attribute metaphor {text}
  23. wordplay = attribute wordplay {text}
  24. function = element function {(definition|allegory|allegory) & content}
    • Here I'm referring to the function of the sentence(s) for the reader
  25. definition = attribute definition {text}
  26. allegory = attribute allegory {text}
  27. quotation = element quotation {content*}
  28. hebrew = element hebrew {content*}
    • I'm using this to mark literal Hebrew text i.e. words that aren't translated
  29. translation = element translation {heb, text}
    • I'm using this to mark where a word is defined in English but the original Hebrew is given
  30. heb = attribute heb {text}
djbpitt commented 5 years ago

@sjw82 The schema documentation is important, and you can even do it inside the schema file itself, using Relax NG comment syntax. Comments in Relax NG start with a hash mark and continue until the end of the line. There are no multi-line comments, so if you have a multi-line comment, you have to put a hash mark at the beginning of each line. Here’s a sample one-line comment:

# midpass refers to the passage number associated with the specific paragraph within the midrash
midpass = element midpass {text} 

I usually write long one-line comments above the line of code they refer to; if the comment is short, I write it at the end of the line:

midpass = element midpass {text} # passage number
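
A multi-line comment, then, is just a run of one-line comments. Here is a minimal sketch that reuses the midpass definition; the start line is an assumption, added only so that the fragment stands alone as a grammar:

```rnc
# The start line exists only to make this fragment self-contained
start = midpass

# midpass refers to the passage number associated with
# the specific paragraph within the midrash; each line
# of a longer comment needs its own hash mark
midpass = element midpass { text }
```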

Some of your content models can be simplified to make them more legible. For example, where you write:

ref = element ref {(rabbi|bibverse|item|character)* & content*}

I think what you want is:

ref = element ref { rabbi?, bibverse?, item?, character?, content }

Similarly, where you write:

function = element function {(definition|allegory|allegory)* & content*}
definition = attribute definition {text}
allegory = attribute allegory {text}

it would be better to replace the first line with:

function = element function { definition?, allegory?, content }

The issue with both of the preceding cases is that attributes cannot be repeated on an element, so writing allegory twice in the second content model is misleading, as is putting an asterisk after the attributes in both, since the asterisk allows repetition. The revisions make the attributes optional but not repeatable, which I think is what you want. (If you do want to allow multiple allegories on a function or multiple items on a ref, you can't do it by repeating an attribute, since <function allegory="x" allegory="y"> is not well formed. We can talk about alternatives if that is what you’re trying to do.)

I also removed the asterisk from content because that’s already defined as a mixed-content repeatable or-group, so it always has all of the optionality and repetition it can have. The extra asterisk seems not to do any harm because it doesn’t contribute meaning one way or the other, but because its presence is misleading, it’s better to remove it.
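
Putting the two revisions together, a cut-down version of the schema might look like the sketch below. It keeps only ref and function in the content or-group and points start at p so that the fragment is self-contained. Both of those choices are assumptions made for the illustration, not a suggestion to drop your other elements.

```rnc
# Assumption: start points at p only to keep this sketch self-contained
start = p
p = element p { content }

# content is already a repeatable mixed-content or-group,
# so references to it do not need a trailing asterisk
# (cut down to two members here for brevity)
content = mixed { (ref | function)* }

# Each attribute is optional and can appear at most once
ref = element ref { rabbi?, bibverse?, item?, character?, content }
rabbi = attribute rabbi { text }
bibverse = attribute bibverse { text }
item = attribute item { text }
character = attribute character { text }

function = element function { definition?, allegory?, content }
definition = attribute definition { text }
allegory = attribute allegory { text }
```

With this grammar, <ref rabbi="x" character="y">...</ref> is valid, and so is a ref with no attributes at all.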

If you’d like to take a crack at tightening up the schema and then push the update to GitHub, I’ll be happy to take another look. You seem to have the basic functionality you want, at least to get started on validating your markup, and this sort of clean-up is a common stage in the process. In other words, what you have above is a big step forward.

sjw82 commented 5 years ago

@djbpitt Thank you for the feedback! I've updated the schema if you wouldn't mind giving it another look. In the case that an attribute is required, would it look better to write it { att1?, att2?, att3?, content } or { (att1 | att2 | att3)?, content}?

djbpitt commented 5 years ago

@sjw82 I'll take a closer look later this weekend, but to answer your immediate question, the two expressions mean different things, and the one you want is the first one:

{ att1?, att2?, att3?, content } means that any, none, or all of those attributes may be present, followed by content. Because attributes are informationally unordered (that is, <p att1="x" att2="y"> is informationally identical to <p att2="y" att1="x">), you can specify attributes in your content model in any order. The comma connector means “in this specific order” with elements, but because attributes are inherently unordered in the XML data model, the connector has a different meaning with attributes than it does with elements.

{ ( att1 | att2 | att3 )?, content } means that you can have zero attributes or exactly one of the three specified attributes, but not more than one. That's because the question mark after the parentheses applies to the entire parenthesized expression. That is, it says that you can perform the action of choosing among the attributes zero times or one time.
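
To see the difference side by side, here is a small sketch. The element names examples, independent, exclusive, and required are hypothetical, and text stands in for content to keep the fragment short. The third pattern is an addition for the case where one of the attributes really is required: dropping the question mark from the or-group demands exactly one of them.

```rnc
# Hypothetical wrapper element so the fragment is a complete grammar
start = element examples { independent*, exclusive*, required* }

att1 = attribute att1 { text }
att2 = attribute att2 { text }
att3 = attribute att3 { text }

# Any, none, or all of the three attributes
#   valid:   <independent/>
#   valid:   <independent att3="z" att1="x"/>
independent = element independent { att1?, att2?, att3?, text }

# Zero attributes or exactly one of the three
#   valid:   <exclusive/>
#   valid:   <exclusive att2="y"/>
#   invalid: <exclusive att1="x" att2="y"/>
exclusive = element exclusive { (att1 | att2 | att3)?, text }

# Exactly one of the three (no question mark on the or-group)
#   valid:   <required att1="x"/>
#   invalid: <required/>
required = element required { (att1 | att2 | att3), text }
```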