kshawkin / Best-Practices-for-TEI-in-Libraries

Best Practices for TEI in Libraries: A guide for mass digitization, automated workflows, and promotion of interoperability with XML using the TEI
http://purl.oclc.org/NET/teiinlibraries

proofreading auto-generated HTML #44

Closed: kshawkin closed this issue 6 years ago

kshawkin commented 8 years ago

Once we've resolved all other issues, we should check that the process of creating the HTML produces a complete document that is readable, and while we're at it, we should proofread the text.

In an email to me on 2016-09-13, @jrgriffiniii kindly volunteered to do this.

kshawkin commented 7 years ago

@jrgriffiniii wrote to me today to say that he's transitioning to a new institution and can no longer commit to doing this proofreading for us in the near future.

emylonas commented 7 years ago

During today's BPTL call, we decided that once we've implemented all other non-dormant issues, we'll ask for comments on TEILIB-L. Once we've made any adjustments based on comments, we'll ask for a volunteer to proofread the whole thing closely. If no one volunteers, Syd will ask if a WWP encoder could take on this job, or Kevin might ask one of his employees with an interest in DH to do it.

kshawkin commented 7 years ago

When proofreading, standardize on either "such as" or "e.g.". I prefer "such as" because I find it to be more transparent (not requiring that you know the meaning of this Latin abbreviation).

kshawkin commented 7 years ago

Syd is working to fix the rendering in the HTML output to insert bibliographic info at the top and remove the phrase in parentheses. Once that's done, Kevin will need to copy over files to http://www.tei-c.org/SIG/Libraries/teiinlibraries/3.1.0a/ and build a parallel system for past versions of the BPTL.

In the meantime, here's my draft call for comments:

== subject: call for comments on draft of Best Practices for TEI in Libraries, version 3.1.0a ==

Dear TEI colleagues,

In 2011 the TEI SIG on Libraries published version 3.0 of Best Practices for TEI in Libraries ( http://purl.oclc.org/NET/teiinlibraries ). This ODD-based customization includes extensive prose documentation and schemas for "levels" of encoding for a range of encoding practices, especially but not only in libraries.

Much has changed since then, so a subgroup of the SIG has been working on a revision to this document to bring it into conformance with the latest version of P5 and to address some shortcomings of version 3.0.

The workgroup invites comments on its draft of version 3.1.0a, its update to the document and schema. You can find these files at:

http://www.tei-c.org/SIG/Libraries/teiinlibraries/3.1.0a/

The best starting point is the main prose document:

http://www.tei-c.org/SIG/Libraries/teiinlibraries/3.1.0a/main-driver.html

If you have comments, we invite you to submit issues by [date here] on our GitHub site ( https://github.com/kshawkin/Best-Practices-for-TEI-in-Libraries/issues ).

Thank you in advance for helping make this a better resource for all.

Kevin Hawkins on behalf of the workgroup to revise the Best Practices for TEI in Libraries

kshawkin commented 6 years ago

Syd wrote today that he won't be able to fix the rendering soon, so we agreed that in the meantime I would hand-edit main-driver.html to fix the rendering issues. That's now done. I've asked Ian to enable the viewing of an Apache directory listing for http://www.tei-c.org/SIG/Libraries/teiinlibraries/3.1.0a/ .

Once that's done, I can send this message, plus update http://purl.oclc.org/NET/teiinlibraries to link to this version under development.

kshawkin commented 6 years ago

Ian enabled the viewing of the directory listing yesterday, but I've been busy. Will finally announce this weekend.

kshawkin commented 6 years ago

Message just sent to teilib-l and TEI-L.

kshawkin commented 6 years ago

Once issues 27, 64, 73, and any newly created issues are resolved, I'll start proofreading.

emylonas commented 6 years ago

@sydb, please generate new BPTL guidelines so that @kshawkin and @emylonas can start to proofread. The only issue that may change the prose is #83.

sydb commented 6 years ago

Done. See usual temporary site.

kshawkin commented 6 years ago

I've made edits in the attached file, Best Practices for TEI in Libraries_2018-08-23_proofreading.docx, and will move them into the ODD source soon. I feel confident enough about our text that I am no longer interested in drafting an outside reader.

kshawkin commented 6 years ago

Changes in Word document now implemented everywhere (plus a few more things I discovered along the way). If Syd can regenerate the HTML, that will provide a cleaner starting point for Elli.

sydb commented 6 years ago

Done. See usual temporary site. HOWEVER, for reasons I cannot explain and do not have time to investigate now, it would not build on my desktop at home (giving java.lang.RuntimeException: Internal error evaluating template rule at line 168 in module .../Stylesheets/odds/odd2lite.xsl), but built fine on my desktop at work.

emylonas commented 6 years ago

It would be a good idea to search for "See..." because not all cross-references are linked.

kshawkin commented 6 years ago

Bullet 1: I removed these in my last round of revisions.

Bullet 4: I'm not sure what to link to. P5's Gentle Introduction to XML includes a link to the W3C spec, but I believe that a spec is never the thing to link to for an introduction.

Bullet 6: Yes, it appears that line breaks are missing in the "Appearance in source document" table. @sydb, is this something you can fix, or will I just need to hack it in the final HTML output?

Bullet 7: I'll remove the stuff about MS-DOS and Apple Filing Protocol, but I'm reluctant to link to that PDF because digitalpreservation.ncdcr.gov appears to be a one-time grant-funded effort, so I'm afraid the link will break before the guidelines are updated.

I'll search for "see" everywhere to standardize our cross-references and insert links where missing.

emylonas commented 6 years ago

3.7 minor point: in "the number of the page whose text follows...", the word "text" might more clearly be "content", but this is not a major issue.

3.9.1 on @type: the discussion of Roma. Roma is changing, and we should remember that. But perhaps there will be a redirect in place anyway.

3.9.3: "Since Linked Data applications in libraries make many authority records being made accessible through URIs" looks like a scribal error! We probably need to eliminate the words "being made".

Section 4: We recommend <sourceDoc> in Level 1. What else could there be? Perhaps only a facsimile set of images, but that's also recommended. Can there be a Level 1 without a <sourceDoc>?

4.1.5 on FRBR: the links in FRBRoo (http://www.cidoc-crm.org/frbroo/) and BIBFRAME (https://www.loc.gov/bibframe/) are run on with the text before them.

4.1.5 author element in teiHeader - should we allude to the section on persName and so on below to show how the name might be linked to an authority?

4.1.5 <ptr target="___"> in biblStruct: this says to provide a URI for the object of encoding that is part of a larger work. This is a little unclear. Do you mean when the object of encoding is a part... The next sentence is also not completely clear.

encodingDecl, quotation element: are we recommending the use of the @marks attribute or just the prose paragraph? Or either? If either, this is fine as written.

<rendition selector="___" scheme="css">: the last sentence of the explanation, which also allows the use of @xml:id, is in parentheses; this seems oddly subordinate.

[I'll go back and keep reading the 4 levels later today]

Section 5: seems fine.

Section 6: do you want to add a sentence or two at the end to say that version 4 was released, and so on? You can list the group as for the earlier releases and bring it to a conclusion.

Another question: are the earlier versions available for historical reasons? Are they referenced anywhere? More proofreading to come.

kshawkin commented 6 years ago

Thanks for catching so many things in your proofreading! I've made all of these changes except for the discussion of sourceDoc at the top of section 4. Here, as elsewhere in the BPTL, we use "recommended" for things that are required according to the BPTL but not P5 (see #52), meaning that according to the BPTL, Level 1 must have a sourceDoc. Perhaps this idiosyncratic use of modal expressions was a mistake, but it's too late to change for version 4.0.0. :(

I've updated the appendix (section 7) to discuss the history. It includes a link or two that doesn't work now but will once I do #85.

emylonas commented 6 years ago

OK, that is part of the BPTL. I was feeling that it should be "must", not just "recommended", but we should definitely stick to our agreed-upon practice.

emylonas commented 6 years ago

Last set of comments:

Level 1

4.2.2.3 Rationale

"stand alone as an electronic text (without page images)": this is stylistic, so not necessary, but "stand alone as an electronic text without accompanying page images" is a bit clearer.

Table of elements, <surface>: "or most likely a textual" would be better as "or, most likely, a textual".

Level 2

should we make it explicit that <div>s can be nested as needed in Level 2? (in the table of elements)

4.2.3.1 Rationale

"However, it is unknown whether or not it is truly ‘TEI conformant’, as P5 does not make clear whether or not encoding of individual paragraphs is mandatory." Is it still the case that we aren't sure whether this level is TEI conformant because paragraphs are not all marked individually? Can we be clearer one way or the other?

Level 3

the Rationale is really nicely written!

4.2.4.4 Workflow

Do we want to mention that some human intervention may be necessary?

4.2.4.5 Element recommendations: a substantive comment, sorry to catch it so late. Why not allow <ab>? Sometimes there are paragraph-like things that aren't paragraphs, for example on a title page. Have we excluded it in the Level 3 schema?

Editing: the line "Use all elements specified in Level 2 except <ab>, plus the following:" is italicized from "except" to the end of the line. I think we may only intend the phrase "except <ab>" to be italicized. And I have just questioned that...

Table of elements: on <div> and <div1>, it would be better to remove the plus sign. I don't think you need any conjunction there.

Is there anything to say about <lb/> on level 3?

Level 4

4.2.5.3

"a searcher could limit his or her": perhaps "a searcher could limit their".

4.2.5.5

same remark about <ab>

Table of elements: do we repeat <hi> because it is applied in a slightly different way?

4.2.5.6.2 Name tagging

Paragraph 2 implies that the @Name elements should only link to external files, but then the third paragraph refers to the classDecl in the header.

4.2.5.6.4 Drama

<lb/> is at the end of the line. It's at the beginning in earlier examples (Level 3, for example). Do we want to model good behaviour? Do we care? If so, we should check across the document.

4.2.5.6.5 Oral History

Paragraph 2: "If a oral history" should be "If an oral history".

Suggestion for paragraph 1: instead of "The list of participants in an oral history", should we say "The list of participants in the transcription of an oral history"? Or the title could be "Transcription of Oral History".

4.2.5.6.7. Level 4 Typographic Separators

and here's our friend <ab>...

Alger Hiss example - the <lb/>s are at the beginning of the line here as they should be.

Level 5

4.2.6.4 Element Recommendations

"Please refer to the TEI Header section above": should this be a link?

I noticed that the Specification section at the end of each level is empty and has no links. Except for Is that because of the way we are generating the files and it will fall into place when we make the real thing?

kshawkin commented 6 years ago

Lots more great edits. I've fixed things except where noted below:

Level 2:

Levels 3 and 4: I've clarified that we meant to say that you should not use <ab> as defined in Level 2. See new issue #89 for our successors.

Level 3: Nothing to say about <lb/> since this is already documented in Level 2.

Level 4:

kshawkin commented 6 years ago

Oh, and on the empty "specification" section: Yes, I believe this is an artifact of creation of the HTML. I'll edit these out when creating the cleaned-up version to post.

emylonas commented 6 years ago

All sounds good. There is a lot to say about how to store controlled vocabularies especially in documents in progress vs. archival forms. Not necessary to deal with it here.