projectLEMDO / lemdoIssues

Repository for LEMDO issue tracking and related documents.
MIT License

Add Schematron for trailing punctuation in name tags #222

Open martindholmes opened 1 month ago

martindholmes commented 1 month ago

There are lots of cases of, e.g., trailing commas included in <persName> tags. Find and fix all of these, then add a Schematron rule to prevent them.

martindholmes commented 1 month ago

Refining this a bit, we should catch all cases of names with trailing comma, semicolon, or colon; and we should catch cases where a final period is not preceded by a capital letter (i.e. someone's initials). There are many, many bad instances to catch before we can add the Schematron. Tags such as gloss and term should also be included.
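For illustration only, the refined check described above could be sketched in Python (not Schematron) roughly as follows. The function name and both regular expressions are hypothetical and are not taken from the LEMDO codebase; the sketch assumes the element's text content has already been extracted as a string.

```python
import re

# Hypothetical patterns, written only to illustrate the proposal above.
# Trailing comma, semicolon, or colon (optionally followed by whitespace).
TRAILING_PUNCT = re.compile(r"[,;:]\s*$")
# A final period that is NOT directly preceded by a capital letter;
# a period after a capital is assumed to belong to someone's initials.
SUSPECT_PERIOD = re.compile(r"(?<![A-Z])\.\s*$")


def has_bad_trailing_punctuation(text: str) -> bool:
    """Return True if the text ends with punctuation the proposal would flag."""
    return bool(TRAILING_PUNCT.search(text) or SUSPECT_PERIOD.search(text))
```

Under these assumptions, `has_bad_trailing_punctuation("John Smith,")` is true, while a name ending in initials such as `"Smith, J. S."` passes.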

JanelleJenstad commented 1 month ago

Refining the refinement: "we should catch cases where a final period is not preceded by a capital letter (i.e. someone's initials)" -- We will have cases where persName wraps around names such as "Boba Fett, Jr." or "Boba Fett, Esq.", so our rule cannot be looking for capital letters before periods.
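A small Python sketch makes the objection concrete. It assumes the capital-letter heuristic would be written as the regular expression below (a hypothetical formulation, not the project's actual rule) and shows which sample names it would flag.

```python
import re

# Hypothetical formulation of the heuristic: flag a final period
# unless it directly follows a capital letter.
SUSPECT_PERIOD = re.compile(r"(?<![A-Z])\.\s*$")

names = ["Boba Fett, Jr.", "Boba Fett, Esq.", "J. R. R."]

# Names the heuristic would report as errors.
flagged = [name for name in names if SUSPECT_PERIOD.search(name)]
```

Here `flagged` contains both "Boba Fett, Jr." and "Boba Fett, Esq.", which are legitimate names, while the initials pass. That is the false positive the comment describes.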

martindholmes commented 4 weeks ago

Good point. I've added a Schematron rule to catch commas, but right now it's ignoring <ref> elements because there are too many existing errors to fix. We can refine later; we can probably add a bunch of different punctuation marks, even if we can't frame a rule that catches periods.
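As a rough companion to the Schematron rule, existing instances could be located with a short Python scan like the one below. This is a sketch under stated assumptions, not the actual rule: it assumes the files are TEI XML in the standard TEI namespace, that the checked elements are persName, gloss, and term, and that "ignoring <ref>" means ref is simply left out of the checked set. The function and constant names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: documents use the standard TEI namespace.
TEI_NS = "http://www.tei-c.org/ns/1.0"
# Assumption: these are the elements checked; ref is deliberately
# left out for now, mirroring the comment above.
CHECKED_TAGS = {"persName", "gloss", "term"}
TRAILING_COMMA = re.compile(r",\s*$")


def find_trailing_commas(xml_text: str) -> list[tuple[str, str]]:
    """Return (element name, text content) for checked elements ending in a comma."""
    root = ET.fromstring(xml_text)
    hits = []
    for name in sorted(CHECKED_TAGS):
        for el in root.iter(f"{{{TEI_NS}}}{name}"):
            content = "".join(el.itertext())
            if TRAILING_COMMA.search(content):
                hits.append((name, content))
    return hits


sample = (
    '<p xmlns="http://www.tei-c.org/ns/1.0">'
    "<persName>Boba Fett,</persName> met "
    '<ref target="#x">Jango,</ref> and a <term>clone</term>.'
    "</p>"
)
hits = find_trailing_commas(sample)
```

With this sample, only the persName is reported; the ref with a trailing comma is skipped because ref is not in the checked set. Extending the pattern to `[,;:]` would cover the other punctuation marks mentioned.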