projectLEMDO / lemdoIssues

Repository for LEMDO issue tracking and related documents.
MIT License

Standardize reporting of signatures in `@n` value of `<pb>` #146

Closed martindholmes closed 1 year ago

martindholmes commented 1 year ago

We've been having trouble processing the values of signatures and folio numbers appearing in pb/@n because they're not standard; some use lower-case letters, some have leading stuff which is not part of the sig (see issue #145), and others use multiple letters (Aa instead of 2A). We need to standardize this, document it, and enforce it with Schematron.

JanelleJenstad commented 1 year ago

We can't standardize everything:

On the other hand, we should be standardizing on 2A instead of Aa. I'm not sure that you can write a rule to enforce this standard. Fredson Bowers' Principles of Bibliographic Description devotes over 100 pages to the reporting of signature sequences.

My suggestion is that we merely insist that the value of @n on <pb/> be unique. Fredson Bowers would approve of this rule, since his 100+ pages are entirely about uniqueness.

JanelleJenstad commented 1 year ago

Can you write me an XPath to find all instances of signatures that do not begin with a number (e.g., Aa instead of 2A)? I can correct all of them quite quickly. Then I think a Diagnostic rather than Schematron would be helpful.

LEMDO-PM commented 1 year ago

@JanelleJenstad I think you can do that with a regex: `<pb n="\D\D`. That should find all `<pb>` elements with an `@n` value beginning with at least two non-digit characters (I get 16 results).
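For anyone who would rather run the same check outside the editor's search box, here is a minimal Python sketch. It parses the XML instead of searching the raw text, so it is a different technique from the regex search above, and it assumes the files are TEI documents in the standard TEI namespace. The function name `suspicious_pb_values` and the sample values are hypothetical, made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: <pb> elements live in the standard TEI namespace.
TEI_NS = "{http://www.tei-c.org/ns/1.0}"

# Same test as the regex above: the value starts with two non-digit characters.
TWO_NON_DIGITS = re.compile(r"^\D\D")


def suspicious_pb_values(xml_text):
    """Return every pb/@n value that begins with two non-digit characters."""
    root = ET.fromstring(xml_text)
    hits = []
    for pb in root.iter(TEI_NS + "pb"):
        n = pb.get("n")
        if n is not None and TWO_NON_DIGITS.match(n):
            hits.append(n)
    return hits


# Hypothetical sample data, not taken from any LEMDO file.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
  <pb n="Aa1r"/><pb n="2A1v"/><pb n="A2r"/><pb n="ii"/>
</body></text></TEI>"""

print(suspicious_pb_values(SAMPLE))  # ['Aa1r', 'ii']
```

Note that a single-letter signature such as `A2r` is not flagged, because its second character is a digit, while a Roman numeral such as `ii` is.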

martindholmes commented 1 year ago

OK, this task should then be to enforce uniqueness in pb/@n values within the document using Schematron. Whatever we decide to do with issue #144 should wait on implementation of this.

martindholmes commented 1 year ago

Note that we already have Schematron for enforcing uniqueness of values in pb/@n, but it's constrained to a specific regex that is designed to find only signatures. This can easily be broadened. We first need to fix some odd instances in pb/@n, though.
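As a rough illustration of the broadened constraint (every pb/@n value unique within a document, whatever its form), here is a Python sketch. It is not the project's Schematron rule, only the same check expressed as a script. It assumes TEI-namespaced documents; `duplicate_pb_values` and the sample are hypothetical.

```python
from collections import Counter
import xml.etree.ElementTree as ET

# Assumption: <pb> elements live in the standard TEI namespace.
TEI_NS = "{http://www.tei-c.org/ns/1.0}"


def duplicate_pb_values(xml_text):
    """Return the pb/@n values that occur more than once in one document."""
    root = ET.fromstring(xml_text)
    counts = Counter(
        pb.get("n")
        for pb in root.iter(TEI_NS + "pb")
        if pb.get("n") is not None  # <pb> without @n is ignored here
    )
    return sorted(n for n, count in counts.items() if count > 1)


# Hypothetical sample data: "A1r" is deliberately repeated.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
  <pb n="A1r"/><pb n="A1v"/><pb n="A1r"/><pb n="ii"/><pb/>
</body></text></TEI>"""

print(duplicate_pb_values(SAMPLE))  # ['A1r']
```

Unlike a rule tied to a signature-shaped regex, this treats every value the same way, so folio numbers and other non-signature values are covered too.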

JanelleJenstad commented 1 year ago

> @JanelleJenstad I think you can do that with a regex: `<pb n="\D\D`. That should find all `<pb>` elements with an `@n` value beginning with at least two non-digit characters (I get 16 results).

I found 360+ instances in 16 files. Today, I've reduced it to 167 instances in 11 files.

Note that the regex threw up some Roman numeral @n values on <pb>: ii, ix, vii, etc. I will have to work out what the signatures ought to be. The editor can capture the printed Roman numerals using the <fw> element.
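One way to triage those hits is to separate the Roman-numeral values from the doubled-letter ones before correcting anything. The sketch below is only an illustration: it assumes lower-case numerals in modern standard form, so early modern variants such as `iiii` or a final `j` would fall into the "other" pile. The name `split_roman` and the sample values are hypothetical.

```python
import re

# Assumption: lower-case Roman numerals in standard modern form only.
ROMAN = re.compile(
    r"^m{0,3}(cm|cd|d?c{0,3})(xc|xl|l?x{0,3})(ix|iv|v?i{0,3})$"
)


def split_roman(values):
    """Split @n values into (roman_numerals, everything_else)."""
    roman, other = [], []
    for value in values:
        # The pattern also matches the empty string, so require a non-empty value.
        if value and ROMAN.match(value):
            roman.append(value)
        else:
            other.append(value)
    return roman, other


# Hypothetical values of the kind the regex search might return.
roman, other = split_roman(["ii", "ix", "vii", "Aa1r", "Bb2v"])
print(roman)  # ['ii', 'ix', 'vii']
print(other)  # ['Aa1r', 'Bb2v']
```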

martindholmes commented 1 year ago

@JanelleJenstad Where are we on this? Should I do something with Schematron at this point, and if so, what? All <pb>/@n values are apparently unique already throughout the collection, so uniqueness doesn't seem to be a problem right now.

JanelleJenstad commented 1 year ago

I have fixed every @n value on <pb> that needs fixing. We cannot standardize any more than I already have. Please implement this ticket after your vacation so that the @n value on <pb> displays on the left side of the line that contains the words "Page beginning".

martindholmes commented 1 year ago

@JanelleJenstad That's actually very tricky to do; the "Page beginning" text is a pseudo-element in CSS which is absolutely positioned to the right. In rev 15097 I've added the sig number to that text so it appears right with the page-break. I don't think we should inject it in any more obvious way because it's not part of the page content, it's metadata; we don't want people to confuse metadata with forme-works.

JanelleJenstad commented 1 year ago

That's a really good point about it being metadata. I agree that it shouldn't be too obvious. Thanks!

JanelleJenstad commented 1 year ago

This is working beautifully. Thank you! Closing ticket.