gutenbergtools / libgutenberg

Common files used by Project Gutenberg python projects.
GNU General Public License v3.0

Move "Updated" to a new MARC field #36

Closed gbnewby closed 8 months ago

gbnewby commented 8 months ago

This was discussed by email around a year ago and I thought it was in progress, but I didn't find an issue for it in gutenbergtools.

Currently, field 508 is used for Credits and for Updated lines.

We'd like to just use it for Credits. Updated lines should go to a new field (we discussed a few possibilities and I don't recall whether there was a favorite).

With recent observations/focus by Distributed Proofreaders that some credit lines are missing or mis-represented in older & newer books, I wanted to resurface this topic.

Roger and I can easily identify and relocate Updated lines to another field. We just need to know what field, and then we can implement the Postgres statements (on dev, then prod).
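As a rough illustration of the "identify and relocate" step, here is a minimal Python sketch that separates Updated lines from credit text in a 508 value. It assumes each update is stored on its own line in the form `Updated: YYYY-MM-DD`; the function name `split_credits` and the pattern are hypothetical, not part of libgutenberg, and the real work would be done with Postgres statements as described above.

```python
import re

# Hypothetical pattern: a whole line such as "Updated: 2021-12-20"
# (an optional trailing period is tolerated).
UPDATED_RE = re.compile(r"^\s*Updated:\s*(\d{4}-\d{2}-\d{2})\.?\s*$")


def split_credits(field_508):
    """Split the text of a 508 field into (credit_lines, update_dates)."""
    credits, updates = [], []
    for line in field_508.splitlines():
        match = UPDATED_RE.match(line)
        if match:
            # An Updated line: keep only the date.
            updates.append(match.group(1))
        elif line.strip():
            # Anything else that is not blank is treated as a credit.
            credits.append(line.strip())
    return credits, updates
```

Running a function like this over a sample of records first (on dev) would show how many 508 values mix credits and updates before anything is changed.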

eshellman commented 8 months ago

New "Updated" info is now taken from the last modified date of the file. We decided not to put updated dates in the credit field or in the database; these dates are already in the database as part of the file metadata. If you want to remove old updated info, it needs to be done by hand. I can supply spreadsheets for this if desired.

gbnewby commented 8 months ago

Are you saying that all the instances of Update dates can be removed from field 508?

I already know how to do that. I've been waiting for you to apply the move to get those data into another field. It sounds like you changed the implementation, but I was not aware of this.


eshellman commented 8 months ago

Conclusion of discussion from a year ago:

On Jan 30, 2023, at 11:51 AM, Eric Hellman wrote:

On Jan 30, 2023, at 12:42 AM, Greg Newby wrote:

Release date: January 30, 2021 [eBook #63009] [Most recently updated: January 30, 2023]

Making explicit the embodied assumptions - I'm assuming we're all comfortable with:

Square brackets as syntactical sugar with no meaning.

"eBook #63009" is a bit misleading phrasing because of the 3000 or so non-ebooks with numbers

camel case for "eBook"

We're ok with changing the update date for trivial file changes

No line spacing between metadata items in text, maybe a bit more in html

Nothing to distinguish legacy square bracketed text

no storage of update metadata in db other than file mod date and perhaps a note in the credit field

In previous discussions:

Title: The title: possibly with a colon immediately after a title word

Subtitle: Words on a single line with previously multiline subtitles delimited by " : "

(Library style punctuation) Original Publication: Place: Publisher, Year

backfile presentation of credits as currently rendered, trying to include credits not yet in the db

Credits: Roger Frank and Sue Clark.

Credits: Produced by Stephen Hutcheson and the Online Distributed Proofreading Team at https://www.pgdp.net

eshellman commented 8 months ago

The credit metadata should contain "Update date" when it's part of an addendum to the credits, or when for some reason the date of the update needs to be kept even after further updates. The mod date of the most-recently updated source file is appended in the generated header.
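To make the "mod date of the most-recently updated source file" idea concrete, here is a small Python sketch. It is not the ebookmaker or libgutenberg implementation: it assumes the source files are available on disk and reads their timestamps directly, whereas the thread says the dates come from file metadata in the database. The names `latest_mod_date` and `updated_note` are hypothetical.

```python
import os
from datetime import datetime, timezone

MONTHS = (
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
)


def latest_mod_date(paths):
    """Return the newest modification time among existing paths (UTC), or None."""
    mtimes = [os.path.getmtime(p) for p in paths if os.path.exists(p)]
    if not mtimes:
        return None
    return datetime.fromtimestamp(max(mtimes), tz=timezone.utc)


def updated_note(dt):
    """Format a datetime as the bracketed note used in the generated header."""
    return f"[Most recently updated: {MONTHS[dt.month - 1]} {dt.day}, {dt.year}]"
```

Because the note is derived on demand, nothing has to be stored in the credit field unless a particular update date needs to be preserved.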

There is already modification of the credit date that may do some of what you "know how to do": https://github.com/gutenbergtools/libgutenberg/blob/1ef1dded6223756096dec7846261e82b991ea006/libgutenberg/DublinCoreMapping.py#L142


gbnewby commented 8 months ago

Thanks for this. Unfortunately I am not seeing what you are describing.

Here is an example: https://www.gutenberg.org/ebooks/10000

You can see in the bibrec section, "Updated: 2021-12-20."

The label for this on the landing page is Credits.

I can see in the database via the catalog editor that "Updated: 2021-12-20" is in the 508 field.

So either we are talking about something different, or what you described only applies to some of the collection.

To reiterate what I would like: To not use 508 for Updates, and only to use 508 for actual credits.

I don't have a strong preference for whether updates are stored in the database as a separate field or calculated from file timestamps. However, there should be an indication of update dates in the XML/RDF. Currently there is one, but it's in 508:

<pgterms:marc508>Updated: 2021-12-20</pgterms:marc508>
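For anyone checking which records carry such entries, the following Python sketch pulls `pgterms:marc508` values out of an RDF fragment and filters the ones that begin with "Updated:". The namespace URI and the sample document (including the made-up credit line) are assumptions for illustration and should be checked against a real catalog RDF file.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the pgterms prefix; verify against a real RDF file.
PGTERMS = "http://www.gutenberg.org/2009/pgterms/"

# Made-up sample data wrapping the element shown above.
SAMPLE = f"""<root xmlns:pgterms="{PGTERMS}">
  <pgterms:marc508>Updated: 2021-12-20</pgterms:marc508>
  <pgterms:marc508>Produced by an example volunteer</pgterms:marc508>
</root>"""


def marc508_values(xml_text):
    """Return the text of every pgterms:marc508 element, in document order."""
    root = ET.fromstring(xml_text)
    tag = f"{{{PGTERMS}}}marc508"
    return [el.text.strip() for el in root.iter(tag) if el.text]


def update_entries(values):
    """Keep only the 508 values that are Updated lines rather than credits."""
    return [v for v in values if v.startswith("Updated:")]
```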

If I were to remove that entry from the 508 field for #10,000, where/how would the update date be presented in the landing page? Would it present exactly as it already does in the generated files?

I'll be pleased to have a teleconference to discuss if this might be helpful.


eshellman commented 8 months ago

It would be easy to make the landing page look exactly like the generated header, for #10000: [image]

Code for that is already written.

Adding an update field, by contrast, would create complexity, and people would be complaining for a year. I strongly advise against it, as we discussed a year ago. The things we agreed on last January were implemented and refined in ebookmaker 0.12.29 (February) through 0.12.32 (July). In particular, we agreed (albeit with a lot of back and forth) NOT to add an update field to the database.

Shall we move this item to autocat3, addressing it with a landing page display more aligned with ebookmaker?

gbnewby commented 8 months ago

Discussion moved here: https://github.com/gutenbergtools/autocat3/issues/116