srophe / caesarea-data

Data repository for Caesarea-Maritima.org

Re-run the script that adds the string sort attribute #177

Closed wlpotter closed 1 year ago

wlpotter commented 1 year ago

Now that we have new data, we need to update it so that the records have the string sort attribute, letting records within the same work appear properly sorted w.r.t. the section ordering.

Add a test to this script so that it only runs on work-groups that have at least one record that lacks the n attribute. This will cut down on time spent re-processing unchanged work groups and, more importantly, the time spent manually checking the ones that couldn't be sorted by the matching algorithm.
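A minimal sketch of the proposed test, in Python for illustration only. The record layout (dicts with `author`, `work`, and `n` keys) and the function name `groups_needing_sort` are assumptions, not the project's actual data model or script.

```python
from collections import defaultdict

def groups_needing_sort(records):
    """Return only the work-groups with at least one record lacking "n".

    Each record is assumed (hypothetically) to be a dict with "author",
    "work", and an optional "n" sort value.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["author"], rec["work"])].append(rec)
    # Keep a group only if some record has a missing or empty "n".
    return {
        key: recs
        for key, recs in groups.items()
        if any(not rec.get("n") for rec in recs)
    }

records = [
    {"author": "Photius", "work": "Library", "n": "001"},
    {"author": "Photius", "work": "Library", "n": None},
    {"author": "Sozomen", "work": "Church History", "n": "001"},
]
pending = groups_needing_sort(records)
```

Here only the Photius group would be re-processed; the fully sorted Sozomen group is skipped, so it would not need to be re-checked by hand.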

wlpotter commented 1 year ago

Hmm I might have uncovered a second problem:

We have several cases of the same work being referred to separately. And since we don't have stable URIs for all authors and works, I have had to rely on string matching to create the author-work groups for sorting. Here are the offenders as of 2023-02-24:

Eusebius of Caesarea. Dictionary of Place Names
Eusebius of Caesarea. Ecclesiastical History
Eusebius of Caesarea. Historia ecclesiastica
Eusebius of Caesarea. Life of Constantine
Eusebius of Caesarea. Martyrs of Palestine
Eusebius. Dictionary of Place Names
Eusebius. Life of Constantine
Photius. Library
Photius. Library Library
Pliny the Elder. Natural History
Pliny. Natural History
Procopius of Gaza. Letter
Procopius of Gaza. Letters
Socrates of Constantinople. Church History
Socrates Scholasticus. Ecclesiastical History
Sozomen. Church History
Sozomenus. Church History

We will need some data normalization before we can reliably run an updated string sort. @davidamichelson we should discuss.

The string sort is not essential, and ideally we'd just run it once more before official publication, so this is not a priority.
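To illustrate why string matching splits one work into several groups, here is a small Python sketch. It assumes each reference is a label of the form "Author. Title", as in the list above; the function name `group_by_label` is hypothetical and this is not the project's actual grouping code.

```python
from collections import defaultdict

def group_by_label(labels):
    """Count records per (author, work) key using exact string matching."""
    groups = defaultdict(int)
    for label in labels:
        # Split on the first ". " only: author before it, title after it.
        author, _, work = label.partition(". ")
        groups[(author.strip(), work.strip())] += 1
    return dict(groups)

labels = [
    "Pliny the Elder. Natural History",
    "Pliny. Natural History",
    "Pliny the Elder. Natural History",
]
counts = group_by_label(labels)
```

Because the keys are compared as exact strings, "Pliny" and "Pliny the Elder" end up in two separate groups even though they refer to the same work, which is why the labels need normalizing first.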

davidamichelson commented 1 year ago

@wlpotter here are the revisions for making these uniform.

Eusebius of Caesarea. Historia ecclesiastica -> Eusebius of Caesarea. Ecclesiastical History
Eusebius. Dictionary of Place Names -> Eusebius of Caesarea. Dictionary of Place Names
Eusebius. Life of Constantine -> Eusebius of Caesarea. Life of Constantine
Photius. Library Library -> Photius. Library
Pliny. Natural History -> Pliny the Elder. Natural History
Procopius of Gaza. Letters -> Procopius of Gaza. Letter
Socrates Scholasticus. Ecclesiastical History -> Socrates of Constantinople. Church History
Sozomenus. Church History -> Sozomen. Church History

After you run these, please prepare a two-column report of all names and titles.
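The revisions and the requested report could be sketched as follows, in Python for illustration. The mapping is copied from the list above; the function names, the "Author. Title" label format, and the tab-separated output are assumptions rather than the project's actual tooling.

```python
# Revisions from this thread: variant label -> uniform label.
REVISIONS = {
    "Eusebius of Caesarea. Historia ecclesiastica": "Eusebius of Caesarea. Ecclesiastical History",
    "Eusebius. Dictionary of Place Names": "Eusebius of Caesarea. Dictionary of Place Names",
    "Eusebius. Life of Constantine": "Eusebius of Caesarea. Life of Constantine",
    "Photius. Library Library": "Photius. Library",
    "Pliny. Natural History": "Pliny the Elder. Natural History",
    "Procopius of Gaza. Letters": "Procopius of Gaza. Letter",
    "Socrates Scholasticus. Ecclesiastical History": "Socrates of Constantinople. Church History",
    "Sozomenus. Church History": "Sozomen. Church History",
}

def normalize(label):
    """Return the uniform label, or the label unchanged if no revision applies."""
    return REVISIONS.get(label, label)

def two_column_report(labels):
    """Build a tab-separated author/title report of the distinct normalized labels."""
    pairs = set()
    for label in labels:
        author, _, work = normalize(label).partition(". ")
        pairs.add((author, work))
    return "\n".join(f"{author}\t{work}" for author, work in sorted(pairs))

report = two_column_report([
    "Pliny. Natural History",
    "Pliny the Elder. Natural History",
    "Sozomenus. Church History",
])
```

With these three sample labels, the report has two rows: the two Pliny variants collapse into one after normalization.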

davidamichelson commented 1 year ago

@wlpotter please make an issue to discuss "Letters vs. Letter" with @josephrife on Friday.

@wlpotter Also one more to discuss: should we add geographic names for other authors when there are multiple with the same name (Procopius. Secret History -> Procopius of Caesarea. Secret History)?

josephrife commented 1 year ago

Thanks. On the second question, e.g. Procopius of Caesarea: good catch. An editing error, I suspect; we can just correct it by adding "of Caesarea" to the other Procopius refs that are missing it. I am not sure if there are other examples. I might recall seeing some "Pliny the Elder" but then a bare "Pliny" that needs enhancement too. Is it possible to make a list of all "authors" and see where there are any to fix?
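One way to build such a list and flag candidates, sketched in Python. This is only a heuristic, and the function name `flag_bare_authors` is hypothetical: it flags an author name when a longer name starts with it followed by a space (e.g. "Pliny" vs. "Pliny the Elder"). It would not catch spelling variants such as "Sozomenus" vs. "Sozomen", which would still need a manual pass.

```python
def flag_bare_authors(authors):
    """Map each bare author name to the longer names that extend it."""
    unique = sorted(set(authors))
    flagged = {}
    for short in unique:
        # A longer name "extends" a bare one if it starts with it plus a space.
        longer = [a for a in unique if a != short and a.startswith(short + " ")]
        if longer:
            flagged[short] = longer
    return flagged

authors = [
    "Pliny the Elder",
    "Pliny",
    "Procopius",
    "Procopius of Caesarea",
    "Procopius of Gaza",
    "Photius",
]
flagged = flag_bare_authors(authors)
```

A bare "Procopius" is flagged against both "Procopius of Caesarea" and "Procopius of Gaza", so an editor would still have to decide which one each reference means.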


wlpotter commented 1 year ago

I split this into #179 and #180. Once these are resolved, I will run the script to add the string-sort attribute.

wlpotter commented 1 year ago

This should be fixed