papyri / idp.data

Data from the Integrating Digital Papyrology project

Outdated HGV ID's #83

Open Frederic-P opened 5 years ago

Frederic-P commented 5 years ago

I'm not entirely sure my impression is correct, but it seems to me that a handful of files contain outdated HGV IDs. I have not committed any changes to the .XML files, as I'm not sure whether this is intentional.

row[0] = HGV ID parsed from the XML file ==> file["TEI"]["teiHeader"]["fileDesc"]["publicationStmt"]["idno type="HGV""] ## Should return the name(s) of the file(s) that contain(s) the HGV data.
row[1] = The file that contains this (outdated?) reference.
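For readers who want to reproduce the lookup, here is a minimal sketch using only Python's standard library. It is not the script used for this report. It assumes the files use the standard TEI namespace and that several HGV IDs may share one `idno` element separated by whitespace; `hgv_ids` and `SAMPLE` are hypothetical names, and the sample fragment is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: the APIS files declare the standard TEI namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Path from the root <TEI> element down to the HGV identifier.
HGV_PATH = "./tei:teiHeader/tei:fileDesc/tei:publicationStmt/tei:idno[@type='HGV']"


def hgv_ids(xml_text):
    """Return every HGV ID found in the publicationStmt of a TEI document."""
    root = ET.fromstring(xml_text)
    ids = []
    for idno in root.findall(HGV_PATH, TEI_NS):
        # Assumption: multiple IDs in one element are whitespace-separated.
        ids.extend((idno.text or "").split())
    return ids


# Hypothetical, heavily trimmed fragment (not copied from the repository).
SAMPLE = (
    '<TEI xmlns="http://www.tei-c.org/ns/1.0"><teiHeader><fileDesc>'
    '<publicationStmt>'
    '<idno type="apisid">berkeley.apis.1408</idno>'
    '<idno type="HGV">8675</idno>'
    '</publicationStmt>'
    '</fileDesc></teiHeader></TEI>'
)

print(hgv_ids(SAMPLE))
```

For a file on disk, the same function could be applied to the text read from that file.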

These are the [HGVid, apis.xml-file] combinations that I think are outdated:
[8675, 'idp.data-master/APIS/berkeley/xml/berkeley.apis.1408.xml']
[8675, 'idp.data-master/APIS/berkeley/xml/berkeley.apis.1409.xml']
[89510, 'idp.data-master/APIS/britmus/xml/britmus.apis.59202.xml']
[20114, 'idp.data-master/APIS/lund/xml/lund.apis.108.xml']
[24986, 'idp.data-master/APIS/oxford-ipap/xml/oxford-ipap.apis.1465.xml']
[65003, 'idp.data-master/APIS/oxford-ipap/xml/oxford-ipap.apis.216.xml']
[65006, 'idp.data-master/APIS/oxford-ipap/xml/oxford-ipap.apis.260.xml']

There's no data for them on the HGV website either: e.g. http://aquila.zaw.uni-heidelberg.de/hgv/8675, http://aquila.zaw.uni-heidelberg.de/hgv/89510 ...
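A rough sketch of how such a check could be automated is given here. It builds the URL in the pattern quoted above and treats an HTTP error as "no record". That is only an assumption about how the HGV site behaves: it may well answer with a normal page that simply contains no data, in which case the page body would need inspecting. `hgv_url` and `hgv_page_exists` are hypothetical helper names.

```python
import urllib.error
import urllib.request

# Base URL taken from the examples in this comment.
HGV_BASE = "http://aquila.zaw.uni-heidelberg.de/hgv/"


def hgv_url(hgv_id):
    """Build the HGV record URL for an ID such as 8675 or '5430a'."""
    return f"{HGV_BASE}{hgv_id}"


def hgv_page_exists(hgv_id, timeout=10):
    """Return True if the record URL answers with HTTP 200.

    Assumption: a missing record produces an HTTP error status.
    Network failures (urllib.error.URLError) are deliberately not caught.
    """
    try:
        with urllib.request.urlopen(hgv_url(hgv_id), timeout=timeout) as resp:
            return resp.status == 200
    except urllib.error.HTTPError:
        return False
```

Calling `hgv_page_exists(8675)` needs network access, so treat the result as a hint rather than proof.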

jcowey commented 5 years ago

Thank you for the list. It would be super helpful if you would suggest what the HGV_id should be.

The list is small and I would certainly like to try to fix it.

Frederic-P commented 5 years ago

Hey, I think these are the changes to be made (based on data from Trismegistos):
oldID ==> Motivation ==> New id(s)
8675 ==> Split up according to tm (https://www.trismegistos.org/text/5430) ==> 5430a 5430b
89510 ==> no HGV ID found on (https://www.trismegistos.org/text/89510) or outgoing links ==> ????
20114 ==> Double 20114 == 63054 (https://www.trismegistos.org/text/63054) ==> 63054
24986 ==> Double 24986 == 58917 (https://www.trismegistos.org/text/58917) ==> 58917
65003 ==> Other text on same object: 18878 (https://www.trismegistos.org/text/18878) ==> 18878
65006 ==> Reused blank space in 36694 (https://www.trismegistos.org/text/36694) ==> 36694

Since HGV_ID 8675 is referenced in two .xml files and has two new HGV IDs, I think it's safe to say that there is a one-to-one relation between the .xml files and the new HGV IDs. Could someone who is more familiar with these documents confirm?
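To make the proposal easier to work with, the old-to-new IDs suggested in this comment can be written down as a small lookup table. This is only a sketch: `PROPOSED`, `propose` and `unresolved` are hypothetical names, an empty list stands for "no replacement found yet", and the two candidates for 8675 are deliberately not assigned to specific files because that pairing is still unconfirmed.

```python
# Proposed replacements from this comment (old HGV ID -> candidate new IDs).
# An empty list means no replacement has been identified so far.
PROPOSED = {
    "8675": ["5430a", "5430b"],  # split; which file gets which ID is unconfirmed
    "89510": [],
    "20114": ["63054"],
    "24986": ["58917"],
    "65003": ["18878"],
    "65006": ["36694"],
}


def propose(old_id):
    """Return the candidate new IDs for an old ID, or None if it is not listed."""
    return PROPOSED.get(str(old_id))


def unresolved():
    """Return the old IDs that still have no proposed replacement."""
    return [old for old, new in PROPOSED.items() if not new]


print(propose(20114))
print(unresolved())
```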

jcowey commented 5 years ago

Many thanks for the extra info. I will probably not get round to looking at these in the next couple of days, but will certainly look at them next week.