wellcomecollection / wellcome-collection-tei

Manuscript Descriptions encoded according to the Text Encoding Initiative
MIT License

Decide on a consistent scheme for the `xml:id` #53

Open alexwlchan opened 1 year ago

alexwlchan commented 1 year ago

At the top of every TEI file is a <TEI> element with an xml:id, e.g.

<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:id="Hebrew_B_21">

Currently they're a bit of an inconsistent mess:

List of all xml:id values ``` manuscript_15808 manuscript_15653 manuscript_16448 manuscript_16456 manuscript_16464 manuscript_16465 manuscript_16479 manuscript_16499 manuscript_16502 manuscript_16504 manuscript_16505 manuscript_16506 manuscript_16508 manuscript_16509 manuscript_16511 manuscript_16512 manuscript_16513 manuscript_16514 manuscript_16515 manuscript_16519 manuscript_16520 manuscript_16523 manuscript_16525 manuscript_16526 manuscript_16527 manuscript_16528 manuscript_16530 manuscript_16531 manuscript_16534 manuscript_16538 manuscript_15651 manuscript_15660 manuscript_15745 manuscript_15747 manuscript_15752 manuscript_15661 manuscript_15761 manuscript_15762 manuscript_15662 manuscript_15766 manuscript_15663 manuscript_15664 manuscript_15792 manuscript_15793 manuscript_15665 manuscript_15666 manuscript_15806 manuscript_15809 manuscript_15810 manuscript_15811 manuscript_15667 manuscript_15816 manuscript_15819 manuscript_15820 manuscript_15823 manuscript_15824 manuscript_15668 manuscript_15827 manuscript_15828 manuscript_15652 manuscript_15670 manuscript_15846 manuscript_15847 manuscript_15672 manuscript_15876 manuscript_15879 manuscript_15674 manuscript_15890 manuscript_15893 manuscript_15895 manuscript_15897 manuscript_15898 manuscript_15899 manuscript_15900 manuscript_15676 manuscript_15901 manuscript_15908 manuscript_15914 manuscript_15915 manuscript_15918 manuscript_15919 manuscript_15922 manuscript_15924 manuscript_15925 manuscript_15927 manuscript_15930 manuscript_15679 manuscript_15931 manuscript_15932 manuscript_15933 manuscript_15938 manuscript_15939 manuscript_15946 manuscript_15681 manuscript_15954 manuscript_15684 manuscript_15983 manuscript_15990 manuscript_15991 manuscript_15685 manuscript_15994 manuscript_15998 manuscript_16002 manuscript_16004 manuscript_16014 manuscript_16015 manuscript_16016 manuscript_16018 manuscript_16019 manuscript_16020 manuscript_16022 manuscript_16025 manuscript_16026 manuscript_16027 manuscript_16028 
manuscript_16040 manuscript_16044 manuscript_16045 manuscript_16046 manuscript_16046 manuscript_16048 manuscript_16049 manuscript_16050 manuscript_16051 manuscript_16052 manuscript_16053 manuscript_16052 manuscript_16055 manuscript_16056 manuscript_16057 manuscript_16058 manuscript_16059 manuscript_16060 manuscript_16061 manuscript_16062 manuscript_16063 manuscript_16064 manuscript_16065 manuscript_16066 manuscript_16067 manuscript_16060 manuscript_16069 manuscript_16070 manuscript_16071 manuscript_16072 manuscript_16073 manuscript_16074 manuscript_16075 manuscript_16076 manuscript_16077 manuscript_16078 manuscript_16079 manuscript_16080 manuscript_16081 manuscript_16082 manuscript_16083 manuscript_16084 manuscript_16085 manuscript_16086 manuscript_16087 manuscript_16088 manuscript_16089 manuscript_16090 manuscript_16091 manuscript_16092 manuscript_16093 manuscript_16094 manuscript_16095 manuscript_16096 manuscript_16097 manuscript_16098 manuscript_16099 manuscript_16100 manuscript_16101 manuscript_16102 manuscript_15695 manuscript_16103 manuscript_16104 manuscript_16105 manuscript_16106 manuscript_16107 manuscript_16108 manuscript_16109 manuscript_16110 manuscript_16111 manuscript_16112 manuscript_16113 manuscript_16114 manuscript_16115 manuscript_16116 manuscript_16117 manuscript_16118 manuscript_16119 manuscript_16120 manuscript_16121 manuscript_16122 manuscript_16123 manuscript_16124 manuscript_16125 manuscript_16126 manuscript_16127 manuscript_16128 manuscript_16129 manuscript_16130 manuscript_16132 manuscript_15698 manuscript_16133 manuscript_16138 manuscript_16140 manuscript_16141 manuscript_16146 manuscript_16149 manuscript_16150 manuscript_16151 manuscript_16154 manuscript_16155 manuscript_16156 manuscript_16158 manuscript_16164 manuscript_16170 manuscript_16171 manuscript_16172 manuscript_16173 manuscript_16174 manuscript_16176 manuscript_16177 manuscript_16200 manuscript_16201 manuscript_16203 manuscript_16204 manuscript_16209 manuscript_16212 
manuscript_16216 manuscript_16217 manuscript_16221 manuscript_16222 manuscript_16224 manuscript_16227 manuscript_16229 manuscript_16231 manuscript_16232 manuscript_16236 manuscript_16237 manuscript_16238 manuscript_16241 manuscript_16242 manuscript_15656 manuscript_16250 manuscript_16251 manuscript_16253 manuscript_16261 manuscript_16262 manuscript_15710 manuscript_16266 manuscript_16268 manuscript_16272 manuscript_16274 manuscript_16276 manuscript_16277 manuscript_16281 manuscript_16290 manuscript_15713 manuscript_16297 manuscript_16298 manuscript_16303 manuscript_15714 manuscript_16304 manuscript_16306 manuscript_16308 manuscript_16312 manuscript_16317 manuscript_16319 manuscript_16322 manuscript_16323 manuscript_15716 manuscript_16324 manuscript_16326 manuscript_16333 manuscript_16334 manuscript_16335 manuscript_16337 manuscript_16338 manuscript_16341 manuscript_16342 manuscript_16343 manuscript_16353 manuscript_15719 manuscript_16355 manuscript_16358 manuscript_16360 manuscript_15720 manuscript_16367 manuscript_16368 manuscript_16369 manuscript_16370 manuscript_16372 manuscript_16373 manuscript_15721 manuscript_16374 manuscript_16377 manuscript_16378 manuscript_16379 manuscript_16382 manuscript_16383 manuscript_16386 manuscript_16390 manuscript_16392 manuscript_16415 manuscript_16417 manuscript_16420 manuscript_16421 manuscript_16425 manuscript_16427 manuscript_16432 manuscript_16435 manuscript_16437 manuscript_16441 manuscript_16442 manuscript_15658 manuscript_15727 manuscript_16450 manuscript_16452 manuscript_16453 manuscript_15728 manuscript_16457 manuscript_16459 manuscript_16460 manuscript_16461 manuscript_15729 manuscript_16466 manuscript_16467 manuscript_16470 manuscript_15730 manuscript_16474 manuscript_16475 manuscript_16480 manuscript_16481 manuscript_16483 manuscript_15731 manuscript_16484 manuscript_16485 manuscript_16487 manuscript_16488 manuscript_16491 manuscript_16492 manuscript_16497 manuscript_15744 Batak_330889 Wellcome_Batak_330890 
Wellcome_Batak_36801 Batak_36863 Wellcome_Batak_36960 Wellcome_Batak_56303 Wellcome_Batak_56330 Wellcome_Batak_63570 Wellcome_Batak_66485 Wellcome_Batak_66486 Wellcome_Batak_91548 Wellcome_Batak_91624 Batak_330894 Egyptian_MS_1 Egyptian_MS_2 Egyptian_3 Egyptian_4 Egyptian_MS_5 Egyptian_MS_6 Egyptian_MS_7 Egyptian_MS_8 Ethiopian_1 Ethiopian_10 Ethiopian_12 Ethiopian_13 Ethiopian_14 Ethiopian_15 Ethiopian_16 Ethiopian_17 Ethiopian_18 Ethiopian_19 Ethiopian_20 Ethiopian_21 Ethiopian_20 Ethiopian_23 Ethiopian_24 Ethiopian_25 Ethiopian_26 Ethiopian_27 Ethiopian_3 Ethiopian_4 Ethiopian_5 Ethiopian_6 Ethiopian_7 Ethiopian_8 Ethiopian_9 Hebrew_A_1 Hebrew_A_10 Hebrew_A_11 Hebrew_A_12 Hebrew_A_13 Hebrew_A_14 Hebrew_A_15 Hebrew_A_16 Hebrew_A_17 Hebrew_A_18 Hebrew_A_19 Hebrew_A_2 Hebrew_A_20 Hebrew_A_21 Hebrew_A_22 Hebrew_A_23 Hebrew_A_24 Hebrew_A_25 Hebrew_A_26 Hebrew_A_27 Hebrew_A_28 Hebrew_A_29 Hebrew_A_3 Hebrew_A_30 Hebrew_A_31 Hebrew_A_32 Hebrew_A_33 Hebrew_A_34 Hebrew_A_35 Hebrew_A_36 Hebrew_A_4 Hebrew_A_5 Hebrew_A_6 Hebrew_A_7 Hebrew_A_8 Hebrew_A_9 Hebrew_B_1 Hebrew_B_10 Hebrew_B_11 Hebrew_B_12 Hebrew_B_13 Hebrew_B_14 Hebrew_B_15 Hebrew_B_16 Hebrew_B_17 Hebrew_B_18 Hebrew_B_19 Hebrew_B_2 Hebrew_B_20 Hebrew_B_21 Hebrew_B_22 Hebrew_B_23 Hebrew_B_24 Hebrew_B_25 Hebrew_B_26 Hebrew_B_27 Hebrew_B_28 Hebrew_B_29 Hebrew_B_3 Hebrew_B_30 Hebrew_B_31 Hebrew_B_32 Hebrew_B_33 Hebrew_B_34 Hebrew_B_35 Hebrew_B_36 Hebrew_B_37 Hebrew_B_38 Hebrew_B_39 Hebrew_B_4 Hebrew_B_40 Hebrew_B_41 Hebrew_B_42 Hebrew_B_43 Hebrew_B_44 Hebrew_B_45 Hebrew_B_46 Hebrew_B_47 Hebrew_B_48 Hebrew_B_49 Hebrew_B_5 Hebrew_B_50 Hebrew_B_51 Hebrew_B_52 Hebrew_B_53 Hebrew_B_54 Hebrew_B_55 Hebrew_B_56 Hebrew_B_57 Hebrew_B_58 Hebrew_B_5 Hebrew_B_7 Hebrew_B_8 Hebrew_B_9 MS_Japanese_17 MS_Japanese_100 MS_Japanese_116 MS_Japanese_119 MS_Japanese_125 MS_Japanese_16 MS_Japanese_27 MS_Japanese_28 MS_Japanese_29 MS_Japanese_52 MS_Japanese_58 MS_Japanese_59 MS_Japanese_61 MS_Japanese_63 MS_Japanese_79 MS_Japanese_82 
MS_Japanese_85 MS_Japanese_86 MS_Japanese_88 MS_Japanese_94 MS_Japanese_98 MS_Japanese_99 Well.Jav.1 Javanese_10 Well.Jav.11 Javanese_2 Wellcome_Jav_3 Wellcome_Javanese_4 Well.Jav.5 Javanese_6 Wellcome_Jav_7 Well.Jav.8 Wellcome_Javanese_9 Karshuni_1 Karshuni_2 Karshuni_3 Wellcome_Malay_1 Wellcome_Malay_10 Wellcome_Malay_2 Wellcome_Malay_3 Wellcome_Malay_4 Wellcome_Malay_5 Wellcome_Malay_6 Wellcome_Malay_7 Wellcome_Malay_8 Wellcome_Malay_9 Syriac_1 Syriac_2 Tamil_1 Tamil_10 Tamil_11 Tamil_12 Tamil_13 Tamil_14 Tamil_15 Tamil 17 Tamil_18 Tamil_19 Tamil_2 Tamil_20 Tamil_21 Tamil 22 Tamil_23 Tamil_24 Tamil_25 Tamil 26 Tamil 27 Tamil_28 Tamil_29 Tamil_3 Tamil_30 Tamil_32 Tamil_33 Tamil_34 Tamil_35 Tamil_36 Tamil_37 Tamil_38 Tamil_4 Tamil_42 Tamil_5 Tamil 6 Tamil 7 Tamil_8 Tamil_9 MS_Tibetan_133 MS_Tibetan_134 ```

Now that we've made the display labels consistent, would it be useful to make these identifiers consistent too, e.g. MS_$language_$number? That's something we could add to the XML checker.

It would have caught a recent error – both MS_Hebrew_B_5.xml and MS_Hebrew_B_6.xml had the same xml:id value, which was causing conflicts.
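As a rough sketch of what the scheme check might look like: the regular expression and the function name `is_consistent_xml_id` are assumptions for illustration, not something the XML checker already has, and the exact rules (for example whether a shelf letter like the `B` in `Hebrew_B_21` is allowed) are still to be decided.

```python
import re

# Assumed pattern for the proposed MS_$language_$number scheme.
# The optional single uppercase letter allows for IDs such as MS_Hebrew_B_21;
# this is a guess at the rules, not an agreed convention.
XML_ID_PATTERN = re.compile(r"MS_[A-Za-z]+(?:_[A-Z])?_[0-9]+")


def is_consistent_xml_id(xml_id: str) -> bool:
    """Return True if xml_id follows the assumed MS_$language_$number scheme."""
    return XML_ID_PATTERN.fullmatch(xml_id) is not None


if __name__ == "__main__":
    # A few values in the styles that appear in the list above.
    for candidate in ["MS_Hebrew_B_21", "Hebrew_B_21", "Well.Jav.1", "Tamil 17"]:
        print(candidate, is_consistent_xml_id(candidate))
```

Of the four examples, only `MS_Hebrew_B_21` passes; the other three fail on the missing `MS_` prefix, the dots, or the space.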

Note: please don't change these IDs without talking to the platform team first. We'll need to do some work on our side to make sure existing URLs to manuscripts don't change when these IDs change. It's possible, we just need to schedule it.

amme2 commented 1 year ago

Yes, I think so, with the usual caveat to check and allow for how this might impact files which are derived from / shared with external aggregators (i.e. I'm thinking primarily of FIHRIST in this instance).

alexwlchan commented 1 year ago

Branwen has also flagged Fihrist as an issue; so what if we enforced consistency for everything except Fihrist?

(And even there, we can do some validation that, e.g., the same ID hasn't been used twice.)
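A minimal sketch of that split, assuming the checker can walk a directory of TEI files: the duplicate check runs over every file, while a separate helper says which files would skip the naming scheme. The function names and the folder names in `FIHRIST_DIRS` are hypothetical; the real repository layout may differ.

```python
import collections
import pathlib
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:id attribute under the reserved XML namespace.
XML_ID_ATTR = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical folder names for files shared with Fihrist.
FIHRIST_DIRS = {"Fihrist", "Arabic"}


def is_exempt_from_scheme(path):
    """Return True if the file sits in a folder assumed to be shared with Fihrist."""
    return any(part in FIHRIST_DIRS for part in pathlib.Path(path).parts)


def find_duplicate_xml_ids(root_dir):
    """Map each xml:id used by more than one TEI file to the files that use it."""
    files_by_id = collections.defaultdict(list)
    for path in sorted(pathlib.Path(root_dir).rglob("*.xml")):
        # The xml:id in question is the one on the root <TEI> element.
        xml_id = ET.parse(path).getroot().get(XML_ID_ATTR)
        if xml_id is not None:
            files_by_id[xml_id].append(path)
    return {xml_id: paths for xml_id, paths in files_by_id.items() if len(paths) > 1}
```

An empty result from `find_duplicate_xml_ids` means no ID is shared between files. Anything it returns is a conflict, whether or not the files involved are exempt from the naming scheme.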

adrianplau commented 1 year ago

Thank you everybody! This sounds like a great plan, Alex. The only external aggregator is indeed Fihrist (which essentially means the Arabic files, plus whatever goes in the Fihrist folder of files that are not meant to go on the front end). Others might come in the future but, as you say, this is only something we would look into with the platform team's input.