MansMeg closed this issue 8 months ago
It seems like most of the protocols don't have this ID
No. Can you add them? I think it is quite easy now that you have a test.
Can you use the head element instead?
This is in every protocol already.
Hmm. That's good. I could use that, but then we should rename head to id. The point is that we need a record id, and it must be crystal clear that this is the id for the record.
Maybe we should move head to id, or rename head to id?
head is part of the TEI namespace. It could be in both places, I guess.
Yes. I think head is not crystal clear for a record id.
Or maybe make it even clearer. Can we add a record_id under head in the preface?
I think we should use the xml:id attribute of the TEI element for this. See the official ParlaClarin example:
<TEI xml:id="document.id" xml:lang="en">
<teiHeader>
<fileDesc>
<titleStmt>
<!-- There are no rules on how these titles should be written -->
<title>The parliament of the Republic of Slovenia</title>
<title>Continuation of the second session</title>
<title>30th January 2011</title>
The head element should be a header for the protocol anyway, but I don't think we need to change it now:
<text>
<front>
<div type="preface">
<!-- text before speeches started -->
<head>THE PARLIAMENT OF THE REPUBLIC OF SLOVENIA</head>
<head>Continuation of the second session</head>
<docDate when="2011-01-30">30th January 2011</docDate>
</div>
</front>
<body>
I think this makes sense. It is clear that we should use the xml:id attribute of the TEI element. Good catch there, @ninpnin.
Excellent!
I need to extract the record ids from the files. In some files this exists, as it should, as an xml:id attribute on the TEI node:
<TEI xml:id="prot-1896--ak--42">
But this is not the case in all records. We should add this to all records and also add a unit test to ensure that it exists in every one.
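A minimal sketch of how the extraction and the check could look in Python, using only the standard library. The function names `extract_record_id` and `records_missing_id` and the inline sample records are hypothetical, not part of the repository. It assumes the id lives on the first element whose local name is `TEI`.

```python
import xml.etree.ElementTree as ET

# Clark-notation key under which ElementTree stores the xml:id attribute.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def extract_record_id(xml_text):
    """Return the xml:id of the first TEI element, or None if it is missing."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # Compare local names so the lookup works with or without the TEI namespace.
        if elem.tag.rsplit("}", 1)[-1] == "TEI":
            return elem.get(XML_ID)
    return None


def records_missing_id(records):
    """Given a mapping of file name -> XML text, list the files lacking a record id."""
    return [name for name, text in records.items() if not extract_record_id(text)]


# Hypothetical inline records; a real test would read the protocol files from disk.
with_id = '<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:id="prot-1896--ak--42"><text/></TEI>'
without_id = '<TEI xmlns="http://www.tei-c.org/ns/1.0"><text/></TEI>'

print(extract_record_id(with_id))
print(records_missing_id({"a.xml": with_id, "b.xml": without_id}))
```

A unit test could then assert that `records_missing_id` returns an empty list for the full set of protocol files, so any record without an id fails the build.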