semantic-kraus / dw-data

converts XML/TEI data into CIDOCish RDF
https://semantic-kraus.github.io/dw-data/
MIT License

DW Texts and Quotes: missing values and [n] != [n] #30

Closed: BOberreither closed this issue 1 year ago.

BOberreither commented 1 year ago

I've got a bigger one for you, sorry.

Missing values in quote_permalinks_2023_04_additional_info.xml

Sometimes the values in the info element are empty. That's because there are two IDs, not one, in the original source element.

Different counting

It looks to me like the [n] of a text passage was counted differently when transforming listBibl: when there was a preceding citedRange[@wholeText], it was not included in the count. Can that be right? Please confirm. If so,

Modeling in the RDF transformation:

All the info from the info element is used for the object of these triples in INT3: ns1:R12_has_referred_to_entity <https://sk.acdh.oeaw.ac.at/DWbibl00442/passage/19> ;

Thanks!

BergListe commented 1 year ago

> I've got a bigger one for you, sorry.

I can't shake the feeling that you want me to lose my mind. 😭

I fixed it so that both IDs are taken into account. All the data for the second ID go into a new element right after the info element, with the same name and the same attributes as info.

I hope it's ok if there are two <info> elements, because it would be rather complex to introduce a second element type.
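A minimal sketch of the two-<info> approach, assuming the source element stores its IDs as one whitespace-separated string and that the data for each ID can be looked up in a table. The names `add_info_elements` and `LOOKUP` and the attributes `bibl`/`passage` are hypothetical and not taken from the actual stylesheet; the IDs are borrowed from the two-URI example in this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup table: source ID -> attributes for one <info> element.
# The real attribute names in the additional_info XML may differ.
LOOKUP = {
    "DWbibl00442": {"bibl": "DWbibl00442", "passage": "19"},
    "DWbibl00042": {"bibl": "DWbibl00042", "passage": "1"},
}


def add_info_elements(parent: ET.Element, source_ids: str, lookup: dict) -> list:
    """Append one <info> element per ID found in the source element.

    Looking up the whole string "ID1 ID2" as a single key finds nothing,
    which is one way a single <info> element could end up empty.
    """
    added = []
    for source_id in source_ids.split():
        # Every <info> gets the same kind of attribute set, so no second element type is needed.
        attrs = dict(lookup.get(source_id, {}))
        added.append(ET.SubElement(parent, "info", attrs))
    return added


quote = ET.Element("quote")
add_info_elements(quote, "DWbibl00442 DWbibl00042", LOOKUP)
print(ET.tostring(quote, encoding="unicode"))
```

Keeping the element name `info` for both entries means downstream code that already reads `info` only has to loop over all of them instead of handling a new element type.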

BergListe commented 1 year ago

> It looks to me like the [n] of a text passage was counted differently when transforming listBibl: when there was a preceding citedRange[@wholeText], it was not included in the count. Can that be right? Please confirm. If so,

May I get an example, please?

BergListe commented 1 year ago

> All the info from the info element is used for the object of these triples in INT3: ns1:R12_has_referred_to_entity <https://sk.acdh.oeaw.ac.at/DWbibl00442/passage/19> ;

How should the result look?

  1. ns1:R12_has_referred_to_entity <https://sk.acdh.oeaw.ac.at/DWbibl00442/passage/19, https://sk.acdh.oeaw.ac.at/DWbibl00042/passage/1> ; or
  2. ns1:R12_has_referred_to_entity <https://sk.acdh.oeaw.ac.at/DWbibl00442/passage/19>, <https://sk.acdh.oeaw.ac.at/DWbibl00042/passage/1> ;

BOberreither commented 1 year ago

@BergListe

> I hope it's ok if there are two <info> elements, because it would be rather complex to introduce a second element type.

Absolutely.

> How should the result look?

Definitely like No. 2.
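In Turtle, option 2 is an object list: one predicate followed by several comma-separated objects, each in its own angle brackets, which is shorthand for one triple per object. Option 1 is not valid Turtle, because an IRIREF may not contain a space, and even without the space it would be a single IRI rather than two. The helper below is a hypothetical illustration of rendering the predicate-object part in that form, not code from the repository.

```python
def object_list(predicate: str, iris: list[str]) -> str:
    """Render `predicate <iri1>, <iri2> ;` as a Turtle predicate-object list."""
    for iri in iris:
        # Characters that may not appear inside a Turtle IRIREF (checked loosely;
        # control characters are not tested here).
        if any(ch in iri for ch in ' <>"{}|^`\\'):
            raise ValueError(f"not a valid IRIREF body: {iri!r}")
    # Each IRI gets its own <...>; the comma separates objects, not parts of one IRI.
    return f"{predicate} " + ", ".join(f"<{iri}>" for iri in iris) + " ;"


print(object_list(
    "ns1:R12_has_referred_to_entity",
    [
        "https://sk.acdh.oeaw.ac.at/DWbibl00442/passage/19",
        "https://sk.acdh.oeaw.ac.at/DWbibl00042/passage/1",
    ],
))
```

The printed line has the same shape as option 2 above.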

> May I get an example, please?

Sure thing. Look at this Text Passage: it is the third citedRange in a bibl, and the first citedRange there is a wholeText. Its [n] is 1, so the wholeText was not counted. The DW quote with the ID DWquote0351 should refer to that citedRange, but its Intertextual Relation points to another Text Passage (to see this, click "Statements" in the upper right corner). That other Text Passage has a URI with [n] 2. So in this case, when converting the quote_permalinks_additional.... xml, you do include the wholeText citedRange in the count.
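One way to reproduce both numbers, offered as a hypothesis rather than a description of the actual stylesheets: count the preceding citedRange siblings starting from 0, once skipping those with @wholeText (giving 1, as for the Text Passage) and once including them (giving 2, as in the quote's relation). The bibl below is a made-up minimal example with the same shape, and `preceding_count` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Made-up bibl with the same shape as the example: a wholeText citedRange first.
# Only the presence of @wholeText matters here; its value is a placeholder.
BIBL = ET.fromstring(
    "<bibl>"
    '<citedRange wholeText="yes"/>'
    "<citedRange>p. 10</citedRange>"
    "<citedRange>p. 20</citedRange>"
    "</bibl>"
)


def preceding_count(bibl: ET.Element, target: ET.Element, include_whole: bool) -> int:
    """0-based position: number of citedRange siblings before `target`."""
    ranges = bibl.findall("citedRange")
    preceding = ranges[: ranges.index(target)]
    if not include_whole:
        # Behaviour seen for the Text Passage: wholeText ranges are left out of the count.
        preceding = [r for r in preceding if r.get("wholeText") is None]
    return len(preceding)


third = BIBL.findall("citedRange")[2]
texts_n = preceding_count(BIBL, third, include_whole=False)
quotes_n = preceding_count(BIBL, third, include_whole=True)
print(texts_n, quotes_n)
```

If this hypothesis holds, the two transformations differ in two ways at once: whether wholeText ranges are counted, and whether counting starts at 0 or 1.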

BergListe commented 1 year ago

> It looks to me like the [n] of a text passage was counted differently when transforming listBibl: when there was a preceding citedRange[@wholeText], it was not included in the count. Can that be right? Please confirm. If so,

I found the bug! To calculate the three position values I use count() on the preceding siblings, hence the first element gets a position value of 0 instead of 1. I added +1 to all three values, which should hopefully do the trick.

BOberreither commented 1 year ago

@BergListe I somehow doubt this was the bug, because texts.ttl and quotes.ttl should be in sync when it comes to numbering the text passages, but currently there are many passages ending in 0 in texts.ttl and none in quotes.ttl.

An example

From quote_permalinks, take DWquote0101: it refers to the first citedRange in a bibl in listBibl. In quotes.ttl, this is the URI built for the text passage: <https://sk.acdh.oeaw.ac.at/DWbibl00146/passage/1>. In texts.ttl there is a text passage with this URI, but it is the one built from the second citedRange in the bibl in listBibl; the first one in texts.ttl has the URI <https://sk.acdh.oeaw.ac.at/DWbibl00146/passage/0>.

So my guess is that the mechanism generally worked fine before and only had a small hiccup when counting citedRanges where one of them was a wholeText.
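A quick way to spot this kind of mismatch is to collect the passage URIs from both files and look for passage/0. The sketch below uses a plain regular expression and assumes the passages appear as full IRIs in angle brackets (prefixed names would be missed), so it is a rough consistency check, not a Turtle parser. The two input snippets are invented for illustration, modelled on the DWbibl00146 example; `ex:` and the triples around the IRIs are placeholders.

```python
import re

# Full passage IRIs such as <https://sk.acdh.oeaw.ac.at/DWbibl00146/passage/1>
PASSAGE_IRI = re.compile(r"<(https://sk\.acdh\.oeaw\.ac\.at/DWbibl\d+/passage/(\d+))>")


def passage_iris(ttl: str) -> set[str]:
    """All passage IRIs mentioned in a Turtle string."""
    return {m.group(1) for m in PASSAGE_IRI.finditer(ttl)}


def zero_passages(ttl: str) -> set[str]:
    """Passage IRIs numbered 0, which should not exist once counting starts at 1."""
    return {m.group(1) for m in PASSAGE_IRI.finditer(ttl) if m.group(2) == "0"}


# Invented snippets standing in for texts.ttl and quotes.ttl.
texts_ttl = """
<https://sk.acdh.oeaw.ac.at/DWbibl00146/passage/0> a ex:TextPassage .
<https://sk.acdh.oeaw.ac.at/DWbibl00146/passage/1> a ex:TextPassage .
"""
quotes_ttl = """
ex:int3 ns1:R12_has_referred_to_entity <https://sk.acdh.oeaw.ac.at/DWbibl00146/passage/1> .
"""

print("passage/0 in texts.ttl: ", sorted(zero_passages(texts_ttl)))
print("passage/0 in quotes.ttl:", sorted(zero_passages(quotes_ttl)))
print("quoted but missing in texts.ttl:", sorted(passage_iris(quotes_ttl) - passage_iris(texts_ttl)))
```

Note that matching URIs alone do not prove the files are in sync: as the DWbibl00146 case shows, the same URI can be built from different citedRanges in the two files, so a passage/0 count is only a first warning sign.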

BergListe commented 1 year ago

I just checked this case and you are totally right: there is a bug in the counting procedure! citedRange[@wholeText or @wholePeriodical] weren't included in the counting algorithm. I fixed this behaviour and will upload the result to our Teams channel.

BOberreither commented 1 year ago

@BergListe Looks much better now. I think there is still a little bug left, though. For this bibl, DWbibl01911, there should not be a passage 0, because that's the citedRange containing the wholeText, DWbibl00294. Yet there are 2 occurrences of a passage DWbibl00294/passage/0.
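Putting the thread's findings together, one possible numbering rule (a sketch based on one reading of the discussion, not the project's actual XSLT) is: number citedRanges from 1, count every citedRange including those with @wholeText or @wholePeriodical, and emit no passage for those whole-text ranges themselves, so passage/0 cannot occur. The function names and the sample bibl are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_whole(cited_range: ET.Element) -> bool:
    """True for citedRange[@wholeText or @wholePeriodical]."""
    return (cited_range.get("wholeText") is not None
            or cited_range.get("wholePeriodical") is not None)


def passage_numbers(bibl: ET.Element) -> list[tuple[int, str]]:
    """Return (n, text) for every citedRange that becomes a text passage."""
    passages = []
    # start=1 plays the role of the "+1" added to the preceding-sibling count.
    for n, cited_range in enumerate(bibl.findall("citedRange"), start=1):
        if is_whole(cited_range):
            # Counted (n still advances) but no passage of its own.
            continue
        passages.append((n, (cited_range.text or "").strip()))
    return passages


bibl = ET.fromstring(
    "<bibl>"
    '<citedRange wholeText="yes"/>'  # placeholder value; only presence is tested
    "<citedRange>p. 10</citedRange>"
    "<citedRange>p. 20</citedRange>"
    "</bibl>"
)
print(passage_numbers(bibl))
```

Using a single function like this for both texts.ttl and quotes.ttl would keep the two numberings in sync by construction, which is the property the thread was after.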

BergListe commented 1 year ago

Thanks to your intensive research I could eliminate the remaining problems with the counting of passages & relations.