sasansom / sedes

Metrical position in Greek hexameter.

Use book_n=NA for works without separate book numbers #82

Open whoopsedesy opened 1 year ago

whoopsedesy commented 1 year ago

The book_n column of tei2csv output comes from the @n attribute of div1 elements. In works that are not divided into books, @n in the TEI holds a placeholder rather than a real number (for example, the work abbreviation Sh., as seen in the output below).

This placeholder carries no information, yet it is copied to the output as the book number:

work,book_n,line_n,word_n,word,lemma,sedes,metrical_shape,scanned,num_scansions,line_text
Sh.,Sh.,1,1,ἢ,ἤ,1,–,auto,1,ἢ οἵη προλιποῦσα δόμους καὶ πατρίδα γαῖαν
Sh.,Sh.,1,2,οἵη,οἷος,2,––,auto,1,ἢ οἵη προλιποῦσα δόμους καὶ πατρίδα γαῖαν

It would be better to output a blank in the book_n column in these cases, clearly indicating a work that has line numbers but not book numbers.
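As a rough illustration of what a blank book_n would look like, here is a minimal sketch that assumes the rows are written with Python's standard csv module (an assumption; the thread does not say how tei2csv writes its output). The csv writer emits None as an empty field, so a missing book number needs no special formatting. The helper name format_row is hypothetical.

```python
import csv
import io


def format_row(row):
    """Format one row as a CSV line (hypothetical helper, for illustration only)."""
    buf = io.StringIO()
    # csv.writer writes None as an empty string, which yields a blank field.
    csv.writer(buf, lineterminator="\n").writerow(row)
    return buf.getvalue()


# work, book_n, line_n, word_n, word (truncated row, for illustration)
print(format_row(["Sh.", None, 1, 1, "ἢ"]), end="")
```

With this approach the book_n field is simply empty between two commas, while line_n and the later columns are unaffected.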

I plan to try this heuristic: attempt to parse @n as an integer and, if that fails, set book_n = None. Then raise an error if the div1 @n attributes are not unique within a work (counting None as a distinct value).
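The heuristic might look roughly like the following. This is only a sketch under stated assumptions, not the actual tei2csv code: the function names parse_book_n and book_numbers are hypothetical, and the @n values are assumed to arrive as a plain list of strings, one per div1 element of a single work.

```python
from typing import Iterable, List, Optional


def parse_book_n(n: Optional[str]) -> Optional[int]:
    """Return @n as an int, or None if it is absent or not an integer."""
    if n is None:
        return None
    try:
        return int(n)
    except ValueError:
        # Placeholder values such as "Sh." end up here.
        return None


def book_numbers(div1_ns: Iterable[Optional[str]]) -> List[Optional[int]]:
    """Parse every div1 @n in one work, requiring the results to be unique.

    None counts as a distinct value, so at most one div1 per work may lack
    a numeric @n.
    """
    seen = set()
    result = []
    for n in div1_ns:
        book_n = parse_book_n(n)
        if book_n in seen:
            raise ValueError(f"duplicate book_n {book_n!r} (from @n={n!r})")
        seen.add(book_n)
        result.append(book_n)
    return result
```

Checking uniqueness after parsing, rather than on the raw strings, means that two different placeholders in the same work would both map to None and be reported as a conflict.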

whoopsedesy commented 1 year ago

The works that need this treatment are Sh., Theog., W.D., and Phaen. Cf. https://github.com/whoopsedesy/breaking-hermanns-bridge/commit/ae8517a93230ea38a32b02d77450d8b8875b579b. Note that Phaen. uses book_n = 1.