teusbenschop / ndebele

The text of the Ndebele Bible for use by the translation team

More xrefs with malformed or nonlocalized book abbreviations #25

Closed: DavidHaslam closed this issue 7 years ago

DavidHaslam commented 7 years ago

The following book abbreviations do not match those specified by the 66 \toc3 markers:

1 Cor
1 Kho
1 King.
1 Pet
1 Tes.
1 Tim
1. Pet.
1. Sam.
2Chr
2 Chr.
2 Joha
2 Kho
2 Kin
2 Pet
2 Tes.
2 Tim
2. Tim.
Deut
Dute
Eks
Ex
Exo.
Ezek
Ezek.
Ezr.
Fil .
Filim
Gal
Gen
Hagg.
Heb
Hez
Hos
I sai.
Is.
Isai
Isai.
Isai;
Jak
Jas.
Jer
J obe
Job
Joel
Joha
L evi
Lev
Lev.
Matt
Mic
Mic.
Micah
Mik.
Nahum
Num
Num.
Obad
Pro.
Prov
Ps
Ps.
Rom
Rute
Seb
Song.
Tit
Titus
Tshu .

Notes:

  1. I have retained Hos and Hagg. even though some of those locations were reported before.
  2. Some localized abbreviations are missing the full-stop.
  3. Some abbreviations are still non-localized.
  4. Some are malformed in other ways as well.
  5. Several of the items occur in multiple locations. There are about 218 locations in total.
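A list like the one above can be reproduced roughly by collecting each book's \toc3 value and flagging any xref abbreviation not in that set. The sketch below is not the method used to build the list. It assumes each book carries a line such as `\toc3 Levi` and that xrefs are written inline as `\x + ...\x*`. The abbreviation regex is approximate, so oddities such as `Fil .` or `I sai.` will slip past it.

```python
import re

# Assumed layout: each USFM book has a line such as "\toc3 Levi", and each
# xref is written inline as "\x + ...\x*" (no \xo/\xt sub-markers).
TOC3_RE = re.compile(r"\\toc3\s+(.+?)\s*$", re.MULTILINE)
XREF_RE = re.compile(r"\\x \+ (.*?)\\x\*")
# Approximate: optional numeral prefix ("1 ", "1. "), a word, an optional
# full stop, then a space and a chapter number. The lookbehind stops the
# "1." at the end of "18.1." from being taken as a numeral prefix.
ABBREV_RE = re.compile(r"(?<![\w.])((?:[1-3]\.? ?)?[A-Za-z]+\.?) (?=\d)")


def toc3_abbreviations(usfm_books):
    """Collect the toc3 abbreviation declared in each book's USFM text."""
    found = set()
    for text in usfm_books:
        found.update(TOC3_RE.findall(text))
    return found


def mismatched_abbreviations(usfm_text, allowed):
    """Sorted xref abbreviations in usfm_text that are not in `allowed`."""
    bad = set()
    for body in XREF_RE.findall(usfm_text):
        bad.update(a for a in ABBREV_RE.findall(body) if a not in allowed)
    return sorted(bad)
```

Running this over all 66 books, with `allowed` built from every book's \toc3 line, would give a first-pass candidate list. That list still needs checking by eye.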
DavidHaslam commented 7 years ago

The attached tab delimited text file is a tentative replacement list.

replace_bad_abbreviations.tab.txt

Notes:

  1. Replacements should be restricted to within xref elements.
  2. There is a trailing space in each item to tighten the matching requirements.
  3. Some trivial pre-processing fixes are also required.

I have designed and tested a bespoke TextPipe filter to perform the fixes.
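The TextPipe filter itself is not shown here. As a rough illustration of notes 1 and 2 only, here is a hypothetical Python equivalent. It loads a tab-delimited `bad<TAB>good` table and applies it only inside `\x + ...\x*` spans, keeping each item's trailing space. The function names and the sample table rows are placeholders, not the contents of replace_bad_abbreviations.tab.txt.

```python
import re

# An xref runs from "\x + " to "\x*"; the groups keep the markers intact.
XREF_RE = re.compile(r"(\\x \+ )(.*?)(\\x\*)")


def load_replacements(tab_text):
    """Parse 'bad<TAB>good' lines, keeping each item's trailing space."""
    table = {}
    for line in tab_text.splitlines():
        if "\t" in line:
            bad, good = line.split("\t", 1)
            table[bad] = good
    return table


def fix_xrefs(usfm_text, table):
    """Replace bad abbreviations, but only inside xref elements."""
    if not table:
        return usfm_text
    # Longest keys first, in case one key is a prefix of another.
    keys = sorted(table, key=len, reverse=True)
    # The lookbehind stops a key matching in the middle of a word, and a
    # single regex pass means a replacement's output is never re-replaced.
    abbrev_re = re.compile(r"(?<![\w.])(?:" + "|".join(map(re.escape, keys)) + ")")

    def fix_body(m):
        body = abbrev_re.sub(lambda a: table[a.group(0)], m.group(2))
        return m.group(1) + body + m.group(3)

    return XREF_RE.sub(fix_body, usfm_text)
```

With a table mapping `Isai. ` and `Is. ` to `Isa. `, the text `Is. 1.2 \x + Isai. 53.11. Is. 6.1.\x*` keeps the first `Is. ` unchanged, because it lies outside the xref. Both abbreviations inside the xref become `Isa. `.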

teusbenschop commented 7 years ago

Whew, quite a list of abbreviations that don't match the \toc3 markers. I had come across a few of them occasionally, but had not thought there were so many. It is also great that you already have a filter to perform the fix. If you don't mind cloning the repository and fixing them, I can then merge the pull request. Thanks!

DavidHaslam commented 7 years ago

Please commit your recent changes first, before I refork and clone the repo. Thanks.

DavidHaslam commented 7 years ago

I see that you already did so 2 days ago. I hadn't checked since earlier on Friday.

teusbenschop commented 7 years ago

Yup, that's correct. There are no uncommitted changes at present.

DavidHaslam commented 7 years ago

It's become more complicated than I outlined yesterday. As well as making more corrections, I'm currently developing the filter further to normalise as much of the xref punctuation as possible.

However, I've already come across many instances where the reference text cannot readily be parsed.

Here's an example to illustrate the problem: Exodus 28:38 reads: \v 38 Izakuba sebunzini likaAroni, ukuze uAroni athwale ububi bezinto ezingcwele\x + Levi 10.17. Nani 18.1. Levi 22.9. Isa. 53.11. Hez. 4.4,5,6. Joha. 1.29. Heb. 9.28. 1 Pet. 2.24.28.43.\x*, abantwana bakoIsrayeli abazazingcwelisa, kuzo zonke izipho zabo ezingcwele; njalo izahlala isebunzini lakhe, ukuze bemukeleke phambi kweN\nd kosi\nd*\x + Levi 1.4; 22.27; 23.11; Isa. 56.7.\x*.

The problem reference text is 1 Pet. 2.24.28.43.

Now 1 Peter 2 contains only 25 verses, so what are we to make of the .28.43.? Clearly this cannot be part of 1 Peter, as the book has only 5 chapters. This means that it would not make sense to change it to 1 Pet. 2.24; 28.43. The only possible solution is to assume that 28.43. is a reference to Exodus 28:43. Yet that raises the question: how should that be made clear?
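The reasoning above amounts to a range check followed by a fallback guess. In the sketch below, the dotted numbers are read as chapter.verse pairs and each pair is tested against 1 Peter. Any pair that cannot exist there is retried against the book containing the xref. The versification table is a deliberately tiny hypothetical one, holding only the counts quoted here: 1 Peter has 5 chapters, 1 Peter 2 has 25 verses and Exodus 28 has 43. It is a heuristic for flagging candidates, not a safe automatic fix.

```python
# Hypothetical, deliberately tiny versification table covering only the
# books in this example; a real check would load complete chapter/verse data.
CHAPTERS = {"1 Peter": 5, "Exodus": 40}
VERSES = {("1 Peter", 2): 25, ("Exodus", 28): 43}


def is_valid(book, chapter, verse):
    """True if book chapter:verse exists (KeyError if not in the demo table)."""
    if not 1 <= chapter <= CHAPTERS[book]:
        return False
    return 1 <= verse <= VERSES[(book, chapter)]


def resolve(book, current_book, dotted):
    """Read 'c.v.c.v.' pairwise; if a pair is impossible in `book`, try the
    book containing the xref instead. Returns None where neither fits."""
    nums = [int(n) for n in dotted.strip(".").split(".")]
    # zip() silently drops a trailing unpaired number.
    pairs = zip(nums[0::2], nums[1::2])
    result = []
    for chapter, verse in pairs:
        for candidate in (book, current_book):
            if is_valid(candidate, chapter, verse):
                result.append((candidate, chapter, verse))
                break
        else:
            result.append(None)
    return result
```

Here `resolve("1 Peter", "Exodus", "2.24.28.43.")` yields 1 Peter 2:24 followed by Exodus 28:43. That matches the interpretation argued for above.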

The reference system currently employed lacks generalised abbreviations for chapter and verse, such as ch. for chapter.

Here I'm using English words for chapter and verse, but these would then need to be localized. The reference 1 Pet. 2.24; ch. 28.43. would then be feasible to parse.

There are so many references that are just as difficult that I fear the fixes cannot be automated.

teusbenschop commented 7 years ago

The example given is proof that automatic fixing won't be feasible. As far as I remember, the way to indicate that it's Exodus 28:43 is this: 1 Pet. 2.24. 28.43. The full stop indicates the end of the 1 Pet. series, so any subsequent xref falls back to the current book.
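As described, the rule can be stated as a small parser: a book abbreviation starts a series, and a semicolon continues that series. A full stop ends it, so a following bare chapter.verse belongs to the book containing the xref. The sketch below is one reading of that convention, not a documented specification. The regex for abbreviations is simplified and the function name is hypothetical.

```python
import re

# One reference: an optional book abbreviation, chapter.verse(s), and an
# optional terminator ("." ends the book series, ";" continues it).
REF_RE = re.compile(
    r"\s*(?:(?P<book>(?:[1-3] )?[A-Za-z]+\.?) )?"
    r"(?P<ch>\d+)\.(?P<vs>\d+(?:[,-]\d+)*)(?P<end>[.;]?)"
)


def parse_xref(body, current_book):
    """Resolve each reference in an xref body using the full-stop rule."""
    refs, book, pos = [], None, 0
    body = body.strip()
    while pos < len(body):
        m = REF_RE.match(body, pos)
        if not m:
            raise ValueError(f"cannot parse xref text at {body[pos:]!r}")
        if m.group("book"):
            book = m.group("book")
        refs.append((book or current_book, int(m.group("ch")), m.group("vs")))
        if m.group("end") != ";":
            book = None  # full stop: the series ends, fall back to current book
        pos = m.end()
    return refs
```

For `1 Pet. 2.24. 28.43.` inside Exodus, this gives 1 Pet. 2:24 and then Exodus 28:43. `Levi 1.4; 22.27; 23.11;` keeps Levi throughout. This sketch also reads the unspaced `2.24.28.43.` the same way, since the dot after 24 counts as a full stop. That may or may not be what the translators intended.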

DavidHaslam commented 7 years ago

I will refrain from attempting to fix anything beyond some elementary punctuation discrepancies.

Those that require assessing runs of multiple dot-separated numbers are too difficult to address systematically, even though I think some of the dots must be typographical errors.

Refer to issue #35 for further details.
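Such cases can at least be listed for manual review without being fixed. The sketch below flags any run of three or more numbers joined by dots with no intervening space, such as `2.24.28.43`. A run like that cannot be a single chapter.verse. Spaced sequences like `11.2. 32.11.` and verse lists like `4.4,5,6.` are left alone. The xref pattern and function name are assumptions for illustration.

```python
import re

XREF_RE = re.compile(r"\\x \+ (.*?)\\x\*")
# Three or more numbers joined by dots with no space, e.g. "2.24.28.43",
# cannot be read as one chapter.verse and needs a human decision.
DOT_RUN_RE = re.compile(r"\d+(?:\.\d+){2,}")


def flag_dot_runs(usfm_text):
    """Return every suspicious dotted run found inside xref elements."""
    flagged = []
    for body in XREF_RE.findall(usfm_text):
        flagged.extend(DOT_RUN_RE.findall(body))
    return flagged
```

Applied to the Exodus 28:38 verse quoted earlier, only `2.24.28.43` would be reported.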

DavidHaslam commented 7 years ago

I see that your recent commit was for one non-localized abbreviation that I'd noted. Psalm 56:8 had an xref with 2 Kin 20.5.

teusbenschop commented 7 years ago

Yup, I had done a bit of work, then thought it better to focus on Shona and complete that first.

DavidHaslam commented 7 years ago

The problem with your rule for the interpretation of the full-stop is that it may not work everywhere.

Consider this example: \x + 2 Lan. 29.34. Hlab. 11.2. 32.11. 36.10. 64.10. 94.15. 97.11. Hlab. 125.4. Hlab. 37.14.\x*

On your theory, the full-stop that terminates Hlab. 11.2. ought to indicate the end of the Hlab. series. It's fairly obvious that it doesn't necessarily do so.

Since the context for this example is Psalm 7:10, it's simply fortuitous that following your rule here makes no actual difference, i.e. 32.11. 36.10. 64.10. 94.15. 97.11. falls back to the current book, which also happens to be Hlab. (Psalms).

We need a counterexample to illustrate where the rule doesn't work at all.
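One way to hunt for such a counterexample is to resolve every xref under two readings and report where they disagree. In the "full-stop" reading, a full stop ends the book series. In the "sticky" reading, a book persists until another is named. The sketch below assumes both readings and uses a simplified abbreviation regex. Names are hypothetical. For the Hlab. example the two agree when the current book is Hlab. Placed in any other book, the same xref would produce five disagreements. Real xrefs showing that pattern outside Psalms would be the counterexamples sought.

```python
import re

REF_RE = re.compile(
    r"\s*(?:(?P<book>(?:[1-3] )?[A-Za-z]+\.?) )?"
    r"(?P<ch>\d+)\.(?P<vs>\d+(?:[,-]\d+)*)(?P<end>[.;]?)"
)


def resolve(body, current_book, full_stop_rule):
    """List (book, chapter, verses) under one of two readings of the xref."""
    refs, book, pos = [], None, 0
    body = body.strip()
    while pos < len(body):
        m = REF_RE.match(body, pos)
        if not m:
            raise ValueError(f"cannot parse xref text at {body[pos:]!r}")
        book = m.group("book") or book
        refs.append((book or current_book, int(m.group("ch")), m.group("vs")))
        if full_stop_rule and m.group("end") != ";":
            book = None  # full-stop reading: the series ends here
        pos = m.end()
    return refs


def disagreements(body, current_book):
    """Pairs (full-stop reading, sticky reading) that resolve differently."""
    a = resolve(body, current_book, full_stop_rule=True)
    b = resolve(body, current_book, full_stop_rule=False)
    return [(x, y) for x, y in zip(a, b) if x != y]
```

Scanning every book this way and keeping only non-empty results would narrow the search to xrefs where the rule actually changes the outcome.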

teusbenschop commented 7 years ago

Yes, I agree. Sharp observation.

DavidHaslam commented 7 years ago

All the aberrant abbreviations were replaced yesterday in the Editing branch of my fork. My bespoke TextPipe filter also corrected a number of other miscellaneous issues in various xrefs.