teusbenschop / ndebele

The text of the Ndebele Bible for use by the translation team

Cross-references and the name of deity marker #4

Closed DavidHaslam closed 7 years ago

DavidHaslam commented 7 years ago

In the whole work, there are 653 instances of the pattern \x*\nd* where the cross-reference element is nested inside the name of deity marker.

Any cross-reference element should come after the \nd* marker.

cf. There are 50 instances of the correct pattern \nd*\x

Examples:

Correct syntax: iN\nd kosi\nd*\x + 3.17.\x*.

Incorrect syntax: eN\nd kosi\x + 19.19. Eks. 33.12,13,16,17. Luka 1.30. Seb. 7.46.\x*\nd*.
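The counts above could be reproduced with something like the following Python sketch. It is only an illustration: the function name `count_patterns` and the idea of passing the concatenated USFM text as a single string are assumptions, not part of the original analysis.

```python
import re

# \x* immediately followed by \nd*: the cross-reference closes inside
# the name of deity span (the incorrect nesting described above).
WRONG = re.compile(r"\\x\*\\nd\*")
# \nd* immediately followed by \x: the name of deity span closes first
# and the cross-reference follows it (the correct order).
RIGHT = re.compile(r"\\nd\*\\x")


def count_patterns(usfm: str) -> dict:
    """Count both orderings in a block of USFM text (hypothetical helper)."""
    return {
        "wrong": len(WRONG.findall(usfm)),
        "right": len(RIGHT.findall(usfm)),
    }
```

Run against the two example fragments above, this should report one match for each ordering.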

teusbenschop commented 7 years ago

Please pass the script for fixing this automatically :)

teusbenschop commented 7 years ago

Thanks for finding this.

DavidHaslam commented 7 years ago

My observation was made on a concatenated USFM file containing the data for all 66 books.

The corrections should be feasible using a suitable regex search and replace operation.

If I were using TextPipe (in Windows), I'd use the following PCRE replace filter:

Perl pattern [(\\x .+\\x\*)(\\nd\*)] with [$2$$1]
   [X] Match case
   [ ] Whole words only
   [ ] Case sensitive replace
   [ ] Prompt on replace
   [ ] Skip prompt if identical
   [ ] First only
   [ ] Extract matches
       Maximum text buffer size 4096
   [ ] Maximum match (greedy)
   [ ] Allow comments
   [ ] '.' matches newline
   [X] UTF-8 Support
   [ ] Process longest strings first
   [ ] Simultaneous search
   [ ] Log summary only

The search pattern should be non-greedy, as there may be several verses where the pattern occurs more than once.

NB. The replace filter also has to be restricted to text matched by the PCRE pattern \\nd .+\\nd\* so that it only operates inside a name of deity span.

The above TextPipe filter has been written and tested successfully on my merged.usfm file.
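For anyone without TextPipe, the same two-stage idea (restrict to each \nd ... \nd* span, then swap the cross-reference and the closing marker) might be sketched in Python as below. This is not the filter that was actually run; the function name `move_xrefs_out` is hypothetical, and it assumes each name of deity span sits on a single line.

```python
import re

# One whole name of deity span: \nd ... \nd* (non-greedy, '.' does not
# match newline, so a span is assumed not to cross a line break).
ND_SPAN = re.compile(r"\\nd .+?\\nd\*")
# A cross-reference note that ends immediately before the closing \nd*.
XREF_BEFORE_CLOSE = re.compile(r"(\\x .+?\\x\*)(\\nd\*)")


def move_xrefs_out(usfm: str) -> str:
    """Move any \\x ...\\x* note out of the \\nd span it is nested in."""

    def fix_span(match: re.Match) -> str:
        # Swap the two groups so \nd* comes first, then the note.
        return XREF_BEFORE_CLOSE.sub(r"\2\1", match.group(0))

    return ND_SPAN.sub(fix_span, usfm)
```

The restriction to the \nd span matters when a correct and an incorrect instance share a line: applied to the whole line, the swap pattern could start at the first, already correct, \x and run on to the later \x*\nd*, even with a non-greedy match.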

DavidHaslam commented 7 years ago

I've not forked the repo so far because I was primarily thinking of analysis tasks rather than making changes.

I'm simply aiming to anticipate issues that would need to be addressed before updating the SWORD module in CrossWire Main. cf. The module Ndebele was made almost 8 years ago.

SwordVersionDate=2009-11-01
TextSource=http://sites.google.com/site/bibletranslationdata/

teusbenschop commented 7 years ago

If you'd like to fork it at some point, make the required changes, and open a pull request, I can review it and integrate it into the repo. That would be great!

DavidHaslam commented 7 years ago

Forked and cloned. Currently working on the systematic fix.

teusbenschop commented 7 years ago

Whew!

teusbenschop commented 7 years ago

See https://github.com/teusbenschop/ndebele/pull/5