Closed DavidHaslam closed 4 years ago
I see now: the low quality of this source is coming back to bite us, which is odd given that it comes from the official publisher of the text. I think I need to revise this activity completely. This source has section titles and references, but as we can see, its text quality is really low. Other sources have better text quality but no section titles.
Can you re-run diatheke after the module rebuild? Hopefully it will look better this time.
@krisek
The attached Zip file contains the updated analyses for a module built from the latest XML file.
I have also included the log output of osis2mod and the output of emptyvss.
NB. I have not yet repeated any comparisons.
Thanks a lot, it immediately pointed out a pretty huge bug: the last verse in every chapter was missing. I fixed it. Which commands do you run? osis2mod is clear, but I could run the rest myself too, so that you don't have to re-run them for every update.
Unexpected characters include:
U+0060 ` 4 GRAVE ACCENT
U+2022 • 3 BULLET
The former are in these verses:
Isaiah 28:23: Vegyétek füleitekbe és halljátok szavam`, figyeljetek és hallgassátok beszédem`!
Isaiah 32:9: Ti gondtalan asszonyok, keljetek fel, halljátok szavam`, és ti elbizakodott leányzók, vegyétek füleitekbe beszédem`!
The latter are in these verses:
I Kings 12:21: • És mikor megérkezett Roboám Jeruzsálembe, összegyűjté Júda egész házát és Benjámin nemzetségét, száznyolczvanezer válogatott hadra való férfiút, hogy hadakozzanak az
Matthew 5:48: • Legyetek azért ti tökéletesek, miként a ti mennyei Atyátok tökéletes.
James 4:8: • Közeledjetek az Istenhez, és közeledni fog hozzátok. Tisztítsátok meg kezeiteket, ti bűnösök, és szenteljétek meg szíveiteket ti kétszívűek.
@krisek Please review these locations.
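A character inventory like the one above can be reproduced with a few lines of Python. This is only a sketch, not the tool that produced the original analysis; the function names are made up for illustration, and the sample string is just a fragment of the verses quoted above.

```python
import unicodedata
from collections import Counter

def char_inventory(text):
    """Count every non-whitespace character in the text."""
    return Counter(ch for ch in text if not ch.isspace())

def format_inventory(counts):
    """Render counts as 'U+XXXX char count NAME' lines, sorted by code point."""
    lines = []
    for ch in sorted(counts):
        name = unicodedata.name(ch, "UNKNOWN")
        lines.append(f"U+{ord(ch):04X} {ch} {counts[ch]} {name}")
    return lines

# Fragment taken from the verses quoted above, for illustration only.
sample = "halljátok szavam`, figyeljetek és hallgassátok beszédem`!"
for line in format_inventory(char_inventory(sample)):
    print(line)
```

Running it over the whole diatheke export and scanning the list for characters that do not belong in Hungarian text is then a manual step.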
The counts of left and right parentheses do not match:
U+0028 ( 220 LEFT PARENTHESIS
U+0029 ) 219 RIGHT PARENTHESIS
The unmatched locations need to be tracked down.
The counts of double quotation marks do not match:
U+201D ” 12 RIGHT DOUBLE QUOTATION MARK
U+201E „ 13 DOUBLE LOW-9 QUOTATION MARK
The unmatched locations need to be tracked down.
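One possible way to track down both kinds of mismatch is to flag every line of the diatheke export in which the opening and closing marks do not balance. The sketch below assumes one verse per line, as in the verse excerpts above; `unbalanced_lines` is a hypothetical helper and the sample lines are invented placeholders, not text from the module.

```python
def unbalanced_lines(lines, opener, closer):
    """Return (line_number, line) pairs where opener and closer counts differ."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if line.count(opener) != line.count(closer):
            hits.append((number, line.rstrip("\n")))
    return hits

# Invented placeholder lines, one verse per line.
sample = [
    "Book 1:1: text (with a complete note)",
    "Book 1:2: text (with an unclosed note",
    "Book 1:3: „a complete quotation”",
]
print(unbalanced_lines(sample, "(", ")"))
print(unbalanced_lines(sample, "„", "”"))
```

A parenthesis or quotation that legitimately spans several verses will be flagged at both its first and last verse, so the hits are candidates for review rather than confirmed errors.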
Word frequency anomalies:
1 Ben-Hadad
1 Ben-Hadád
One is without the acute accent.
1 Benjamin
130 Benjámin
Ditto!
1 Beszéljetek
4 Beszéljétek
Ditto!
There are probably many more examples.
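To look for more pairs of this kind, the word frequency table could be grouped by each word's accent-free form. This is a rough sketch under the assumption that the frequencies are available as a word-to-count mapping; the helper names are illustrative only.

```python
import unicodedata
from collections import defaultdict

def strip_accents(word):
    """Remove combining marks after canonical decomposition (á -> a, ő -> o)."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def accent_variants(freq):
    """Group words that become identical once their accents are removed."""
    groups = defaultdict(dict)
    for word, count in freq.items():
        groups[strip_accents(word)][word] = count
    return {key: forms for key, forms in groups.items() if len(forms) > 1}

# Counts taken from the list above.
freq = {"Benjamin": 1, "Benjámin": 130, "Beszéljetek": 1, "Beszéljétek": 4}
for key, forms in accent_variants(freq).items():
    print(key, forms)
```

Since accents are distinctive in Hungarian, many groups will contain genuinely different words; a group where one form is rare and the other frequent is the more likely place for a typo.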
There are 36 words that end with a hyphen/minus.
1 Búza-
2 alsó-
6 arany-
1 atya-
3 be-
1 dob-
2 egy-
3 ezüst-
3 fa-
1 faolaj-
1 fel-
1 fige-
1 fiú-
1 gyapjú-
5 jobb-
1 jog-
5 ki-
2 kő-
1 mogyoró-
1 méreg-
1 nyár-
1 nőstény-
1 paizs-
3 réz-
1 szőlő-
1 szőlőtő-
1 trombita-
2 tulok-
1 tölgy-
1 tűz-
4 vas-
1 véres-
1 árpa-
7 égő-
2 ércz-
9 étel-
@krisek Check each location for possible missing spaces.
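The locations could be collected with a regular expression that matches a word followed by a hyphen that is not itself followed by another word character. This is a sketch only; the helper name and the sample lines are invented for illustration.

```python
import re
from collections import defaultdict

# A word, then '-', then anything except another word character.
TRAILING_HYPHEN = re.compile(r"\w+-(?!\w)")

def trailing_hyphen_words(lines):
    """Map each word ending in '-' to the lines in which it occurs."""
    found = defaultdict(list)
    for line in lines:
        for match in TRAILING_HYPHEN.finditer(line):
            found[match.group()].append(line.rstrip("\n"))
    return dict(found)

# Invented placeholder lines, one verse per line.
sample = ["Book 1:1: arany- és ezüst- valami", "Book 1:2: Ben-Hadad"]
for word, where in trailing_hyphen_words(sample).items():
    print(word, where)
```

Hyphens inside a word, as in Ben-Hadad, are deliberately not matched.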
The Sword utilities come bundled with Xiphos.
My usual procedure is to run the following Windows CMD file called ExportMod.cmd from a subdirectory.
@echo off
rem Analyse a SWORD module
..\xiphos\diatheke -b %1 -f plain -k "Gen-Rev" >..\Export\%1\%1.diatheke.txt
..\xiphos\mod2imp %1 >..\Export\%1\%1.raw.imp.txt
..\xiphos\emptyvss %1 >..\Export\%1\%1.emptyvss.txt
Parameter %1 is the first command line parameter, thus:
ExportMod HunKar
generates all 3 output files in a suitable Export folder, one that I create manually beforehand in Windows Explorer.
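For anyone not on Windows, roughly the same steps could be scripted in Python. This is a hedged sketch rather than a tested port: it assumes the same relative layout as the CMD file (utilities in `../xiphos`, output under `../Export`), uses only the command-line options that appear in the CMD file, and creates the Export folder itself instead of relying on it being made by hand.

```python
import subprocess
from pathlib import Path

TOOLS = Path("..") / "xiphos"    # assumed location of the SWORD utilities
EXPORT = Path("..") / "Export"   # assumed root folder for the output files

def export_jobs(module):
    """Return (command, output_path) pairs mirroring ExportMod.cmd."""
    out = EXPORT / module
    return [
        ([str(TOOLS / "diatheke"), "-b", module, "-f", "plain", "-k", "Gen-Rev"],
         out / f"{module}.diatheke.txt"),
        ([str(TOOLS / "mod2imp"), module], out / f"{module}.raw.imp.txt"),
        ([str(TOOLS / "emptyvss"), module], out / f"{module}.emptyvss.txt"),
    ]

def export_module(module):
    """Run each utility and redirect its standard output to the matching file."""
    for command, target in export_jobs(module):
        target.parent.mkdir(parents=True, exist_ok=True)
        with open(target, "wb") as handle:
            subprocess.run(command, stdout=handle, check=True)

# Example call (requires the utilities and an installed module):
# export_module("HunKar")
```

If the utilities are on the PATH instead, `TOOLS` can be dropped and the bare command names used.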
Notes:
mklink
command.
@krisek
Further to your repair for the last verse in each chapter...
The Analysis2.zip file has been updated and replaced in the earlier comment.
U+0060 ` 4 GRAVE ACCENT U+2022 • 3 BULLET
fixed in 0421aa8
The counts of left and right parentheses do not match:
U+0028 ( 220 LEFT PARENTHESIS U+0029 ) 219 RIGHT PARENTHESIS
fixed in a6a7ef9
The counts of double quotation marks do not match:
U+201D ” 12 RIGHT DOUBLE QUOTATION MARK U+201E „ 13 DOUBLE LOW-9 QUOTATION MARK
fixed in c532a5e
1 Benjamin 130 Benjámin
It is like this in all the online sources I reviewed. I think it reflects that the Greek (New Testament) and Hebrew (Old Testament) written forms might differ.
Confirmed in printed version too:
1 Beszéljetek 4 Beszéljétek
These are the two different forms of the same word (formed through agglutination) with different meaning. It's okay as it is.
There are 36 words that end with a hyphen/minus.
1 Búza- 2 alsó- 6 arany-
These are valid forms in Hungarian. (For listing)
example
Here the meaning is "nyárvesszőket, mogyoróvesszőket és gesztenyevesszőket" (branches of poplar, almond, and chestnut trees), but the form used in the text is the correct one, so only the first parts are listed.
Besides these, there are many other places where we use hyphens: for listing (max 2 elements), to question a specific word in a sentence (by adding the -e suffix), etc.
1 Ben-Hadad 1 Ben-Hadád
This is as per the printed version.
I think I have now fixed everything here; let's open a dedicated issue if anything is left.
I had providentially retained my outputs of the earlier HunKar module done on 2011-09-23, so I was able to compare both old & new diatheke outputs as well as both the derived word frequency analyses.
The former inevitably reports many differences that are merely due to a systematic replacement of some punctuation marks, such as hyphen - by en dash –
, vizby
and also the following improvement:
by
Setting these aside to focus on word-level differences still leaves a significant number of them!
These are best examined initially by the comparison of word frequency counts.
I have used WinMerge to generate a patch file, see within the attached Zip file.
HunKar.diatheke.word.frequency.diff.zip
There are 2146 differences!
Many of these are places where the space between two words is now missing!
Each of these should be reviewed and fixed where necessary.
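As a starting point for that review, the places where a space has gone missing could be shortlisted automatically: take every word that occurs only in the new output and check whether it splits into two words that were both present in the old output. The sketch below assumes the two vocabularies are available as plain collections of words; the function name and the sample words are illustrative, not taken from the diff.

```python
def merged_word_candidates(old_words, new_words):
    """Return {merged: [(left, right), ...]} for words that are new in
    new_words and can be split into two words present in old_words."""
    old = set(old_words)
    candidates = {}
    for word in sorted(set(new_words) - old):
        splits = [
            (word[:i], word[i:])
            for i in range(1, len(word))
            if word[:i] in old and word[i:] in old
        ]
        if splits:
            candidates[word] = splits
    return candidates

# Invented sample vocabularies for illustration.
old_vocab = {"az", "Úr", "monda"}
new_vocab = {"azÚr", "monda"}
print(merged_word_candidates(old_vocab, new_vocab))
```

Because Hungarian forms many legitimate compounds, some hits will be false positives; the list only narrows down which of the differences to look at first.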