Closed funderburkjim closed 3 years ago
@AnnaRybakovaT
Here is something you could start working on now, that would not overlap with @sanskritisampada's work.
First, download the file ap90_notes.txt.
This contains cases that were in Dhaval's original 'ap90_error.txt' that I excluded from 'ap90_error1.txt'. There are about 1200 cases.
These need to be examined further to know how to handle. You could examine these and add codes to indicate the nature of the change that should be made.
Here are some examples that I've marked to get you started:
hyphenationPage: 133 akzara beatiPage -> beati-tude : beatitude
hyphenationPage: 156 akzetra recepPage -> recep-tacle : receptacle
hyphenationPage2: 156 akzetra tacle -> receptacle : SKIP
Capitalized :192 agarI Deotar : OK (plant)
hyphenated: 195 agasti wellnigh -> well-nigh : OK (hyphenated)
Capitalized :204 aguru Aquiluria : OK (plant)
...
hyphenated: 213 agniH firemissile -> fire-missile : OK (hyphenated)
...
Capitalized :271 aMgaM Champanagar : OK (place)
...
hyphenated: 12530 gItA tothe -> to-the : to the
...
hyphenated: 12233 gajaH excelent -> excel-ent : excellent
So, my suggestion is that you add the fourth field to each of the lines of ap90_notes.txt.
That information will let us know how these additional cases need to be handled.
@AnnaRybakovaT To get started, add markup to the first 50 or so cases of ap90_notes.txt.
Then, let me see the results, so I'll know we're in agreement on the details of the additional information in ap90_notes.txt.
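A minimal sketch of how lines in this format might be parsed, assuming the layout seen in the examples above: a classification, a number, a headword, the flagged word, an optional `->` suggestion, and the added fourth field after ` : `. The names `NoteCase` and `parse_case` are hypothetical, and this is not the program actually used on ap90_notes.txt. The thread does not say what the number refers to, so it is kept as a plain integer.

```python
import re
from typing import NamedTuple, Optional


class NoteCase(NamedTuple):
    """One case line of ap90_notes.txt (field names are illustrative)."""
    kind: str                   # e.g. 'hyphenationPage', 'Capitalized'
    num: int                    # number after the classification
    key: str                    # headword, e.g. 'akzara'
    word: str                   # flagged spelling, e.g. 'beatiPage'
    suggestion: Optional[str]   # text after '->', if any
    resolution: Optional[str]   # the added fourth field, if present


# Accepts both 'hyphenated: 195 ...' and 'Capitalized :192 ...'.
_CASE_RE = re.compile(r'^(\w+)\s*:\s*(\d+)\s+(\S+)\s+(.+)$')


def parse_case(line: str) -> NoteCase:
    m = _CASE_RE.match(line.strip())
    if m is None:
        raise ValueError(f'not a case line: {line!r}')
    kind, num, key, rest = m.groups()
    # The fourth field is whatever follows the first ' : ' separator.
    body, sep, resolution = rest.partition(' : ')
    word, arrow, suggestion = body.partition(' -> ')
    return NoteCase(
        kind=kind,
        num=int(num),
        key=key,
        word=word.strip(),
        suggestion=suggestion.strip() if arrow else None,
        resolution=resolution.strip() if sep else None,
    )
```

A line that has not been marked up yet simply comes back with `resolution` set to `None`, which makes unfinished cases easy to list.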
Dear Jim, Thanks a lot for the examples, everything is clear now. Could you also write examples for these 3 cases:
hyphenationPage2: 63 akAma fluenced -> ninfluenced
Capitalized :118 akzaH TerPage
hyphenationPage2: 118 akzaH minalia -> erminalia
My solution is below but I am not sure:
hyphenationPage2: 63 akAma fluenced -> ninfluenced : Uninfluenced
Capitalized :118 akzaH TerPage -> Terminalia : OK (plant Terminalia)
hyphenationPage2: 118 akzaH minalia -> erminalia : SKIP
Thanks
Dear Jim, Could you check the first few cases? (I suppose everything is correct; I just don't know what to do with the hyphenationPage2 cases, such as the one in the last line.)
Capitalized :213 agniH Ignis : OK (god)
Capitalized :213 agniH Ogni : OK (god)
Capitalized :213 agniH Slavonians (ethnolinguistic group of people)
hyphenated: 213 agniH firemissile -> fire-missile : OK (hyphenated)
Capitalized :213 agniH Pushpamitra : OK (person)
hyphenationPage: 224 agra excelPage -> excel-lence : excellence
Capitalized :266 aMkowaH Ankola : OK (plant)
Capitalized :271 aMgaM Champanagar : OK (place)
Capitalized :271 aMgaM Champapura : OK (place)
Capitalized :271 aMgaM Zingiber : OK (plant)
hyphenationPage: 271 aMgaM forePage -> fore-telling : foretelling
hyphenated: 281 aMganA wellrounded -> well-rounded : well rounded
Capitalized :352 aja Mentagra : OK (disease)
Capitalized :352 aja Carpopogen : Carpopogon (plant)
hyphenated: 427 aMcita halfstrung -> half-strung : half strung
Capitalized :433 aMjanA Kunjara : OK (name of monkey)
hyphenated: 437 aMjaliH cavityful -> cavity-ful : cavity full
hyphenationPage: 541 atigrAhya libaPage -> liba-tions : libations
Capitalized :551 aticCatraH Anesum : OK (plant)
Capitalized :610 atibala Sidonia : OK (plant)
Capitalized :636 atimodA Heterophyllum : OK (plant)
hyphenationPage2: 642 atiraBasaH pitateness -> precipitateness : OK
@AnnaRybakovaT
Just update your copy of ap90_notes.txt with your work.
Then, when you've finished, you can upload it (I'll describe how to upload when the time comes).
The examples you've posted look like what I had in mind. Good!
Here are responses to the specific ones you asked me to look at:
{#atiraBasaH#}¦ Great speed, preci-
[Page0033-c+ 57]
<>pitateness, head-long speed, rash-
I think those are all the ones you questioned. Bottom-line -- Keep up good work!
Feel free to ask about other questionable cases as you notice them.
When you are finished with ap90_notes.txt, I'll write a program that makes changes to ap90.txt based upon the information in ap90_notes.txt.
You may, from time to time, want to make some additional comment about a case, and put it into a separate line of ap90_notes.txt. This is fine PROVIDED YOU START ANY COMMENT LINES WITH A SEMICOLON ';'. For example:
hyphenationPage2: 118 akzaH minalia -> erminalia : SKIP
; This is handled by previous case
The program I write will skip lines starting with a semicolon.
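A rough sketch of what such a pass over the notes could look like, assuming the conventions shown in this thread: comment lines begin with `;`, and the fourth field after ` : ` is `SKIP`, `OK` (optionally with a parenthesised note), or a replacement spelling. The function names are hypothetical, and ignoring leading whitespace before the semicolon is an assumption, not something stated above. It only decides what to do with each case; it does not touch ap90.txt.

```python
from typing import Iterable, Iterator, List, Tuple


def active_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield case lines, skipping blank lines and ';' comment lines."""
    for raw in lines:
        line = raw.strip()  # assumption: leading spaces before ';' are ignored
        if not line or line.startswith(';'):
            continue
        yield line


def classify(line: str) -> str:
    """Return 'unmarked', 'skip', 'ok' or 'replace' for one case line."""
    # ' : ' (with spaces on both sides) separates the fourth field;
    # 'Capitalized :192' and 'hyphenated: 195' do not contain it.
    _, sep, resolution = line.partition(' : ')
    resolution = resolution.strip()
    if not sep or not resolution:
        return 'unmarked'
    if resolution == 'SKIP':
        return 'skip'
    if resolution == 'OK' or resolution.startswith('OK '):
        return 'ok'
    return 'replace'


def plan(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Pair every non-comment line with the action it calls for."""
    return [(line, classify(line)) for line in active_lines(lines)]
```

Only the `'replace'` cases would lead to a change in the digitization; `'ok'` and `'skip'` cases are left alone, and `'unmarked'` cases still need a decision.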
Dear Jim,
Thanks for your detailed clarifications.
Dear Jim, I have come across some lines with this kind of error: there is no gap between a word and a bracket "(". For example: Capitalized :1570 anucarciH Repeatingin
My current solution: Capitalized :1570 anucarciH Repeatingin -> Repeating-in : Repeating in ; Wrong kind of error. In the correct text should be a bracket "(" between Repeating and in - Repeating (in ...
Could you advise how to correct such cases?
I would also like to know how to handle cases like this one: Capitalized :1963 anuzekaH Rewatering
Current solution: Capitalized :1963 anuzekaH Rewatering : rewatering ; Wrong kind of error, there is present participle of rewater - "rewatering"
Capitalized :1570 anucarciH Repeatingin
suggested Solution:
Capitalized :1570 anucarciH Repeatingin : Repeating (in
Above, I see
Capitalized :1570 anucarciH Repeatingin -> Repeating-in : Repeating in ; Wrong kind of error. In the correct text should be a bracket "(" between Repeating and in - Repeating (in ...
This looks like you intended a comment, starting with '; Wrong ...'. Could you put the comment on one or more separate lines, as
Capitalized :1570 anucarciH Repeatingin -> Repeating-in : Repeating in
; Wrong kind of error. In the correct text should be a bracket "(" between Repeating and in - Repeating (in ..
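For illustration only, here is a small sketch of how an inline comment could be moved onto its own line mechanically. It assumes that a semicolon never occurs inside the case part of a line, which holds for the examples in this thread but is not guaranteed; `split_inline_comment` is a hypothetical helper name.

```python
from typing import List


def split_inline_comment(line: str) -> List[str]:
    """Split 'case ; comment' into a case line and a '; comment' line.

    Lines without a semicolon, and lines that already start with one,
    are returned unchanged (apart from trailing whitespace).
    """
    case, sep, comment = line.partition(';')
    if not sep or not case.strip():
        return [line.rstrip()]
    out = [case.rstrip()]
    if comment.strip():
        out.append('; ' + comment.strip())
    return out
```

Applied to every line of the file, this would leave existing comment lines as they are and turn each `case ; comment` line into two lines.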
Capitalized :1963 anuzekaH Rewatering : rewatering ; Wrong kind of error, there is present participle of rewater - "rewatering"
I would solve as:
Capitalized :1963 anuzekaH Rewatering : OK
; I agree with you that this is a present participle
; However, it is spelled correctly, and the capital letter does agree with the printed text
; The 'Capitalized' classification was added by a program as a guess as to why this spelling
; was not identified as an English word by Dhaval's program.
; So don't worry that the classification seems inappropriate.
; My solution (`: OK`) just indicates that there is nothing objectionable about the spelling 'Rewatering',
; so no digitization change is required.
Dear Jim,
Thanks a lot! Everything is clear.
Thanks @AnnaRybakovaT and @funderburkjim, I hope the work can be scaled.
Dear Jim, I have finished the file ap90_notes.txt. Please, explain how I can upload it.
I had difficulty only with the items below:
Capitalized :3109 aBiDA Cmop : OK (Cmop.) ; {@--Cmop.@} {#--DvaMsin#}
Capitalized :10696 kAMdiSIka Hense ; I am not sure - can be a name or a word "hence"
Capitalized :12616 guru Indust ; the original text - Indust {#--lAGavaM#}
hyphenated: 13474 cUrRaH limeburner -> lime-burner ; I don't know which option is correct - "limeburner" (one that burns limestone) or "lime-burner" (oven where limestone burns)
Capitalized :14464 taru Nav ; the original text - the Nav mallikā creeperr
hyphenated: 19801 prati branchvein -> branch-vein : OK (hyphenated) ; medicine
Capitalized :21386 proTa Embrye : Embryo ; Better to double check
Capitalized :21944 BawwAra Hch ; original text - as in {#BawwAraharicadrasya pa-#}{#dmabaMDo nfpAyate#} Hch.
hyphenated: 22865 mahA intinction -> in-tinction ; original text - a musical intinction of individuality (according to the Buddhists)
Capitalized :23367 mUlaM Mooltan : OK ; probably - language
Capitalized :23739 yAta Attaince : OK ; probably "Attained", original text - Attaince, reduced or gone to (a state &c.)
Capitalized :24562 lAjaH Wotted ; probably "Roasted". Original text - [{#lAj-ac#}] Wotted grain
Also I found some additional errors:
EXTRA ERRORS
10007 kaMcukin arPage -> ar-mour : armor
10529 kalya To-morrow : Tomorrow ; To-morrow is "old" spelling
14450 tarala liqur : liquor
15826 drupadaH a ther : father
16406 nava newly, married : newly-married ; original text - 1. a newly, married woman, a bride
19129 puras van ; original text "one who fights in the van or front-line" - I am not sure about a word "van"
19256 pUrva A foresaid. : Aforesaid,
19377 pESunaM Back-biting : Backbiting
22865 mahA enterprize : enterprise
23821 yuktiH oxpression : expression
24134 rAvaRa henee : hence
24134 rAvaRa eapis tal : capital
27652 SaraH multi tude : multitude
28094 SudDiH errata : erratum
32003 kalhaRa contemporay : contemporary
32007 jagannATapaMqita career-lay : career lay
32106 utkala to was : towns
Also I found some additional errors
Thanks, good catch. Hope @funderburkjim can take a look at them.
@AnnaRybakovaT
how I can upload it ?
It is easy. Just make a new comment such as 'Here is the revised ap90_notes.txt.'
Then, while still in the comment,
drag your local file into the comment. Sounds weird, but this will cause that local file to be uploaded to some file on Github and a link will be made to that Github file so others (such as me) can download the file.
You can click 'Preview' in the comment to see how this works. When you are done with the comment, just click the 'Comment' button as usual to post the comment.
@AnnaRybakovaT
I'll look at your questionable items when I review your uploaded file.
Ditto for the Extra Corrections (Glad you took initiative to mention these).
It will be a while before I process your corrections. Reason: Sampada is working on the 'ap90_errors.txt' batch of changes. It will be confusing if I install your corrections before Sampada's corrections are finished. This will probably be several weeks from now.
@AnnaRybakovaT
So, you're done for now with the ap90_notes task. THANK YOU!
Are you interested in starting another similar task, working on corrections of English words in another dictionary (see the list prepared by Dhaval in #14)?
I would suggest starting with 'cae.txt' (Cappeller Sanskrit-English Dictionary). If you want to do this, I'll write some instructions in a separate issue.
This will probably be several weeks from now.
How many hours per day are required to implement all these batches? I would love to know; please tell me how much work it is on your side.
Here is the revised ap90_notes.txt :
This file doesn't include the list with Extra errors.
I would suggest starting with 'cae.txt' (Cappeller Sanskrit-English Dictionary). If you want to do this, I'll write some instructions in a separate issue.
Dear Jim, I have free time over the next few months and would be glad to do any useful work. It would be great if you could write instructions for the new task.
time required to implement all these batches?
The workflow with Sampada involves work that she does and work that I do.
I'm not sure how much time per batch Sampada uses on average.
From me, the time is something like:
When I get to Anna's work (after Sampada's work on AP90 is done), it will probably take several hours to develop programs to incorporate that into ap90.txt. Hard to know exactly how long.
Understood. If you leave it there will be nobody who can actually replace it. So I ask you to document as much of all the processes, including time spent, as possible, thanks. In that case at least we have a chance.
nobody who can actually replace it
I feel sure that Dhaval could accomplish the same end (getting corrections installed in csl-orig/v02/xxx/xxx.txt), if the occasion arose. The details of his workflow would probably differ from the details of my workflow, but that doesn't matter.
@AnnaRybakovaT
Do you know how to use 'git' at the command-line? I'm thinking about how to work with you on another dictionary, and the details depend in part on what computer tools I can assume.
If you don't know how to use git, would you be interested in learning?
@AnnaRybakovaT ap90_notes_Revised.txt looks fine!
If you don't know how to use git, would you be interested in learning?
Dear Jim, Until today I didn't know anything about git. If you think that it is possible to learn something from zero level, we can try.
@funderburkjim Closeable?
This issue opened for work with @AnnaRybakovaT.
Continuation of #20