drdhaval2785 closed this issue 8 years ago.
@drdhaval2785 Re ':n:': in no-change cases, just the source word is enough. Shouldn't the second word be optional?
The space is ignored in key1. No real mistakes, but there are several similar cases.
How can such cases be sorted out, @funderburkjim? Is the hiatus list used to weed out false positives, @drdhaval2785?
ccs:[L=26627] [p= 480-1]:[L=26627] [p= 480-2]:t
@drdhaval2785
If I copy-paste from http://sanskrit-lexicon.github.io/CORRECTIONS/ngram/output/html/allvsMW_2.html
I get
hippocrEtus PE
which is useless in every respect.
PE:hippocrEtus:n
or
PE:hippocrEtus:hippocrEtus:t
Either would make more sense and would save time.
Why do you need to copy-paste from the HTML? I make an extra copy of the .txt file and keep the HTML and the .txt open side by side. I examine the PDF scans linked from the HTML, make changes to the .txt file, and submit them in batches of 5. That way, at the end, I also have a full correction submission file to hand over to Jim.
Oh, I had not opened the .txt file before.
ieg:agronomoi,191:agronomoi:n:oi
pui:ajimHa,239:ajimHa:n:Ha,mH
skd:atwaNa,706:atwaNa:n:tw
inm:aditeHputra,170:aditeHputra:n:eH
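The records above share one shape: a dictionary code, a headword and a number joined by a comma, the proposed word, a type code, and a free-text note. Here is a minimal Python sketch of parsing one record, assuming that field order. `parse_correction` and `Correction` are hypothetical names; the meaning of the number after the comma is not stated in this thread, and treating every type other than `n` as a proposed change is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Correction:
    dict_code: str  # e.g. "ieg"; normalized to lower case
    headword: str   # headword as reported by the ngram check
    ref: str        # number after the comma (its exact meaning is assumed, not confirmed)
    new_word: str   # proposed form; may be empty for 'n' (no change) records
    ctype: str      # 'n' = no change; other codes assumed to be changes
    note: str       # free-text remainder; may itself contain commas and spaces


def parse_correction(line: str) -> Correction:
    """Parse one record of the form dict:headword,ref:new_word:type:note."""
    # maxsplit=4 keeps any further ':' inside the note field
    parts = line.strip().split(":", 4)
    if len(parts) < 4:
        raise ValueError(f"not enough ':'-separated fields: {line!r}")
    dict_code, hw_ref, new_word, ctype = parts[:4]
    note = parts[4] if len(parts) == 5 else ""
    headword, _, ref = hw_ref.partition(",")
    return Correction(dict_code.lower(), headword, ref, new_word, ctype, note)
```

A bare copy-paste such as `hippocrEtus PE` has no ':' separators, so the sketch rejects it instead of guessing.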
Looks nice. The only thing is that I would capitalize the dictionary abbreviations, e.g. IEG instead of ieg.
The dictionary codes were capitalized before. I made them lowercase in this format for ease of typing in manual submissions; otherwise, pressing Shift for three letters is cumbersome.
sch:akLpta,173:akLpta:n:Lp,kL kLp (only L dhatu) +
ieg:agronomoi,191:agronomoi:n:oi Greek +
pui:ajimHa,239:ajimHa:n:Ha,mH +
skd:atwaNa,706:atwaNa:n:tw
inm:aditeHputra,170:aditeHputra:n:eH (two different words, aditeH + putra; why not use key2?)
inm:aditeHsuta,171:aditeHsuta:n:eH (two different words)
skd:adwaNa,774:adwaNa:n:dw
@drdhaval2785 SKD remains a mystery to me. The Na in adwaNa does not seem to belong to the word, as in other similar cases.
acc:adButacaritaISvaraBAzita,301:adButacaritaISvaraBAzita:n:aI (two different words)
pwg:aDyArUWa,2144:aDyArUWa:n:UW +
sch:anavakLpta,2216:anavakLpta:n:Lp,kL +
pui:anuHlAda,472:anuHlAda:n:Hl +
pw:anuzwupkArmIRa,4904:anuzwupkArmIRa:n:pk +
ccs:aviQUs,2102:avidvaMs:t:iQ,QU +
SKD will remain a mystery to you unless you read Appendix III (pages 95-100) of Vopadeva's Kavikalpadruma.
It explains the it-marker system of Vopadeva's grammar.
For 'N' read
For 'a' read
So let's cut off the anubandhas?
Include it in the verb study. It's not that simple; it needs a research tag.
First cut off, then research :neckbeard:
The discussion here has gone haywire, so let's install these submissions and continue further submissions in another issue.
@drdhaval2785 As I read it, you have kindly prepared allvsMW_2_corrected.txt,
which aggregates all the corrections from this issue and from all those mentioned above (through #226).
Thus, I need only work from allvsMW_2_corrected.txt to cover all these separate issues.
I just wanted you to confirm this before I start installing tomorrow.
@funderburkjim You guessed right. I also cross-checked that there is no entry where
Dhaval is a wonder-man. He invents a method (with or without hints), uses it, and submits in a ready-to-go format. It's only a matter of pressing Enter. Am I wrong, Jim?
It's more than pressing 'Enter', but the standard form of submission considerably simplifies the installation and makes it routine, and I appreciate that Dhaval has prepared these standard-form corrections.
Of course, I also see part of my role in the installation as that of a diligent gatekeeper, so I examine each submission.
How much time does accepting a submission take, beyond pressing Enter?
Re ap90:rOzhi:rOziha:t
I think rOzhi is correct. Notice the virAma under the 'z'.
I can't find rOzhi in any other dictionary, and I can't find rOziha in any dictionary.
I withdraw this analysis and agree with @drdhaval2785 in #224: rOhiz. MW confirms.
Re How to sort out such cases as rAmeSvaraaDvarasuDAmaRi ACC
Consulting a list of words with a space in key2 would be a step in this direction.
Since key1 also drops the avagraha ('), such a list might also include words with a single quote in key2.
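A minimal Python sketch of building such a list, assuming the (key1, key2) pairs have already been extracted from a dictionary's X.xml (extraction not shown). It also assumes key1 is simply key2 with spaces and avagraha removed; the function names and the example key2 values used below are hypothetical.

```python
from typing import Iterable


def key1_from_key2(key2: str) -> str:
    """Assumed normalization: key1 is key2 with spaces and avagraha (') dropped."""
    return key2.replace(" ", "").replace("'", "")


def space_or_avagraha_cases(
    entries: Iterable[tuple[str, str]],
) -> list[tuple[str, str]]:
    """Return (key1, key2) pairs whose key2 has a space or avagraha and whose
    key1 is fully explained by dropping those characters."""
    flagged = []
    for key1, key2 in entries:
        has_marker = " " in key2 or "'" in key2
        if has_marker and key1_from_key2(key2) == key1:
            flagged.append((key1, key2))
    return flagged
```

With a hypothetical key2 of `rAmeSvara aDvarasuDAmaRi`, the key1 `rAmeSvaraaDvarasuDAmaRi` is flagged, which explains the apparent 'aa' hiatus. Flagged pairs are candidates for a false-positive list, not corrections.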
Making such lists completely reliable might take more work than expected, since what counts as 'key2' may be rather complicated for some dictionaries. Also, in some cases (recall the recent discussion of VEI), what is currently saved in the 'key2' field of X.xml might not be the best choice for key2.
Re :n: - in no change cases just the source word is enough. The 2nd word should be optional, no?
You could leave the second word empty:
:headword::n: blah blah
This would save you time and would not require rewriting code. For ':n:' cases, nothing is done with the data except posting it to `corrections_nochange.txt`, so it doesn't matter what is in that third field, as long as the field is there.
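A minimal Python sketch of how an installer could accept an empty second word for ':n:' records while still requiring it for changes. `split_records` is a hypothetical helper, not the actual installation code, and the empty-field record `skd:adwaNa,774::n:dw` used to exercise it is illustrative.

```python
def split_records(lines):
    """Separate ':n:' (no change) records from records proposing a change.

    Records follow dict:headword,ref:new_word:type:note. For 'n' records the
    new_word field may be empty, since it is only copied to the no-change file.
    """
    nochange, changes = [], []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split(":", 4)
        if len(fields) < 4:
            raise ValueError(f"malformed record: {line!r}")
        new_word, ctype = fields[2], fields[3]
        if ctype == "n":
            nochange.append(line)  # third field is ignored, so empty is fine
        elif not new_word:
            raise ValueError(f"change record lacks a new word: {line!r}")
        else:
            changes.append(line)
    return nochange, changes
```

The field is still counted, so `::` keeps the record well-formed; only its content is optional.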
Re make abbreviations capital, like IEG instead of ieg.
I agree with Dhaval: keep lower case. The programs assume lower case, I think.
The only reason capitals were used at all is that capital letters appear in the directory names at Cologne, such as scans/IEGScan/.... It would have been better to use lower case throughout, but that is hard to change now.
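Keeping lower-case codes in submissions still lets a script find the capitalized Cologne directory. A minimal Python sketch, assuming the directory name is always the upper-cased code followed by `Scan`, as in scans/IEGScan/, MW72Scan and MDScan in this thread; `scan_dir` is a hypothetical helper and the rule may not hold for every dictionary.

```python
def scan_dir(dict_code: str) -> str:
    """Guess the Cologne scan directory for a lower-case dictionary code.

    Assumption: name = upper-cased code + 'Scan' (e.g. ieg -> IEGScan).
    """
    return f"scans/{dict_code.upper()}Scan/"
```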
re: pwg:aDyArUWa,2144:aDyArUQa:t:UW
I think it should be a print error.
Compare rUQa:
Also, compare glyphs for UW and UQa:
Finished analysis of the corrections in this issue.
Finished all analyses.
127 no-changes added to corrections_nochange.txt
Beginning installation
60 changes in 20 dictionaries.
Corrections installed.
Here are some behind-the-scenes details regarding this marathon of installation of changes.
At the current level of automation, installation takes about 15-20 minutes per dictionary, so roughly 5-7 hours (300-400 minutes) for these 20 dictionaries.
The automation uses partial templating. In this case, 'partial' means that some base template files are used, but that the templates require manual adjustment in various places.
The first step for a given dictionary involves copying some files from a base model; here is that step for the Wilson dictionary:
n/2014/pywork/correctionwork/
cp -r /afs/rrz.uni-koeln.de/vol/www/projekt/sanskrit-lexicon/http/docs/scans/MW72Scan/2014/pywork/correctionwork/issue-189 .
cd issue-189/
rm mw72*
rm prev_change.txt
rm pw_readme.txt
Edit readme.txt:
mw72 -> wil
etc.
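The copy-and-adapt step above could be scripted roughly as follows. This is a Python sketch, not the actual tooling: `init_issue_dir` is a hypothetical name, and it only automates the file removals and the mw72 -> wil substitution in readme.txt, leaving the 'etc.' adjustments manual.

```python
import shutil
from pathlib import Path


def init_issue_dir(base_issue: Path, dest_parent: Path,
                   old_code: str, new_code: str) -> Path:
    """Copy a base issue directory and adapt it for another dictionary.

    Mirrors: cp -r <base>/issue-NNN . ; rm <old>* ; rm prev_change.txt ;
    rm pw_readme.txt ; then substitutes the dictionary code in readme.txt.
    """
    dest = dest_parent / base_issue.name
    shutil.copytree(base_issue, dest)  # raises if dest already exists
    # rm mw72* : remove files named after the old dictionary code
    for path in dest.glob(f"{old_code}*"):
        if path.is_file():
            path.unlink()
    for name in ("prev_change.txt", "pw_readme.txt"):
        (dest / name).unlink(missing_ok=True)
    # Edit readme.txt: replace the code in both lower and upper case
    readme = dest / "readme.txt"
    if readme.exists():
        text = readme.read_text(encoding="utf-8")
        text = text.replace(old_code, new_code)
        text = text.replace(old_code.upper(), new_code.upper())
        readme.write_text(text, encoding="utf-8")
    return dest
```

Because `copytree` refuses to overwrite an existing directory, rerunning it cannot silently clobber earlier work in the same issue folder.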
Then the readme.txt file is edited and the instructions in it are followed. Here is that file for the latest Wilson update:
;Corrections to WIL
; Ref https://github.com/sanskrit-lexicon/CORRECTIONS/issues/189
This is in directory pywork/correctionwork/issue-189/.
Input file is change.txt
step 1. Generate wilupd.txt, wilupd.tsv, and wilnochange.txt
sh prepareupd.sh wil dhaval 189
step 1a. Make manual adjustments to wilupd.txt:
cp wilupd.txt wilupd_edit.txt # corrections in a copy
1 revision needed
step 2. Install corrections using wilupd_edit.txt
- By examination of pywork/update.sh, the last line is
python updateByLine.py ../orig/wil1.txt manualByLine1.txt ../orig/wil.txt
- So, we append wilupd_edit.txt to the end of file manualByLine1.txt
cd ../../
cp manualByLine1.txt prev_manualByLine1.txt
cat prev_manualByLine1.txt correctionwork/issue-189/wilupd_edit.txt > manualByLine1.txt
- Then, in pywork directory, issue that last command of update.sh, as shown
above
- Then, create the headwords file, wilhw2.txt
- Then, recreate (a) wil.xml, (b) ../web/sqlite/wil.sqlite
sh redo_hw.sh
sh redo_xml.sh
The rest of the steps update downloads and documentation
step3.
web/webtc directory:
edit web/webtc/download.html, and change as-of-date at bottom of file
remake downloads:
cd to downloads directory
sh redo_all.sh
If needed, initialize make_sync.sh and update_sync.sh:
- 1. Copy prototypes from MD:
cp ../../../MDScan/2014/pywork/make_sync.sh .
cp ../../../MDScan/2014/pywork/update_sync.sh .
- 2. Edit update_sync.sh so it is consistent with the above:
Edit make_sync.sh for wil
prepare new sync update file.
sh make_sync.sh
step4a: copy wilupd.tsv to php/correction_response/
From pywork directory:
cp correctionwork/issue-189/wilupd.tsv ../../../../php/correction_response/
step4b: append wilupd.tsv to end of cfr.tsv
cd ../../../../php/correction_response/
cp cfr.tsv cfr-prev.tsv
cat cfr-prev.tsv wilupd.tsv > cfr.tsv
rm wilupd.tsv
step5a. On local machine, open Github application, then open local
CORRECTIONS repository in Explorer
step5b. Open GitBash terminal,
cd Documents/GitHub/CORRECTIONS/
sh redo_cfr.sh
step6a. edit history.txt, and write note of the changes
step6b. update dictionaries/WIL/wil_printchange.txt using cfr.tsv
This is done by reformatting the 'print error' records of wilupd.txt
NOTE: 1 case added
step6c. update corrections_nochange.txt from wilnochange.txt
NOTE: 0 cases added
step7. prepare new sanhw1.txt and sanhw2.txt
step7a. On Cologne server, change to scans/awork/sanhw1
(Assuming still in php/correction_response):
cd ../../scans/awork/sanhw1/
sh redo_update.sh
step7b. In local CORRECTIONS repository,
sh redo_sanhw12.sh
step8. sync with GitHub
step8a. Create commit
step8b. 'Sync'
step9. Make 'installation complete' note in #189.
step10. Update s3 backup of wil
step10a. Assuming in php/correction_response:
cd ../../scans/awork/virtualenv/aws/
step10b. Be sure the redo_all.sh of above is finished.
Make the script, execute it, and deactivate
python make_copy_environ.py wil
source s3bk_wil.sh
rm s3bk_wil.sh
As you can see, installation actually involves 3 systems: Cologne, GitHub (via local repository), and AWS s3.
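Steps 2 and 4b of the readme above follow the same pattern: back up a file, then rebuild it as the backup plus the new records. A minimal Python sketch of that pattern; `backup_and_append` is a hypothetical helper, not part of the existing scripts.

```python
import shutil
from pathlib import Path


def backup_and_append(target: Path, addition: Path, backup: Path) -> None:
    """Python equivalent of: cp target backup ; cat backup addition > target

    Like cat, the files are joined byte for byte; no newline is inserted
    between them, so `backup` should already end with a newline.
    """
    shutil.copyfile(target, backup)
    with open(target, "wb") as out:  # truncates target; backup holds the old data
        out.write(backup.read_bytes())
        out.write(addition.read_bytes())
```

For step 4b this would be called with cfr.tsv as the target, wilupd.tsv as the addition, and cfr-prev.tsv as the backup.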
It hurts. It's a sorrowful path you walk, Jim.
Examine the HTML.
The .txt file is here. It may be taken as the base for making corrections in the standard convention.
Total 309 entries to be examined.
I encourage @gasyoun to examine and submit corrections in standard convention.
@funderburkjim UPDATE on 1.1.2016: after submission in 9 parts in total, here is the standard-format file for processing: https://github.com/sanskrit-lexicon/CORRECTIONS/blob/master/ngram/output/corrections/allvsMW_2_corrected.txt Best of luck.