Open drdhaval2785 opened 8 years ago
7 and 9 sound equal to me.
21.
Analyse accents (key2), batch comparison. There should be differences between PWG and Indian sources. This was said in 1974 by Mayrhofer's pupil, but never confirmed.
@funderburkjim can we extract all key2 fields as we have done with key1? I want to see the differences not only in headwords; I would also like to correct or document deviations in accents. In most cases I guess it will be a matter of lost accents or deviations that should be left as they are.
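A minimal sketch of what such a key2 comparison could look like. It assumes each record carries `<key1>…</key1>` and `<key2>…</key2>` on one line, and that accents are written with the SLP1 marks `/`, `\` and `^`; the sample records and the function names are hypothetical, not taken from the actual dictionary files.

```python
import re

# Assumption: key1 and key2 appear on the same line of a record.
KEY_RE = re.compile(r"<key1>(.*?)</key1>.*?<key2>(.*?)</key2>")
# Assumption: SLP1 accent marks (udatta, anudatta, svarita).
ACCENT_MARKS = "/\\^"

def extract_keys(lines):
    """Map key1 -> key2 for every line that contains both fields."""
    keys = {}
    for line in lines:
        match = KEY_RE.search(line)
        if match:
            keys[match.group(1)] = match.group(2)
    return keys

def strip_accents(key2):
    """Remove the assumed accent marks from a key2 value."""
    return "".join(ch for ch in key2 if ch not in ACCENT_MARKS)

def compare_key2(keys_a, keys_b):
    """List shared headwords whose key2 differs between two dictionaries.

    The last tuple element is "accent" when the two forms differ only in
    accent marks, and "other" for any other kind of difference.
    """
    report = []
    for key1 in sorted(keys_a.keys() & keys_b.keys()):
        a, b = keys_a[key1], keys_b[key1]
        if a == b:
            continue
        kind = "accent" if strip_accents(a) == strip_accents(b) else "other"
        report.append((key1, a, b, kind))
    return report

# Hypothetical sample records, for illustration only.
dict_a = extract_keys([
    "<H1><h><key1>agni</key1><key2>agni/</key2></h></H1>",
    "<H1><h><key1>deva</key1><key2>deva/</key2></h></H1>",
])
dict_b = extract_keys([
    "<H1><h><key1>agni</key1><key2>agni</key2></h></H1>",
    "<H1><h><key1>deva</key1><key2>deva/</key2></h></H1>",
])
for row in compare_key2(dict_a, dict_b):
    print(row)
```

Separating "accent" from "other" differences would let lost accents be documented without mixing them with genuine headword deviations.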
[This is in response to @gasyoun's request.]
I'm generally in foot-soldier mode: slogging through the details of implementing some improvement in a tiny corner of the Cologne sanskrit-lexicon project. Let me pretend for a moment that I'm a general sitting on a hillock overlooking the battlefield, like Kutuzov in War and Peace.
My priorities at the moment are:
I would also like to finish the inflected form python rewrite that was begun last summer, but this always seems to get pre-empted by some more pressing request.
I probably could go on and on if I thought a bit more about what I'd like to get done.
This is my actual current TODO list.
Now let me get down from that hillock before nose-bleed ensues :)
https://github.com/juhnowski/sanskrit-correction-js/blob/master/WIL_Basic.html
@juhnowski wow! 1) Please upload on your github.io so it can be tested 2) Open a new issue at https://github.com/sanskrit-lexicon/Cologne/issues (Cologne - because it's web development related), because this is a meta issue, no real discussions occur here, thanks!
WIL_basic.html link broken.
@funderburkjim please try https://juhnowski.github.io/ but I have not yet implemented saving to a file.
So, for example, 'UI for multiple dictionary displays, using hwnorm1' is a subtask of 'Simple spelling UI ref'. Yes, there are millions of ways to improve it, but it's ready to be launched publicly. 'Corrections and data improvements' are always there; it's where we started our journey. 'Infrastructure normalization' is huge, and indeed who would need new roads if the old trail is still there? 'Backup (dev) server' - does Dhaval have access to all the backend scripts, all the dev scripts ever developed by Jim? The ones that we see on his own github page, for example. 'AS to IAST for all dictionaries', similar to 'Corrections and data improvements', is a background task and there is no need to speed it up, from my perspective. And we always have to keep in mind that there are high and low priority dictionaries. And the only thrilling task left is 'subheadwords' https://github.com/sanskrit-lexicon/alternateheadwords/issues/20 - and I would like to understand how my coders could help, because frankly I do not know. You already have some code related to it, and I would love to see it first.
@funderburkjim let me introduce you to @vschary; he wants to help, and @Shalu411 said he is able to do so. Any ideas?
Re @vschary wants to help.
I'm assuming that the interest is in the Sanskrit checking -- as opposed to programming.
One thing in the line of 'checking' relates to alternate headwords for vcp. We had a list of about 1000 cases where the accuracy of derivation of the alternate headwords has been auto-checked only. Probably most of these auto-generated alternates are correct, but it would be good to have a knowledgeable human examine each of them.
I am thinking specifically of the 'ok1' list mentioned here. Here is a link to the current form of that ok1 list. For instance the first case is
Case 0001: OK,OK : 1:aMsa(se)BAra:aMseBAra:aMsasera:169:170
The important parts are aMsa(se)BAra and aMseBAra. And the interpretation is that 'aMseBAra' is an alternate spelling of 'aMsaBAra'. The thing to check is whether this interpretation is correct.
I could readily alter this to use Devanagari, IAST, or HK -- however @vschary prefers to read his Sanskrit.
I think this first pass could be done in a few hours, and would require nothing but the ok1 list; the idea would be to mark those that need further investigation. If there are any questionable ones, then he could investigate those further using the UI that SergeA used recently.
If this sounds like an appropriate task, we can discuss it further. If it doesn't sound appropriate, maybe @vschary can let us know what he might be interested in, and we'll work from that interest.
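For anyone preparing the ok1 list for review, here is a rough sketch of how a case line such as the one quoted above might be parsed. Only the two fields discussed above are interpreted; the other fields are carried along untouched. The names `parse_ok1` and `expand_headword` are hypothetical, and the rule that the parenthesized letters replace the same number of letters just before them is an assumption inferred from this single example, not the method actually used to generate the list.

```python
import re
from typing import List, NamedTuple, Tuple

class Ok1Case(NamedTuple):
    case_id: str        # e.g. "0001"
    status: str         # e.g. "OK,OK"
    headword: str       # printed form, e.g. "aMsa(se)BAra"
    alternate: str      # auto-generated alternate, e.g. "aMseBAra"
    other: List[str]    # remaining fields, not interpreted here

def parse_ok1(line: str) -> Ok1Case:
    """Split one ok1 case line into its parts."""
    match = re.match(r"Case (\d+): (\S+) : (.*)$", line.strip())
    if match is None:
        raise ValueError("not an ok1 case line: %r" % line)
    case_id, status, payload = match.groups()
    fields = payload.split(":")
    return Ok1Case(case_id, status, fields[1], fields[2],
                   [fields[0]] + fields[3:])

def expand_headword(headword: str) -> Tuple[str, str]:
    """Return (base spelling, guessed alternate) for a form like x(y)z.

    Assumed heuristic: the letters in parentheses replace the same
    number of letters immediately before the parenthesis.
    """
    match = re.fullmatch(r"([^()]*)\(([^()]*)\)([^()]*)", headword)
    if match is None:
        raise ValueError("no single parenthesized part in %r" % headword)
    prefix, alt, suffix = match.groups()
    keep = max(0, len(prefix) - len(alt))
    return prefix + suffix, prefix[:keep] + alt + suffix

case = parse_ok1("Case 0001: OK,OK : 1:aMsa(se)BAra:aMseBAra:aMsasera:169:170")
base, guess = expand_headword(case.headword)
print(case.case_id, base, case.alternate, guess == case.alternate)
```

A reviewer-facing version could print `base` and `alternate` side by side in Devanagari, so that only the cases a human marks as doubtful need further investigation.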
> I'm assuming that the interest is in the Sanskrit checking -- as opposed to programming.
Exactly!
> I could readily alter this to use Devanagari, IAST, or HK -- however @vschary prefers to read his Sanskrit.
Devanagari, he is from India. Anything other than SLP1 will do, but Devanagari is best if you are from India.
Status Update on 20 December 2020.
Out of Jim's wishlist at https://github.com/sanskrit-lexicon/CORRECTIONS/issues/181#issuecomment-299332371, all were completed except the following.
Alternate headwords for various dictionaries
The 'subheadwords' issue, although similar in some ways to the alternate headwords issue, is
actually more complex because of the requirement to dive into parsing the entries, adding markup,
not to mention the complexity of combining abbreviated affixes with parent headwords.
Greek
What about Greek?
Extend the methods which we have used for cleanup of dictionaries to description also (see https://github.com/sanskrit-lexicon/CORRECTIONS/issues/34 for methods).
DONE in #309 09 Oct 2016
Abbreviation error corrections
hwnorm1 further development based on https://github.com/sanskrit-lexicon/CORRECTIONS/issues/43 conventions. - Assigned to @drdhaval2785
Find and correct convention errors found out as a by-product of point 4. - Assigned to @gasyoun
Design crowdsourcing platform for correction submission. - Assigned to @funderburkjim
Prepare a wikisource-like platform for keeping track of correction history. - Assigned to @funderburkjim (EDIT - Shifted to csl-orig github repository for tracking history)
Prepare a mechanism by which webpage and PDFs can be accessed via L-number. - Assigned to @funderburkjim. Not important, because L-numbers change substantially nowadays.
Do some verb comparison 'research'. See https://github.com/sanskrit-lexicon/CORRECTIONS/issues/87. - Assigned to @drdhaval2785, @gasyoun
Pattern mismatch finding based on n-grams. https://github.com/sanskrit-lexicon/CORRECTIONS/issues/46#issue-51118866 refers to works 15 to 20.
Listing out impossible letter combinations by Sanskrit grammar rules. - Assigned to @drdhaval2785. Listed all possible ngrams of sanhw2.txt. Whatever is not listed is impossible. https://github.com/sanskrit-lexicon/CORRECTIONS/issues/241#issuecomment-177692135 status update.
Search for a list of feminine words ending in 'a' - Assigned to @drdhaval2785
Listing out words which appear only in one dictionary after filtering out common differences like M, H at the end, corresponding nasal letters etc. - Assigned to @drdhaval2785
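The n-gram idea in the list above (collect every n-gram that occurs in a trusted headword list, then treat anything unseen as suspect) can be sketched roughly as follows. This is an illustration only: the tiny word list stands in for a file such as sanhw2.txt, whose actual format is not assumed here, and the function names are hypothetical.

```python
def ngrams(word, n):
    """Return the set of all substrings of length n in word."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}

def build_inventory(words, n):
    """Collect every n-gram seen in a list of trusted headwords."""
    inventory = set()
    for word in words:
        inventory |= ngrams(word, n)
    return inventory

def unseen_ngrams(word, inventory, n):
    """Return the n-grams of word that never occur in the inventory."""
    return sorted(ngrams(word, n) - inventory)

# Hypothetical stand-in for the trusted headword list (SLP1 spelling).
trusted = ["aMsa", "aMSa", "agni"]
inventory = build_inventory(trusted, 2)

for candidate in ["agni", "agnni"]:
    print(candidate, unseen_ngrams(candidate, inventory, 2))
```

With a real headword list the inventory is only as good as its source: a typo that already sits in the trusted list makes its n-grams look possible, so hits are best treated as candidates for review rather than as certain errors.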