Todo list as of December 2015

drdhaval2785 commented 8 years ago

1. ~~Extend the methods we have used for dictionary cleanup to the descriptions as well (see https://github.com/sanskrit-lexicon/CORRECTIONS/issues/34 for the methods).~~ DONE in #309, 09 Oct 2016
2. ~~Abbreviation error corrections~~
3. Alternate readings should get headword status for all dictionaries (only MW has it now). See https://github.com/sanskrit-lexicon/CORRECTIONS/issues/35 and https://github.com/sanskrit-lexicon/CORRECTIONS/issues/133. https://github.com/sanskrit-lexicon/alternateheadwords is the dedicated repository to handle this problem.
4. ~~hwnorm1 further development based on https://github.com/sanskrit-lexicon/CORRECTIONS/issues/43 conventions. - Assigned to @drdhaval2785~~
5. ~~Find and correct convention errors discovered as a by-product of point 4. - Assigned to @gasyoun~~
6. Prepare a JavaScript tool that lets us click on an L-id and copies the standard format to the clipboard. See point 2 in the link. - Assigned to @juhnowski
7. ~~Design crowdsourcing platform for correction submission. - Assigned to @funderburkjim~~
8. Prepare a list of abbreviation / literary resources for all dictionaries. See https://github.com/sanskrit-lexicon/CORRECTIONS/issues/142 and https://github.com/sanskrit-lexicon/CORRECTIONS/issues/143. - Assigned to @gasyoun and @drdhaval2785
9. ~~Prepare a wikisource-like platform for keeping track of correction history. - Assigned to @funderburkjim (EDIT - Shifted to csl-orig github repository for tracking history)~~
10. Give upasarga+dhatu words headword status in PW, PWG, or rather in all dictionaries.
11. ~~Prepare a mechanism by which webpage and PDFs can be accessed via L-number. - Assigned to @funderburkjim.~~ Not important, because L-numbers change substantially nowadays.
12. Analyse the suspect entries that have abnormal endings. - Assigned to @gasyoun
13. ~~Do some verb comparison 'research'. See https://github.com/sanskrit-lexicon/CORRECTIONS/issues/87. - Assigned to @drdhaval2785, @gasyoun~~
14. Do some research on the 'b'/'v' confusion in the dictionaries and find some conventions and convention errors. - Assigned to @drdhaval2785
15. ~~Pattern mismatch finding based on n-grams.~~ https://github.com/sanskrit-lexicon/CORRECTIONS/issues/46#issue-51118866 refers to items 15 to 20.
16. Apply subanta and tiGanta generators to these methods, so that our tools are ready for application to the descriptions as well. Use Dhaval's subanta and tiGanta tools. - Assigned to @drdhaval2785.
17. ~~Listing out impossible letter combinations by Sanskrit grammar rules.~~ - Assigned to @drdhaval2785. Listed all possible n-grams of sanhw2.txt. Whatever is not listed is impossible. See https://github.com/sanskrit-lexicon/CORRECTIONS/issues/241#issuecomment-177692135 for a status update.
18. Take English-Sanskrit dictionaries as a base and cluster the Sanskrit words that have the same meaning. A word that is not repeated across dictionaries is suspect. - Assigned to @drdhaval2785
19. ~~Search for a list of feminine words ending in 'a'~~ - Assigned to @drdhaval2785
20. ~~Listing out words which appear only in one dictionary after filtering out common differences like M, H at the end, corresponding nasal letters etc. - Assigned to @drdhaval2785~~
21. Analyse accents (key2), batch comparison. There should be differences in PWG vs. Indian sources. See https://github.com/sanskrit-lexicon/CORRECTIONS/issues/181#issuecomment-161429917 and Dhaval's accent tools. - Assigned to @drdhaval2785
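
The n-gram idea in the list above (collect every n-gram that occurs in a trusted headword list such as sanhw2.txt and treat anything unlisted as suspect) could be sketched roughly as follows. This is only an illustration, not the script that was actually used: the word lists are made-up inline samples in SLP1, and the names `collect_ngrams` and `suspect_words` are hypothetical.

```python
def collect_ngrams(words, n):
    """Return the set of all character n-grams found in the given words."""
    ngrams = set()
    for word in words:
        for i in range(len(word) - n + 1):
            ngrams.add(word[i:i + n])
    return ngrams


def suspect_words(candidates, known_ngrams, n):
    """Return (word, unseen n-grams) for every candidate that contains
    at least one n-gram absent from the trusted set."""
    result = []
    for word in candidates:
        unseen = sorted(collect_ngrams([word], n) - known_ngrams)
        if unseen:
            result.append((word, unseen))
    return result


# Made-up sample data; a real run would read the trusted list from a file.
trusted = ["aMsa", "aMsaBAra", "agni", "deva"]
candidates = ["aMsa", "agnI", "dewa"]

known = collect_ngrams(trusted, 2)
flagged = suspect_words(candidates, known, 2)
for word, unseen in flagged:
    print(word, unseen)
```

With such a tiny trusted list almost anything gets flagged; the approach only becomes meaningful when the trusted list is large enough that an unseen n-gram is genuinely unusual.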

gasyoun commented 8 years ago

7 and 9 sound equal to me.

gasyoun commented 8 years ago

21. Analyse accents (key2), batch comparison. There should be differences in PWG vs. Indian sources. This was claimed in 1974 by a pupil of Mayrhofer, but never confirmed. @funderburkjim, can we extract all key2 fields as we have done with key1? I want not only to see the differences in headwords, but also to correct or document deviations in accents. In most cases I guess the issue will be lost accents or deviations, which should be left as they are.
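
A batch comparison of this kind might look roughly like the sketch below. It rests on several assumptions: that key2 values are SLP1 strings in which accents are written with `/`, `\` and `^`, that each dictionary has already been reduced to a key1-to-key2 mapping, and that key2 contains no other markup. The sample entries are invented for illustration and are not taken from PWG or MW, and the function names are hypothetical.

```python
ACCENT_MARKS = "/\\^"  # assumed SLP1 accent marks (udatta, anudatta, svarita)


def strip_accents(key2):
    """Remove the assumed accent marks from a key2 string."""
    return "".join(ch for ch in key2 if ch not in ACCENT_MARKS)


def accent_differences(dict_a, dict_b):
    """Compare two {key1: key2} mappings on their shared key1 values.

    Returns a list of (key1, key2_a, key2_b, kind), where kind is
    'spelling', 'lost accent' or 'accent deviation'.
    """
    differences = []
    for key1 in sorted(set(dict_a) & set(dict_b)):
        a, b = dict_a[key1], dict_b[key1]
        if a == b:
            continue
        if strip_accents(a) != strip_accents(b):
            kind = "spelling"
        elif a == strip_accents(a) or b == strip_accents(b):
            kind = "lost accent"  # one side carries no accent at all
        else:
            kind = "accent deviation"  # both accented, but differently
        differences.append((key1, a, b, kind))
    return differences


# Invented sample data, not real dictionary content.
source_a = {"agni": "agni/", "deva": "deva/", "indra": "i/ndra"}
source_b = {"agni": "agni/", "deva": "deva", "indra": "indra/"}

report = accent_differences(source_a, source_b)
for row in report:
    print(row)
```

Separating 'lost accent' from 'accent deviation' keeps the cases that should simply be documented apart from the ones that may need a correction.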

funderburkjim commented 7 years ago

What I see as priorities - May 2017

[This is in response to @gasyoun's request.]

I'm generally in foot-soldier mode: slogging through the details of implementing some improvement in a tiny corner of the Cologne sanskrit-lexicon project. Let me pretend for a moment that I'm a general sitting on a hillock overlooking the battlefield, like Kutuzov in War and Peace.

My priorities at the moment are:

Finish AS to IAST for all dictionaries - simple to state, lots of work to accomplish
Backup (dev) server smoothly functioning with Dhaval; we've dipped our toes in this this week. This has long-term benefit of decreasing the dependence of the Sanskrit-Lexicon project on me and Cologne.
Infrastructure normalization. This is not a glamorous objective (road-building rarely is), but improving roads and bridges makes everyone (potential contributor) more productive.
- The AS-IAST task is a part of this.
- As is the One DTD to rule them all ref:.
- Other aspects include:
  - simplifying the transition from xxx.txt to xxx.xml, by embedding some meta-data into xxx.txt. This has side benefit of stabilizing L-numbers.
  - Providing a programmatic base for the displays, so that all displays derive from the same PHP class. This will permit simpler flow-through of improvements to all dictionaries. Currently, each dictionary is its own little kingdom (separate code base), so implementing a change to all dictionaries requires separate haggling with each one.
Alternate headwords for various dictionaries
- The 'subheadwords' issue, although similar in some ways to the alternate headwords issue, is actually more complex because of the requirement to dive into parsing the entries, adding markup, not to mention the complexity of combining abbreviated affixes with parent headwords.
Corrections and data improvements as they arise always have high priority, e.g.
- AE with Sampada
- Greek
- Improvements relevant to stardict project Dhaval is working on
- corrections originating with users
- corrections arising in the course of implementing other tasks, such as AS to IAST.
Simple spelling UI ref:
UI for multiple dictionary displays, using hwnorm1

I would also like to finish the inflected form python rewrite that was begun last summer, but this always seems to get pre-empted by some more pressing request.

I probably could go on and on if I thought a bit more about what I'd like to get done.

This is my actual current TODO list.

Now let me get down from that hillock before nose-bleed ensues :)

juhnowski commented 7 years ago

- https://github.com/juhnowski/sanskrit-correction-js/blob/master/WIL_Basic.html

gasyoun commented 7 years ago

https://github.com/juhnowski/sanskrit-correction-js/blob/master/WIL_Basic.html

@juhnowski wow! 1) Please upload on your github.io so it can be tested 2) Open a new issue at https://github.com/sanskrit-lexicon/Cologne/issues (Cologne - because it's web development related), because this is a meta issue, no real discussions occur here, thanks!

funderburkjim commented 7 years ago

WIL_basic.html link broken.

juhnowski commented 7 years ago

@funderburkjim please try https://juhnowski.github.io/ but I have not yet implemented saving to a file.

gasyoun commented 7 years ago

So, for example, 'UI for multiple dictionary displays, using hwnorm1' is a subtask of 'Simple spelling UI'. Yes, there are millions of ways to improve it, but it's ready to be launched publicly. 'Corrections and data improvements' are always there; it's where we started our journey. 'Infrastructure normalization' is huge, and indeed, who would need new roads if the old trail is still there? 'Backup (dev) server': does Dhaval have access to all the backend scripts, all the dev scripts ever developed by Jim? The ones that we see on his own GitHub page, for example. 'AS to IAST for all dictionaries', like 'Corrections and data improvements', is a background task, and there is no need to speed it up, from my perspective. And we always have to keep in mind that there are high- and low-priority dictionaries. The only thrilling task left is subheadwords (https://github.com/sanskrit-lexicon/alternateheadwords/issues/20), and I would like to understand how my coders could help, because frankly, I do not know. You do have some code related to it already, and I would love to see it first.

gasyoun commented 7 years ago

@funderburkjim let me introduce you to @vschary. He wants to help, and @Shalu411 said he is able to do so. Any ideas?

funderburkjim commented 7 years ago

Re @vschary wants to help.

I'm assuming that the interest is in the Sanskrit checking -- as opposed to programming.

One thing in the line of 'checking' relates to alternate headwords for vcp. We have a list of about 1000 cases where the accuracy of the derivation of the alternate headwords has only been auto-checked. Probably most of these auto-generated alternates are correct, but it would be good to have a knowledgeable human examine each of them.

I am thinking specifically of the 'ok1' list mentioned here. Here is a link to the current form of that ok1 list. For instance the first case is

Case 0001: OK,OK : 1:aMsa(se)BAra:aMseBAra:aMsasera:169:170

The important parts are aMsa(se)BAra and aMseBAra. The interpretation is that 'aMseBAra' is an alternate spelling of 'aMsaBAra'. The thing to check is whether this interpretation is correct.
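
For anyone who wants to process the list programmatically, a case line like the one above could be split up along these lines. This is a hedged sketch: the only things taken from the example are its visible layout and the meaning of the two 'important parts'. The names `parse_case`, `pattern`, `alternate` and `base` are hypothetical, and the remaining colon-separated fields are kept uninterpreted because their meaning is not stated here.

```python
import re

# Layout assumed from the single example line: "Case NNNN: STATUS : f0:f1:f2:..."
CASE_RE = re.compile(r"^Case (\d+): (\S+) : (.+)$")


def parse_case(line):
    """Split one ok1 case line into its parts."""
    match = CASE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised case line: {line!r}")
    number, status, rest = match.groups()
    fields = rest.split(":")
    pattern, alternate = fields[1], fields[2]
    # Dropping the parenthetical gives the spelling the alternate is compared to.
    base = re.sub(r"\([^)]*\)", "", pattern)
    return {
        "number": int(number),
        "status": status.split(","),
        "pattern": pattern,
        "alternate": alternate,
        "base": base,
        "fields": fields,  # all raw fields, left uninterpreted
    }


case = parse_case("Case 0001: OK,OK : 1:aMsa(se)BAra:aMseBAra:aMsasera:169:170")
print(case["base"], "->", case["alternate"])
```

A reviewer's verdict could then be recorded against `number`, with `base` and `alternate` shown side by side.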

I could readily alter this to use Devanagari, IAST, or HK -- however @vschary prefers to read his Sanskrit.
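
As an illustration of how small such a conversion is, here is a minimal SLP1-to-IAST sketch. It is not the project's transcoder: it covers only the basic vowels and consonants, ignores accent marks, avagraha and any markup, and the name `slp1_to_iast` is hypothetical.

```python
# SLP1 letters whose IAST form differs; every other letter is the same in both.
SLP1_TO_IAST = {
    "A": "ā", "I": "ī", "U": "ū", "f": "ṛ", "F": "ṝ", "x": "ḷ", "X": "ḹ",
    "E": "ai", "O": "au", "M": "ṃ", "H": "ḥ",
    "K": "kh", "G": "gh", "N": "ṅ", "C": "ch", "J": "jh", "Y": "ñ",
    "w": "ṭ", "W": "ṭh", "q": "ḍ", "Q": "ḍh", "R": "ṇ",
    "T": "th", "D": "dh", "P": "ph", "B": "bh",
    "S": "ś", "z": "ṣ",
}


def slp1_to_iast(text):
    """Convert an SLP1 string to IAST one character at a time."""
    return "".join(SLP1_TO_IAST.get(ch, ch) for ch in text)


for word in ["aMsaBAra", "aMseBAra"]:
    print(word, "->", slp1_to_iast(word))
```

Because SLP1 uses exactly one character per sound, a per-character lookup is enough in this direction; going from IAST back to SLP1 would need to handle digraphs such as 'bh' and 'ai'.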

I think this first pass could be done in a few hours, and would require nothing but the ok1 list; the idea would be to mark those that need further investigation. If there are any questionable ones, then he could investigate those further using the UI that SergeA used recently.

If this sounds like an appropriate task, we can discuss it further. If it doesn't sound appropriate, maybe @vschary can let us know what he might be interested in, and we'll work from that interest.

gasyoun commented 7 years ago

> I'm assuming that the interest is in the Sanskrit checking -- as opposed to programming.

Exactly!

> I could readily alter this to use Devanagari, IAST, or HK -- however @vschary prefers to read his Sanskrit.

Devanagari, he is from India. Anything other than SLP1 will do, but Devanagari is best if you are from India.

drdhaval2785 commented 3 years ago

Status Update on 20 December 2020.

Out of Jim's wishlist at https://github.com/sanskrit-lexicon/CORRECTIONS/issues/181#issuecomment-299332371, all were completed except the following.

Alternate headwords for various dictionaries
    The 'subheadwords' issue, although similar in some ways to the alternate headwords issue, is
    actually more complex because of the requirement to dive into parsing the entries, adding markup,
    not to mention the complexity of combining abbreviated affixes with parent headwords.
 Greek

gasyoun commented 3 years ago

> Greek

What about Greek?

sanskrit-lexicon / CORRECTIONS

Todo list as of December 2015 #181
