drdhaval2785 opened 4 years ago
We have some work that we can build on: hwnorm1c.txt in hwnorm1. For instance, aSva is solved:
aSva:aSva:BEN,BHS,BOP,BUR,CAE,CCS,GRA,IEG,INM,MD,MW,MW72,PE,PUI,PW,PWG,SCH,SHS,STC,VCP,VEI,WIL,YAT;aSvaH:AP,AP90,SKD
But SreyaMs is not solved:
SreyaMs:SreyaMs:BEN,CAE,CCS,PW,PWG
Sreyas:Sreyas:AP,AP90,BOP,BUR,GRA,INM,MD,MW,MW72,SCH,SHS,VCP,WIL,YAT
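For reference, the line format shown above (a normalized key, then semicolon-separated spelling:dictionary-list groups) can be read mechanically. A minimal Python sketch, with the format inferred from just these two examples rather than from a hwnorm1c.txt specification:

```python
def parse_hwnorm_line(line):
    """Parse one hwnorm1c.txt-style line (assumed format:
    key:spelling1:D1,D2,...;spelling2:D3,...).
    Returns the key and a dict mapping each spelling to its
    list of dictionary codes."""
    key, _, rest = line.partition(':')
    variants = {}
    for group in rest.split(';'):
        spelling, _, dicts = group.partition(':')
        variants[spelling] = dicts.split(',') if dicts else []
    return key, variants

# The solved aSva case from above (dictionary list shortened):
key, variants = parse_hwnorm_line("aSva:aSva:BEN,BHS,MW;aSvaH:AP,AP90,SKD")
```

The unsolved SreyaMs/Sreyas case would then surface as two separate keys whose variant sets still need to be merged, by hand or by rule.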
Another approach that might be useful is to have, perhaps on the Cologne server, a directory with a fully rendered HTML page for each headword of each dictionary. Or maybe one HTML page per headword, with the page containing the entries from all dictionaries that have that headword. [There would be about 300,000 such pages.]
Then we could depend on Google to eventually index this. We could direct our attention to adding indexing suggestions, such as alternate spellings. There might be ways to add header markup that would allow a Google search to restrict its findings specifically to this sub-website of Cologne.
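As a rough illustration of the one-page-per-headword idea, here is a hedged Python sketch. The table name and columns, entries(headword, dict_code, body), are an assumption for illustration, not the actual Cologne sqlite schema:

```python
import html
import pathlib
import sqlite3

def build_pages(db_path, out_dir):
    """Sketch: write one static HTML page per headword, combining
    the entries from every dictionary that has that headword.
    Assumes a table entries(headword, dict_code, body); the real
    Cologne sqlite layout may differ."""
    con = sqlite3.connect(db_path)
    out = pathlib.Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    pages = {}
    for hw, code, body in con.execute(
            "SELECT headword, dict_code, body FROM entries "
            "ORDER BY headword, dict_code"):
        pages.setdefault(hw, []).append((code, body))
    for hw, entries in pages.items():
        parts = [f"<h1>{html.escape(hw)}</h1>"]
        for code, body in entries:
            # body is assumed to already be rendered HTML
            parts.append(f"<h2>{code}</h2><div>{body}</div>")
        (out / f"{hw}.html").write_text("\n".join(parts), encoding="utf-8")
```

Run over the full database this would emit the ~300,000 files mentioned above, which a crawler could then index without any server-side rendering.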
Comment on 6-8:
This reminds me of what Dhaval has worked on with the Android version of the dictionaries. As I understand it, on Android there is a certain format which is widely used for dictionaries of all languages, and then there are viewers, such as the ColorDict program. So you get your local version of the dictionaries in two steps:
Maybe something of this sort could be done. We are probably close to having the dictionaries (by virtue of the sqlite version of each dictionary, plus sqlite files for some ancillary material such as abbreviations; all this could be put into one sqlite file). The main missing piece in the sqlite area relates to the primitive 'database' used by advanced search.
Then the main problem reduces to having a multi-platform (PC, Linux, Mac) display program. The current displays take the sqlite data and generate HTML; they do this with PHP, but that requires setting up a server stack such as XAMPP or Ubuntu. We need some analog of ColorDict. The PHP transform of sqlite data into HTML could be done by Python or by JavaScript. It is also possible to create an executable (on Windows at least) that has its own Python interpreter; in fact I did this once using web2py. On the JavaScript side, there are 'Electron' apps. I suspect one of these solutions would provide user-friendly ways for non-technical people to get access to the dictionaries.
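A minimal sketch of the 'Python instead of PHP' route: the standard-library http.server serving HTML rendered from sqlite, so no XAMPP-style stack is needed. The database file name and the entries(headword, body) table are hypothetical stand-ins for the real Cologne data:

```python
import html
import sqlite3
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

DB = "mw.sqlite"  # hypothetical sqlite dictionary file

def render(headword, bodies):
    """The pure HTML-rendering step (the part PHP does today),
    kept separate so it can be tested without a server."""
    return f"<h1>{html.escape(headword)}</h1>" + "".join(
        f"<p>{b}</p>" for b in bodies)

class DictHandler(BaseHTTPRequestHandler):
    # Serves /lookup?key=<headword> from the sqlite file.
    def do_GET(self):
        hw = parse_qs(urlparse(self.path).query).get("key", [""])[0]
        con = sqlite3.connect(DB)
        bodies = [r[0] for r in con.execute(
            "SELECT body FROM entries WHERE headword=?", (hw,))]
        data = render(hw, bodies).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.end_headers()
        self.wfile.write(data)

def main():
    # Call main() to browse at http://127.0.0.1:8000/lookup?key=...
    HTTPServer(("127.0.0.1", 8000), DictHandler).serve_forever()
```

A bundler such as PyInstaller could then, in principle, turn this into the single double-clickable program that non-technical users need.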
This is, currently at least, a unique feature at Cologne (AFAIK). It takes into account both inherent spelling variations (such as those related to hwnorm1, mentioned above) and spelling variations that one encounters in practice. Marcis has mentioned from time to time certain shortcomings of the current simple search; not only do we need to address these, but we also need to better understand exactly how to tweak the algorithms (they are tricky), and we need to develop a test bed so we keep the good parts functioning as we add new parts.
Currently also, simple-search is not integrated at all with advanced search, which also has some unique features (such as substring searches).
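For concreteness, the substring-search feature mentioned reduces to a LIKE query over the headword column. A sketch against an assumed entries(headword, ...) table, not the actual advanced-search implementation:

```python
import sqlite3

def substring_search(con, fragment):
    """Return all headwords containing `fragment`, sorted.
    Assumes a table entries with a headword column."""
    pat = f"%{fragment}%"
    return [r[0] for r in con.execute(
        "SELECT DISTINCT headword FROM entries "
        "WHERE headword LIKE ? ORDER BY headword", (pat,))]
```

Integrating simple search would then mean normalizing `fragment` through the same spelling-variation rules before the query runs.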
A prime case regards verb forms. There is a wealth of information regarding verb forms in MW verb entries. But this is unparsed. Wouldn't it be good if this information were parsed into some standard form, so it could be used as a reference for inflection-generating software, such as csl-inflect?
Such analysis would also no doubt uncover numerous thus-far hidden errors in the digitizations.
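As a strawman for what such a 'standard form' might look like, here is one possible record layout for parsed verb forms in Python. All field names and the TSV serialization are invented for illustration; they do not reflect an existing csl-inflect schema:

```python
from dataclasses import dataclass

@dataclass
class VerbForm:
    root: str   # SLP1 root, e.g. "BU"
    gana: int   # present class, 1-10
    pada: str   # "P" (parasmaipada) or "A" (atmanepada)
    tense: str  # e.g. "pr", "pf", "fut"
    form: str   # the attested surface form, e.g. "Bavati"

def to_tsv(forms):
    """Serialize records one per TSV line, a flat format that an
    inflection generator could diff its own output against."""
    return "\n".join(
        f"{f.root}\t{f.gana}\t{f.pada}\t{f.tense}\t{f.form}"
        for f in forms)

# The present of BU (class 1, parasmaipada) as a sample record:
sample = [VerbForm("BU", 1, "P", "pr", "Bavati")]
```

Any mismatch between MW's attested forms (once parsed into records like these) and generated paradigms would point either at a digitization error or at a gap in the generator.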
We have an additional digitization that Thomas prepared in 2017; it is basically the full book of Lanman's Sanskrit Reader, including his Notes and Vocabulary.
Thus far, I've done nothing with this. It is in Thomas's original text form.
This would be an ideal project for anyone wanting to get in on the ground floor of making a digitization. The dictionary part (177 pages) could then be added to the current Cologne dictionaries.
Then, we could depend on Google to eventually index this.
Yes, absolutely. But we need the .htaccess solved first.
I would leave 12 for now, because @YevgenJohn is looking at it. What about 13? Where is it hosted? I adore the book and, years ago, was once thinking about printing it again.
I have started looking at the discussions in the issues of all Sanskrit-Lexicon repositories and intend to complete the survey of important discussions lying scattered everywhere. At the end of the work, the goal is to have the works listed according to priority. I am broadly classifying the items into 'High', 'Moderate', and 'Low' priority and a 'deep freeze' category. I have been able to review the issues noted in github on csl-apidev, csl-homepage, csl-orig and csl-websanlexicon as of now. Will keep all of you posted.
Glad you are doing the review and cleaning house. Is it spring-cleaning time in India?
I have started looking at the discussions in the issues of all Sanskrit-Lexicon repositories and intend to complete the survey of important discussions lying scattered everywhere.
May I ask you for a favor? If you have visited an issue, add it to a Milestone, like Cleanup 2021?
At the end of the work, the goal is to have the works listed according to priority.
Sure, but they need to be added to Projects or Milestones. We do not use it now, but for no good reason.
broadly classifying the items into 'High', 'Moderate', and 'Low' priority and a 'deep freeze' category.
Agree with the tags, but they should carry additional classification: a High bug is more important than a Low UI issue. So we want different ways of classifying, because there are hundreds of issues partly documented and left to starve.
I have been able to review the issues noted in github on csl-apidev, csl-homepage, csl-orig and csl-websanlexicon as of now
That sounds like a lot. I have updated the comment at the beginning of the thread. Let's have a call one week from now, on Sunday, so we can speak about 2021 plans.
I agree on using milestone for priorities. I do not intend to put 'Cleanup 2021' tag, because I anyhow am planning to do a survey of all existing issues. Therefore, it would be superfluous. I have in the meanwhile visited all issues in csl-pywork, csl-doc and csl-corrections.
I do not intend to put 'Cleanup 2021' tag, because I anyhow am planning to do a survey of all existing issues.
Ok, any tag would do - 'Dhaval-checked'. Let's break life into before and after, because we have indeed badly needed a revision of everything for a long time.
I have reviewed issues in MWS, hwnorm1, hwnorm2, VCP, cls-inflect, rvlinks and sanskrit-lexicon.github.io.
I have reviewed issues
Adding labels is crucial! We might want to review them later, based on labels.
Reviewed AP90, SKD, PWG, GRA, PWK repositories.
@gasyoun
Adding labels is crucial! We might want to review them later, based on labels.
Kindly give me some time. You will get the labels and milestones. Don't worry. Let me work offline right now.
You will get the labels and milestones.
So be it, fine.
Completed review of all issues in all repositories except the following three: COLOGNE, CORRECTIONS, alternateheadwords. These repositories will require an extensive look. Will continue tomorrow.
Will continue tomorrow.
If you only knew how happy I am to see you back; I even told Usha.
Completed the review of all open questions in sanskrit-lexicon organization. Will try to label / add milestone as per priority soon.
Completed the review of all open questions in sanskrit-lexicon organization
A 5-day hunt it was.
1. Complete transfer to a GitHub-based workflow.
4. Teach @drdhaval2785 how to access the Cologne servers after the change in file systems and project location, so that he can also process the corrections.
9. Transliteration choices in cookies, and not on frontend selection boxes.
11. Devote time to simple search.
13. Lanman Sanskrit Reader.
14. .htaccess short url

@gasyoun @funderburkjim may like to continue the list further.