sanskrit-lexicon / COLOGNE

Development of http://www.sanskrit-lexicon.uni-koeln.de/

Todo list in 2020 #289

Open drdhaval2785 opened 4 years ago

drdhaval2785 commented 4 years ago

1. Complete transfer to github based workflow.
2. Note down the current display queries and make the responses give out JSON instead of formatted HTML. This will allow the frontend developers to use the JSON and display it the way they want.
3. Correction submissions to be more user friendly. Auto fill the current digitization.
4. Teach @drdhaval2785 how to access the Cologne servers after the change in file systems and project location, so that he can also process the corrections.
5. Create a beautiful frontend UI.
6. Create a windows installer for offline download of the dictionary.
7. Create an ubuntu / debian / arch package for download on linux machines.
8. Create python package which gives out the meanings / gender / conjugations etc of the given word / word form.
9. Transliteration choices in cookies, and not on frontend selection boxes.
10. Allow search of all dictionaries in one display.
11. Devote time to simple search.
12. Mine dictionaries for grammar.
13. Lanman Sanskrit Reader.
14. .htaccess short url
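For the JSON item in the list above, a response could look roughly like the sketch below. The function name `entry_response` and the field names (`dict`, `key`, `entries`, `lnum`, `body`) are assumptions for illustration, not the current Cologne API, and the record in the demo call is made up.

```python
import json

def entry_response(dict_code, key, records):
    """Build a JSON string for one headword lookup.

    records: iterable of (lnum, body) pairs, where body is the raw
    digitization text of one entry (hypothetical shape).
    """
    payload = {
        "dict": dict_code,
        "key": key,
        "entries": [{"lnum": lnum, "body": body} for lnum, body in records],
    }
    # ensure_ascii=False keeps Devanagari or IAST text readable in the output.
    return json.dumps(payload, ensure_ascii=False)

# Made-up record, only to show the shape of the output.
print(entry_response("MW", "aSva", [(1, "placeholder entry text")]))
```

A frontend can then decide on its own how to render `entries`, which is the point of returning data instead of formatted HTML.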

@gasyoun @funderburkjim may like to continue the list further.

gasyoun commented 4 years ago
  1. Is not finished yet?
  2. It started in 2017, but we are still stuck. [1] & [2]
  3. It is implemented, but only for 1 dictionary. Needs scaling.
  4. Means finding the man who can do it.
  5. Is where @YevgenJohn can come handy.
  6. List of all dictionary responses on a single page mode.
funderburkjim commented 4 years ago
  10. Allow search of all dictionaries in one display. There might be several ways to interpret this idea.
    • search for headwords: Enter a particular spelling of a headword. Find all dictionaries that have this word as a headword. Include variant spelling conventions.
    • if (in slp1 spelling) we enter 'aSva' we get entries also for dictionaries where the headword is spelled 'aSvaH'.
    • enter 'SreyaMs' we also get dictionaries that spell it 'Sreyas'

hwnorm1 is a start

We have some work that we can build on: hwnorm1c.txt (in hwnorm1). For instance, aSva is solved:

aSva:aSva:BEN,BHS,BOP,BUR,CAE,CCS,GRA,IEG,INM,MD,MW,MW72,PE,PUI,PW,PWG,SCH,SHS,STC,VCP,VEI,WIL,YAT;aSvaH:AP,AP90,SKD

But SreyaMs is not solved:

SreyaMs:SreyaMs:BEN,CAE,CCS,PW,PWG
Sreyas:Sreyas:AP,AP90,BOP,BUR,GRA,INM,MD,MW,MW72,SCH,SHS,VCP,WIL,YAT
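A minimal sketch of how lines in this format could be read, assuming each line is `normalized-key:spelling:DICTS;spelling:DICTS` as in the samples above. The function names `parse_hwnorm_line` and `dictionaries_for` are hypothetical.

```python
def parse_hwnorm_line(line):
    """Split one hwnorm1c-style line into (normkey, {spelling: [dict codes]})."""
    normkey, rest = line.strip().split(":", 1)
    spellings = {}
    for group in rest.split(";"):
        spelling, dicts = group.split(":")
        spellings[spelling] = dicts.split(",")
    return normkey, spellings

def dictionaries_for(headword, lines):
    """Collect {spelling: [dict codes]} from every line that has the
    headword as its normalized key or as one of its spellings."""
    found = {}
    for line in lines:
        normkey, spellings = parse_hwnorm_line(line)
        if headword == normkey or headword in spellings:
            found.update(spellings)
    return found

sample = [
    "aSva:aSva:BEN,BHS,BOP,BUR,CAE,CCS,GRA,IEG,INM,MD,MW,MW72,PE,PUI,PW,PWG,"
    "SCH,SHS,STC,VCP,VEI,WIL,YAT;aSvaH:AP,AP90,SKD",
    "SreyaMs:SreyaMs:BEN,CAE,CCS,PW,PWG",
    "Sreyas:Sreyas:AP,AP90,BOP,BUR,GRA,INM,MD,MW,MW72,SCH,SHS,VCP,WIL,YAT",
]

# 'aSvaH' reaches both spellings; 'SreyaMs' reaches only its own line.
print(sorted(dictionaries_for("aSvaH", sample)))
print(sorted(dictionaries_for("SreyaMs", sample)))
```

With the three sample lines, a lookup of `SreyaMs` never reaches the `Sreyas` dictionaries, which is the unsolved case described above.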
funderburkjim commented 4 years ago
  10. (continued) Another approach might be to build a data structure holding all the entries in all the dictionaries, and then have separate steps that generate pointers (search terms) into these entries. I am intrigued by Elastic Search.

Another approach that might be useful is to have, perhaps on Cologne server, a directory with an html page for each headword for each dictionary, fully rendered. Or maybe an html page for each headword, with the page containing the entries from all dictionaries that have that headword [There would be about 300,000 such pages.]

Then, we could depend on Google to eventually index this. We could direct our attention to adding indexing suggestions, such as alternate spellings. There might be ways to add header markup that would allow a google search to specifically restrict its findings to this sub-website of Cologne.
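The first approach above (one structure holding all entries, plus separate steps that generate search terms pointing into it) can be sketched with plain Python dicts standing in for a search engine such as Elastic Search. The entry texts are placeholders, and the `strip_visarga` rule is a hypothetical normalization chosen only to show the idea.

```python
from collections import defaultdict

# All entries from all dictionaries in one structure:
# (dict code, headword) -> entry text. Texts are placeholders.
entries = {
    ("MW", "aSva"): "entry text from MW",
    ("AP90", "aSvaH"): "entry text from AP90",
}

def exact_term(headword):
    """Search term: the headword exactly as spelled."""
    return headword

def strip_visarga(headword):
    """Search term: headword without a final visarga (slp1 'H').
    Hypothetical rule, for illustration only."""
    return headword[:-1] if headword.endswith("H") else headword

def build_index(entries, term_generators):
    """Run each term generator over every entry and collect pointers."""
    index = defaultdict(set)
    for entry_id in entries:
        _, headword = entry_id
        for gen in term_generators:
            index[gen(headword)].add(entry_id)
    return index

index = build_index(entries, [exact_term, strip_visarga])
print(sorted(index["aSva"]))
```

New spelling conventions then become new term generators, without touching the stored entries.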

funderburkjim commented 4 years ago

Comment on 6-8:

This reminds me of what Dhaval has worked on with the Android version of the dictionaries. As I understand it, on android there is a certain format which is widely used for dictionaries of all languages. And then there are viewers, such as the Colordict program on Android. So, you get your local version of dictionaries in two steps:

Maybe something of this sort could be done. We are probably close to having the dictionaries (by virtue of the sqlite version of the dictionary, and sqlite files for some ancillary material such as abbreviations -- all this could be put into one sqlite file). The main missing piece in the sqlite area is related to the primitive 'database' used by advanced search.

Then the main problem reduces to having a multi-platform (pc, linux, mac) display program. The current displays take the sqlite data and generate html; and they do this with php; but this then requires setting up a server such as xampp or ubuntu. We need some analog of colordict. The php transform of sqlite data into html could be done by Python or by Javascript. It is also possible to create an executable (on Windows at least) that has its own Python interpreter -- in fact I did this once using web2py. On the Javascript side, there are 'Electron' apps. I suspect one of these solutions would provide user-friendly ways for non-technical people to get access to the dictionaries.
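As a rough illustration of doing the sqlite-to-html step in Python instead of php, here is a sketch that uses only the standard library. The table layout (`key`, `lnum`, `data`) and the function name `render_entries` are assumptions, and the sketch simply escapes the entry text rather than transforming the dictionary markup as the real displays do.

```python
import sqlite3
from html import escape

def render_entries(conn, table, key):
    """Return an HTML fragment for all rows of `table` whose key matches."""
    # Table names cannot be bound as SQL parameters,
    # so accept only simple identifiers.
    if not table.isidentifier():
        raise ValueError("unexpected table name")
    rows = conn.execute(
        f"SELECT lnum, data FROM {table} WHERE key = ? ORDER BY lnum", (key,)
    ).fetchall()
    items = "".join(
        f'<li id="lnum-{escape(str(lnum))}">{escape(data)}</li>'
        for lnum, data in rows
    )
    return f"<h1>{escape(key)}</h1><ul>{items}</ul>"

# Demo with an in-memory database and a made-up row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mw (key TEXT, lnum INTEGER, data TEXT)")
conn.execute("INSERT INTO mw VALUES (?, ?, ?)", ("aSva", 1, "placeholder entry text"))
print(render_entries(conn, "mw", "aSva"))
```

A function like this needs no web server, so it could sit behind a small desktop viewer or a packaged executable.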

funderburkjim commented 4 years ago
  11. Devote time to simple search.

This is, currently at least, a unique feature at Cologne (AFAIK). It takes into account both inherent spelling variations (such as those related to hwnorm1 mentioned above) and also spelling variations that one encounters in fact. Marcis has mentioned from time to time certain shortcomings of the current simple search; not only do we need to address these, but we also need to better understand exactly how to tweak the algorithms (they are tricky), and we also need to develop a test bed so we keep the good parts functioning as we add new parts.

Currently also, simple-search is not integrated at all with advanced search, which also has some unique features (such as substring searches).
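The test bed mentioned above could start as a plain table of query and expected-headword pairs that is rerun after every change to the algorithm. In this sketch `toy_search` is a hypothetical stand-in for the real simple search, and the cases are illustrative only.

```python
def run_cases(search, cases):
    """Return the cases where search(query) lacks the expected headword."""
    failures = []
    for query, expected in cases:
        results = search(query)
        if expected not in results:
            failures.append((query, expected, results))
    return failures

def toy_search(query):
    """Stand-in for simple search (hypothetical): return the query
    plus a variant with roman 'sh' rewritten as slp1 'S'."""
    variants = {query, query.replace("sh", "S")}
    return sorted(variants)

cases = [
    ("ashva", "aSva"),  # roman spelling should reach the slp1 headword
    ("aSva", "aSva"),   # slp1 spelling should still work
]

# An empty list means every case passed.
print(run_cases(toy_search, cases))
```

Each shortcoming that gets reported can be added as a new case, so that later tweaks do not silently break behavior that already works.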

funderburkjim commented 4 years ago
  12. Mine dictionaries for grammar

A prime case regards verb forms. There is a wealth of information regarding verb forms in MW verb entries. But this is unparsed. Wouldn't it be good if this information were parsed into some standard form, so it could be used as a reference for inflection-generating software, such as in csl-inflect?

Such analysis would also no doubt uncover numerous thus-far hidden errors in the digitizations.

funderburkjim commented 4 years ago
  13. Lanman Sanskrit Reader

We have an additional digitization that Thomas prepared in 2017; it is basically the full book of Lanman's Sanskrit Reader, including his Notes and Vocabulary.

Thus far, I've done nothing with this. It is in Thomas's original text form.

This would be an ideal project for anyone wanting to get in on the ground floor of making a digitization. The dictionary part (177 pages) could then be added to the current Cologne dictionaries.

funderburkjim commented 4 years ago
  14. .htaccess short url
gasyoun commented 4 years ago

Then, we could depend on Google to eventually index this.

Yes, absolutely. But we need the .htaccess solved first.

I would leave 12 for now, because @YevgenJohn is looking at it. What about 13? Where is it hosted? I adore the book and, years ago, was thinking about printing it once again.

drdhaval2785 commented 3 years ago

I have started looking at the discussions in the issues of all Sanskrit-Lexicon repositories and intend to complete the survey of important discussions lying scattered everywhere. At the end of the work, the goal is to have the works listed according to priority. I am broadly classifying the items into 'High', 'Moderate', and 'Low' priority and a 'deep freeze' category. I have been able to review the issues noted in github on csl-apidev, csl-homepage, csl-orig and csl-websanlexicon as of now. Will keep all of you posted.

funderburkjim commented 3 years ago

Glad you are doing the review and cleaning house. Is it spring-cleaning time in India?

gasyoun commented 3 years ago

I have started looking at the discussions in the issues of all Sanskrit-Lexicon repositories and intend to complete the survey of important discussions lying scattered everywhere.

May I ask you for a favor? If you have visited an issue, add it to a Milestone, like Cleanup 2021?

At the end of the work, the goal is to have the works listed according to priority.

Sure, but they need to be added to Projects or Milestones. We do not use them now, but for no good reason.

broadly classifying the items into 'High', 'Moderate', and 'Low' priority and a 'deep freeze' category.

Agree with the tags, but they should carry additional classification. A High bug is more important than a Low UI issue. So we want to have different ways of classification, because there are hundreds of issues that are partly documented and left to starve.

I have been able to review the issues noted in github on csl-apidev, csl-homepage, csl-orig and csl-websanlexicon as of now

That sounds like a lot. I have updated the comment at the beginning of the thread. Let's have a call one week from now, on Sunday, so we can speak about 2021 plans.

drdhaval2785 commented 3 years ago

I agree on using milestone for priorities. I do not intend to put 'Cleanup 2021' tag, because I anyhow am planning to do a survey of all existing issues. Therefore, it would be superfluous. I have in the meanwhile visited all issues in csl-pywork, csl-doc and csl-corrections.

gasyoun commented 3 years ago

I do not intend to put 'Cleanup 2021' tag, because I anyhow am planning to do a survey of all existing issues.

Ok, any tag would do, e.g. Dhaval-checked. Let's split life into before and after, because we have badly needed a revision of everything for a long time.

drdhaval2785 commented 3 years ago

I have reviewed issues in MWS, hwnorm1, hwnorm2, VCP, csl-inflect, rvlinks and sanskrit-lexicon.github.io.

gasyoun commented 3 years ago

I have reviewed issues

Adding labels is crucial! We might want to review them later, based on labels.

drdhaval2785 commented 3 years ago

Reviewed AP90, SKD, PWG, GRA, PWK repositories.

@gasyoun

Adding labels is crucial! We might want to review them later, based on labels.

Kindly give me some time. You will get the labels and milestones. Don't worry. Let me work offline right now.

gasyoun commented 3 years ago

You will get the labels and milestones.

So be it, fine.

drdhaval2785 commented 3 years ago

Completed review of all issues in all repositories except the following three: COLOGNE, CORRECTIONS, alternateheadwords. These repositories will require an extensive look. Will continue tomorrow.

gasyoun commented 3 years ago

Will continue tomorrow.

If only you knew how happy I am to see you back. I even told Usha.

drdhaval2785 commented 3 years ago

Completed the review of all open questions in sanskrit-lexicon organization. Will try to label / add milestone as per priority soon.

gasyoun commented 3 years ago

Completed the review of all open questions in sanskrit-lexicon organization

5 day hunt it was.