sanskrit-lexicon / COLOGNE

Development of http://www.sanskrit-lexicon.uni-koeln.de/

JSON for integrating with new frontend #99

Closed drdhaval2785 closed 3 years ago

drdhaval2785 commented 7 years ago

https://github.com/sanskrit-lexicon/MWS/issues/37 has @juhnowski's proposal to separate the backend and the frontend, with a RESTful API exchanging JSON between them.

This issue is for suggesting and discussing formats for the basic, list and advanced views of the dictionaries.

drdhaval2785 commented 7 years ago

Basic view

[ { "headword" : "Davala", "hwtype" : "H1/H1B", "lnum" : 100564, "pglink" : "linkToPdf", "pgpart" : "1/2/3", "entry" : "dictionaryEntryInHtml" } ]

@funderburkjim, if I missed something, please add it.
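Below is a minimal sketch of how a server-side script might assemble one basic-view record and serialize the response array. `makeBasicView` and `BASIC_VIEW_FIELDS` are hypothetical names. The field list comes from the proposal above, and the values are the example's placeholders. Serializing with JSON.stringify guarantees the double-quoted keys and strings that strict JSON requires.

```javascript
// Field names proposed in this issue for one basic-view record.
const BASIC_VIEW_FIELDS = ["headword", "hwtype", "lnum", "pglink", "pgpart", "entry"];

// Hypothetical helper: copies exactly the proposed fields and throws if one
// is missing, so a frontend never receives a partial record.
function makeBasicView(source) {
  const record = {};
  for (const field of BASIC_VIEW_FIELDS) {
    if (source[field] === undefined) {
      throw new Error(`basic view is missing field "${field}"`);
    }
    record[field] = source[field];
  }
  return record;
}

// The response is an array of records; JSON.stringify emits the
// double-quoted keys and strings that strict JSON requires.
const body = JSON.stringify([
  makeBasicView({
    headword: "Davala",
    hwtype: "H1",
    lnum: 100564,
    pglink: "linkToPdf",
    pgpart: "1",
    entry: "dictionaryEntryInHtml",
    extra: "ignored", // not a proposed field, so it is not copied
  }),
]);
console.log(body);
```

Dropping unknown fields is a design choice in this sketch. Passing them through unchanged would work equally well.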

drdhaval2785 commented 7 years ago

Advanced view

basicView here means a block of data as defined above

[ { "hw1" : basicViewForHw1, "hw2" : basicViewForHw2, "hw3" : basicViewForHw3, ..., "hwn" : basicViewForHwn } ]

drdhaval2785 commented 7 years ago

List view

Same as the advanced view, but showing the 12 entries above and the 12 entries below the target headword.
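This sketch shows the list-view selection. It assumes the dictionary's headwords are available as an array in dictionary order and that the target's index is already known. `listWindow` is a hypothetical helper. Near the start or end of the dictionary, the window is simply clipped, so fewer than 25 entries come back.

```javascript
// Hypothetical helper: the target headword plus up to `radius` neighbours
// on each side (12 in the proposal), clipped at the ends of the list.
function listWindow(headwords, targetIndex, radius = 12) {
  if (targetIndex < 0 || targetIndex >= headwords.length) {
    throw new RangeError("target index out of range");
  }
  const start = Math.max(0, targetIndex - radius);
  const end = Math.min(headwords.length, targetIndex + radius + 1);
  return headwords.slice(start, end);
}

// Toy data: 100 placeholder headwords hw0 ... hw99.
const all = Array.from({ length: 100 }, (_, i) => `hw${i}`);
console.log(listWindow(all, 50).length); // 25: 12 above, the target, 12 below
console.log(listWindow(all, 3).length);  // 16: only 3 entries exist above hw3
```

Each headword in the window would then be expanded into a basic-view block, as for the advanced view.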

juhnowski commented 7 years ago

Good evening!

I like the suggested format and have a question about the field mapping. Please correct me if I make a mistake.

The data source is XML. I have downloaded mw.xml from http://www.sanskrit-lexicon.uni-koeln.de/scans/MWScan/2014/web/webtc/download.html

I took one record from the XML:

110 akAra 3 a--kAra m. the_letter_or_sound a. 000002 1,1 2

On the backend I need to represent this data in the following format: [ { "headword" : "Davala", "hwtype" : "H1/H1B", "lnum" : 100564, "pglink" : "linkToPdf", "pgpart" : "1/2/3", "entry" : "dictionaryEntryInHtml" } ]

Question:

@drdhaval2785 could you please clarify how the data from the XML maps onto your suggested format? Is the XML structure of one dictionary identical to that of another? I haven't seen the backend yet.

gasyoun commented 7 years ago

@funderburkjim any thoughts?

funderburkjim commented 7 years ago

Please take a look at the example mentioned in this comment. This already-existing API may suffice for many displays, or at least come close to sufficing.

You've ignored the parameters to SEND to the API. These include the dictionary code, the input format (how the headword is spelled), the output format (how Devanagari is rendered in the returned HTML), and accent (whether to include accents in the returned HTML). Should the parameters sent to the server also be part of the parameters returned from the server?

In the 'list02php' example, the returned HTML already bundles lnum and pglink.
As for 'hwtype': when you request data for a headword like 'deva' from 'MW', the current API bundles all the records with key1=deva into the output.

I don't know what 'pgpart' is supposed to represent.

What is sent to the API should probably also include a 'type', meaning what kind of information is required back.
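To make the request side concrete, this sketch collects the parameters listed above into a query string. The parameter names (`dict`, `key`, `input`, `output`, `accent`, `type`) and their allowed values are assumptions for illustration. They are not necessarily the names the existing Cologne scripts use.

```javascript
// Hypothetical parameter names and allowed values, for illustration only;
// the existing Cologne scripts may use different names.
const ALLOWED = {
  input: ["slp1", "hk", "itrans", "deva", "roman"],
  output: ["slp1", "hk", "itrans", "deva", "roman"],
  accent: ["yes", "no"],
  type: ["basic", "list", "advanced"],
};

// Validate the request options and encode them as a query string.
function buildQuery({ dict, key, input = "slp1", output = "deva", accent = "no", type = "basic" }) {
  if (!dict || !key) {
    throw new Error("dict and key are required");
  }
  const options = { input, output, accent, type };
  for (const [name, value] of Object.entries(options)) {
    if (!ALLOWED[name].includes(value)) {
      throw new RangeError(`unsupported ${name}: ${value}`);
    }
  }
  // URLSearchParams percent-encodes values such as Devanagari headwords.
  return new URLSearchParams({ dict, key, ...options }).toString();
}

console.log(buildQuery({ dict: "mw", key: "deva", input: "hk" }));
// dict=mw&key=deva&input=hk&output=deva&accent=no&type=basic
```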

I think we should focus on the basic display first; the advanced search display (webtc2) has many more issues to deal with.

gasyoun commented 7 years ago

I think we should focus on the basic display first

Right.

funderburkjim commented 7 years ago

I've read 'Learning React' by Kirupa (book website) and some of the React documentation. I haven't got a handle on Redux yet.

There seems to be a lot to like about ReactJS for writing user interfaces. The thing I like best so far is the use of JSX to write self-contained bits of HTML, which can be combined into an application. There is definitely a lot of 'spaghetti' code in the current implementations at Cologne, and Reactifying things might be a way to improve that situation.

However, one thing I'm finding to be a stumbling block is the development of slightly complex UI elements, in particular the citation input field with its 'suggestion' feature (see list-02.html). This implementation uses jQuery UI's autocomplete.

It seems that it would be quite complex to develop a similar React component. I see that there is 'https://github.com/reactjs/react-autocomplete', so would that be the way to go?

In general, the question is: are there good off-the-shelf React alternatives to most of the jQuery UI solutions?

Also, I haven't yet seen examples of using Ajax calls with ReactJS. What is a good source for this? Is this where Redux comes in?

@juhnowski - What are your thoughts on this?

gasyoun commented 7 years ago

It seems that it would be quite complex to develop a similar React component. I see that there is 'https://github.com/reactjs/react-autocomplete', so would that be the way to go?

So we can't keep that as it is?

funderburkjim commented 7 years ago

From my reading, it is not a good idea to mix jQuery with React (google 'React jQuery').

The reason is that jQuery manipulates the DOM directly, while React maintains its own 'virtual DOM' from which it renders the real DOM. That's my current understanding.

gasyoun commented 7 years ago

From my reading, it is not a good idea to mix jQuery with React (google 'React jQuery').

Oh, OK. I would not worry much about it before @juhnowski actually starts coding with it. He seems to be quite busy again, so let him tell us his plans, and let's not worry ahead.

funderburkjim commented 7 years ago

let's not worry ahead.

Sounds like a good suggestion.

drdhaval2785 commented 7 years ago

Related to #117. That issue specifies the input format; this one specifies the output format. Once both are settled, we will write documentation for API usage.

juhnowski commented 7 years ago

Good evening. Lately I have not been coding because I was studying the language. I have worked through a third of my textbook, and I feel ready to start more advanced coding...

gasyoun commented 7 years ago

@juhnowski it's time to finish the easy spell module.

juhnowski commented 7 years ago

Yes.

funderburkjim commented 7 years ago

@juhnowski If that was a programming language textbook, which book?

gasyoun commented 7 years ago

programming language textbook

No, it's Sanskrit textbook, https://yadi.sk/i/ceve0IVgza7yT

juhnowski commented 7 years ago

@funderburkjim Hello, it was Sanskrit.

juhnowski commented 7 years ago

Good night. I have finished the initial "search". So far, only the functional part (simplest UI); the appearance is very modest. I am ready to move two of my repositories to sanskrit-lexicon: 1) https://github.com/juhnowski/sanskrit-simple-search is a JS search application with a test server, and 2) https://github.com/juhnowski/FreqWordList generates a JS file of words by frequency. Could you please tell me how best to do this: how to name them, where to put them, and whether it is necessary to unite these two repositories into one? Also, I would like to receive a sandbox on the http://www.sanskrit-lexicon.uni-koeln.de host server where I could run the test application and, if necessary, upload files by FTP after fixing the bugs. I don't know how to make an application hosted on another domain work correctly without the help of a system administrator in Cologne, which is why I needed a local node.js server and a Chrome plugin. It would be very useful for me to establish contact with the system administrator in Cologne, if that is possible. I am ready to sign an NDA or whatever else is required for access. I'm ready to answer questions, test, and address any remarks.

gasyoun commented 7 years ago

So far, only the functional part (simplest UI).

That's great and more than enough. UIs can come and go, functionality remains.

whether it is necessary to unite these two repositories into one?

I would go for it.

@funderburkjim what about the rest?

I would like to receive a sandbox on the http://www.sanskrit-lexicon.uni-koeln.de host server where I could run the test application and, if necessary, upload files by FTP after fixing the bugs.

I guess that's not possible. If Jim has everything he needs on GitHub, he (and only he) can upload it to Cologne via ssh.

juhnowski commented 7 years ago

@gasyoun I guess that's not possible. If Jim has everything he needs on GitHub, he (and only he) can upload it to Cologne via ssh.

The following files should be placed in a test folder on the server (the other files are for testing):

gasyoun commented 7 years ago

@juhnowski sure, let's see what @funderburkjim says. For me, it's the number one most wanted feature, one I have missed every day for many years. See https://github.com/sanskrit/sanscript.js: does it make any sense for how to code looseness, e.g. when va is treated as equal to ba?

juhnowski commented 7 years ago

Good night. I have written the offline version, with two aims: first, to test the algorithm of the online version; second, so that it can be used as a separate program to reduce the load on the web server. The script uses Chrome's WebSQL database (SQLite). I took the XML dictionary, converted it with Xml2JS, and opened the generated HTML file once in the browser. After that, the application queries the WebSQL database instead of making Ajax requests.

Repositories: https://github.com/juhnowski/Xml2JS (converter) and https://github.com/juhnowski/offline_dic (application).

@funderburkjim I apologize for not having answered the questions since April 11th.

In general, the question is: are there good off-the-shelf React alternatives to most of the jQuery UI solutions? Yes, I think there is everything we need. I have examples of successful implementations of complex enterprise applications in React.

Also, I haven't yet seen examples of using Ajax calls with ReactJS: Ajax calls via XMLHttpRequest are the older way of requesting data; the newer, more advanced way is fetch.
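For comparison with the jQuery $.ajax calls in the current code, this is a minimal sketch of a lookup written with fetch and async/await. The endpoint URL and the JSON response shape are assumptions. `fetchImpl` is injected only so the function can be exercised without a network; in a browser, the default (the global fetch) is used.

```javascript
// Minimal sketch: request one headword and return the parsed JSON body.
// `baseUrl` and the response shape are assumptions, not the Cologne API.
async function getEntries(baseUrl, query, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(`${baseUrl}?${new URLSearchParams(query)}`);
  // fetch rejects only on network failure, not on HTTP errors such as 404,
  // so the status has to be checked explicitly.
  if (!response.ok) {
    throw new Error(`HTTP ${response.status} for ${response.url || baseUrl}`);
  }
  return response.json();
}

// Usage with a stub in place of the network:
const stubFetch = async (url) => ({
  ok: true,
  status: 200,
  url,
  json: async () => [{ headword: "deva", requested: url }],
});
getEntries("https://example.org/getword.php", { key: "deva" }, stubFetch)
  .then((entries) => console.log(entries));
```

In a React component, a call like this would typically be started from an event handler or effect, with the result stored in state.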

In my view, Redux is more a pattern, a best practice, than a framework. The essence is to make a new copy of the data every time (immutability) and to control the whole state of the application in one data structure.
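The two ideas described here are a new copy of the state on every change and the whole application state in one structure. This plain-JavaScript reducer in the Redux style illustrates both. No Redux library is used, and the action names are hypothetical.

```javascript
// All UI state lives in one object.
const initialState = { query: "", results: [], loading: false };

// A reducer never mutates `state`; it returns a new object for each change
// and returns the same object when the action is not recognised.
function reducer(state = initialState, action) {
  switch (action.type) {
    case "QUERY_CHANGED":
      return { ...state, query: action.query };
    case "SEARCH_STARTED":
      return { ...state, loading: true, results: [] };
    case "SEARCH_FINISHED":
      return { ...state, loading: false, results: action.results };
    default:
      return state;
  }
}

// Replaying actions yields a sequence of states; earlier ones stay intact.
const s1 = reducer(undefined, { type: "QUERY_CHANGED", query: "deva" });
const s2 = reducer(s1, { type: "SEARCH_STARTED" });
const s3 = reducer(s2, { type: "SEARCH_FINISHED", results: ["deva"] });
console.log(s1.loading, s2.loading, s3.results);
```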

React has an extension, React Native, for mobile, and it may be faster than jQuery. In any case, it is necessary to try. I treat frameworks with doubt and without fanaticism. I think that React is difficult, but more promising at the moment. But I doubt it.

juhnowski commented 7 years ago

@gasyoun https://github.com/sanskrit/sanscript.js interesting

gasyoun commented 7 years ago

I haven't yet seen examples of using Ajax calls with ReactJS: Ajax calls via XMLHttpRequest are the older way of requesting data; the newer, more advanced way is fetch.

Thanks for the hint.

it is necessary to try. I treat frameworks with doubt and without fanaticism. I think that React is difficult, but more promising at the moment. But I doubt it.

Fully agree; I have the same doubts. We need to change, but what to use is an open question.

funderburkjim commented 7 years ago

sandbox on Cologne server

This is not going to be possible.

If all you need is a node.js server, why not use Gomix (now called Glitch)? You can get a free account with your GitHub user.

I also noticed today that they make provision for persistent data (in a .data directory), e.g. in form of sqlite database, which you mentioned. Since project containers seem to be built on node.js, glitch sounds like it might fit your needs.

Check this out. If it doesn't work for you, let me know.

funderburkjim commented 7 years ago

While I can upload the repository (sanskrit-simple-search) to Cologne, it won't be executable, because node.js is not present there.

Possibly it could be modified to use Babel and other javascript libraries available by CDN. But if it definitely requires node.js on the server, then it won't work on Cologne server.

juhnowski commented 7 years ago

@funderburkjim Good night! Could you please upload only the following files: fetching.html, settings.html, style.css, word_frequency.js

node.js is only for local tests.

funderburkjim commented 7 years ago

@juhnowski Hi.

What if I put those four files (fetching ... word_frequency) on github pages ?

juhnowski commented 7 years ago

I'm not sure, but I think it will not work. The code makes requests from one domain to another (see https://en.wikipedia.org/wiki/Cross-origin_resource_sharing). For the application to work properly on GitHub, CORS needs to be configured. I don't know how; if it is possible, then it will work.

drdhaval2785 commented 7 years ago

I guess CORS is needed for CGI scripts (Python, PHP, etc.) to work properly when a request is sent from server to server. I did this once. But for HTML and JS, I guess it should work fine.

funderburkjim commented 7 years ago

The code is using ajax calls to Cologne server (such as webtc/getword.php for Wilson). Thus, the CORS situation does prohibit proper functioning unless the code resides on Cologne server.
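The browser enforces CORS, so the fix belongs on the server that answers the request. Its response must carry an Access-Control-Allow-Origin header naming the calling page's origin; a PHP script such as getword.php would send it with PHP's header() function. This sketch expresses that decision as a plain JavaScript function. The allowlisted origins are assumptions chosen for illustration.

```javascript
// Origins allowed to call the dictionary scripts from a browser.
// These entries are illustrative assumptions.
const ALLOWED_ORIGINS = new Set([
  "https://sanskrit-lexicon.github.io",
  "https://funderburkjim.github.io",
]);

// Returns the extra response headers for a request carrying `origin`.
// Without Access-Control-Allow-Origin, the browser blocks the page's
// script from reading the response, even though the server sent it.
function corsHeaders(origin) {
  if (!ALLOWED_ORIGINS.has(origin)) {
    return {};
  }
  return {
    "Access-Control-Allow-Origin": origin,
    Vary: "Origin", // caches must not reuse this answer for other origins
  };
}

console.log(corsHeaders("https://sanskrit-lexicon.github.io"));
console.log(corsHeaders("https://example.com"));
```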

For purposes of testing, two small changes will allow the code to work FOR WILSON ONLY. I made a small simplified version of getword for wilson, named getword_phreebie.php. This version

First change: in fetching.html, change the test.url value to

 // Approx. line 415 of fetching.html:
 test.url = "http://www.sanskrit-lexicon.uni-koeln.de/scans/WILScan/2014/web/webtc/getword_phreebie.php";

Second change: since the output from this test version differs from what @juhnowski was expecting, in the 'done' method of the $.ajax call just do:

 document.write(html);
 return;

With these two changes, the program works serviceably for the purpose of algorithm testing.

Here it is on GitHub Pages: http://funderburkjim.github.io/sanskrit-simple-search0.0/fetching.html

Note: The large word_frequency.js file is not in the above GitHub. Program seems to work fine anyway. What is word_frequency.js file used for?

funderburkjim commented 7 years ago

Note: I noticed that the algorithm does not handle 'vishnu' properly. The correct HK spelling is 'viSNu', but this variant is not among those generated.
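One way to produce 'viSNu' from 'vishnu' is to split the input with a rule table matched longest-first, so that 'sh' is read as a single unit rather than 's' followed by 'h'. The generator then takes the Cartesian product of each unit's alternatives. This is a sketch under that assumption. The small HK-oriented `RULES` table is hypothetical; it is not the table used in sanskrit-simple-search.

```javascript
// Hypothetical rule table: loose ASCII spelling -> HK alternatives.
// Multi-letter keys such as "sh" must be matched before single letters.
const RULES = new Map([
  ["sh", ["z", "S"]],
  ["a", ["a", "A"]],
  ["i", ["i", "I"]],
  ["u", ["u", "U"]],
  ["r", ["r", "R", "RR"]],
  ["s", ["s", "z", "S"]],
  ["n", ["n", "M", "N", "J", "G"]],
]);
const MAX_KEY = Math.max(...[...RULES.keys()].map((k) => k.length));

// Split the input into units, trying the longest rule key first.
function tokenize(word) {
  const units = [];
  let i = 0;
  while (i < word.length) {
    let len = Math.min(MAX_KEY, word.length - i);
    while (len > 1 && !RULES.has(word.slice(i, i + len))) len--;
    const unit = word.slice(i, i + len);
    units.push(RULES.get(unit) || [unit]); // letters without a rule pass through
    i += len;
  }
  return units;
}

// Cartesian product of the alternatives; the last unit varies fastest.
function variants(word) {
  let results = [""];
  for (const options of tokenize(word)) {
    results = results.flatMap((prefix) => options.map((opt) => prefix + opt));
  }
  return results;
}

console.log(variants("vishnu").includes("viSNu")); // true
console.log(variants("asrama").slice(0, 3)); // [ 'asrama', 'asramA', 'asrAma' ]
```

With this table, the 'asrama' variants come out in the same order as in the asrama listing posted in this thread. The number of variants grows multiplicatively with word length, which makes batching or ranking them worthwhile.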

funderburkjim commented 7 years ago

@juhnowski I suggest you make a GitHub Pages version, where you can tweak the spelling algorithm.

I think the getword_phreebie.php version for Wilson should be fine for such tweaking; it will allow you to develop modifications like the one needed for vishnu.

If it is needed later, a version working for any dictionary can be used instead of getword_phreebie.php. [the apidev/getword.php probably will be close].

Later, one of the Cologne PHP programs can be adapted so that you can send an array of JSON objects and get the same in return; this would be more efficient than hitting the Cologne server with a separate request for each spelling variation.
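This sketch shows the batched variant. The client posts all spelling variants in one JSON array and receives one result object per key. The endpoint, the request body shape, and the `{ key, found, html }` response shape are assumptions; no such Cologne script existed at this point in the discussion. `fetchImpl` is injected so the sketch runs against a stub.

```javascript
// Hypothetical batch contract (not an existing Cologne script):
//   request body:  [{ "dict": "wil", "key": "azrama" }, ...]
//   response body: [{ "key": "azrama", "found": true, "html": "..." }, ...]
async function lookupBatch(url, dict, keys, fetchImpl = globalThis.fetch) {
  const unique = [...new Set(keys)]; // never ask twice for the same spelling
  const response = await fetchImpl(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(unique.map((key) => ({ dict, key }))),
  });
  if (!response.ok) {
    throw new Error(`HTTP ${response.status}`);
  }
  const results = await response.json();
  // Keep only the hits, so NOT FOUND variants never reach the page.
  return results.filter((result) => result.found);
}

// Stub standing in for the server: it "knows" only the key azrama.
async function stubServer(url, init) {
  const requested = JSON.parse(init.body);
  return {
    ok: true,
    status: 200,
    json: async () =>
      requested.map(({ key }) => ({
        key,
        found: key === "azrama",
        html: key === "azrama" ? "<b>अश्रम</b>" : "",
      })),
  };
}

lookupBatch("https://example.org/batch.php", "wil", ["asrama", "azrama", "azrama"], stubServer)
  .then((hits) => console.log(hits.map((hit) => hit.key))); // [ 'azrama' ]
```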

drdhaval2785 commented 7 years ago

@funderburkjim, where can I see the code in action? Testing interface location?

gasyoun commented 7 years ago

Note: The large word_frequency.js file is not in the above GitHub. Program seems to work fine anyway. What is word_frequency.js file used for?

For ordering possible results by Oliver's frequency data on MW.

funderburkjim commented 7 years ago

where can I see the code in action?

The 'fetching.html' link above, for now.

juhnowski commented 7 years ago

Hi! @funderburkjim http://funderburkjim.github.io/sanskrit-simple-search0.0/fetching.html has an error: https://plus.google.com/photos/photo/110679495425413866455/6422748520248389890?icm=false&authkey=CKjnoKDCn_LmSA

I suggest you make a GitHub Pages version, where you can tweak the spelling algorithm. I'd rather deploy the application on my own domain at http://www.phreebie.net/, which is hosted within walking distance of me.

Note: I noticed that the algorithm does not handle 'vishnu' properly. The correct HK spelling is 'viSNu', but this variant is not among those generated. OK, I will find the bug and fix it, thank you.

funderburkjim commented 7 years ago

fetching.html still has CORS errors.

Apparently, the Cologne server has some sort of rate-limiting, probably as a security feature.

I've now modified both fetching.html at my repository and getword_phreebie.php on Cologne server.

Results:

Test url
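If the server does rate-limit, one client-side mitigation is to cap how many lookups are in flight at once. That avoids firing one request per spelling variant simultaneously. This is a minimal sketch of such a limiter. `runLimited` is a hypothetical helper, the limit of 2 in the demo is arbitrary, and the Cologne server's actual limits are not known from this thread.

```javascript
// Run async task functions with at most `limit` pending at once and
// return their results in the original order.
async function runLimited(tasks, limit) {
  const results = new Array(tasks.length);
  let next = 0;
  // Each worker keeps claiming the next unstarted task until none remain.
  async function worker() {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }
  const workers = Array.from({ length: Math.min(limit, tasks.length) }, () => worker());
  await Promise.all(workers);
  return results;
}

// Demo: each task stands in for one lookup request.
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
let inFlight = 0;
let peak = 0;
const lookups = ["asrama", "azrama", "aSrama", "Asrama", "Azrama"].map((key) => async () => {
  inFlight++;
  peak = Math.max(peak, inFlight);
  await delay(10);
  inFlight--;
  return key;
});
runLimited(lookups, 2).then((keys) => console.log(keys, "peak in flight:", peak));
```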

funderburkjim commented 7 years ago

I'd rather deploy the application on my own domain

@juhnowski By all means. Just let us know the url, so we can help with tweaking the word-list generation.

juhnowski commented 7 years ago

@funderburkjim Yes! The test url works! Thank you. But I have some questions: 1) word_frequency.js: I don't understand why it gives a 404 error... 2) Did you intentionally output all the word's variations, even those that are not found? Or is something going wrong? 3) About 'vishnu': it only works for SLP1. For other transliterations I need rules, tables of character variants... But actually it is a mistake to apply the SLP1 character variants to every transliteration. What is the correct algorithm for variations: convert to SLP1 and then generate the variations, or apply custom character variations for each transliteration?

juhnowski commented 7 years ago

@funderburkjim By all means. Just let us know the url, so we can help with tweaking the word-list generation. I understand: once the word_frequency.js issue is resolved, the need for a separate host will disappear.

gasyoun commented 7 years ago

I need rules, tables of character variants...

I guess this is where Jim's Python scripts come in.

asrama NOT FOUND

asramA NOT FOUND

asrAma NOT FOUND

asrAmA NOT FOUND

asRama NOT FOUND

asRamA NOT FOUND

asRAma NOT FOUND

asRAmA NOT FOUND

asRRama NOT FOUND

asRRamA NOT FOUND

asRRAma NOT FOUND

asRRAmA NOT FOUND

azrama FOUND

 अश्रम

[L=4295] [p= 089]   .अश्रम¦ m. (-मः)
1 Freshness, freedom from fatigue.
2 Laziness, want of exertion.
E. अ neg. श्रम fatigue.
azramA NOT FOUND

azrAma NOT FOUND

azrAmA NOT FOUND

azRama NOT FOUND

azRamA NOT FOUND

azRAma NOT FOUND

azRAmA NOT FOUND

azRRama NOT FOUND

azRRamA NOT FOUND

azRRAma NOT FOUND

azRRAmA NOT FOUND

aSrama NOT FOUND

aSramA NOT FOUND

aSrAma NOT FOUND

aSrAmA NOT FOUND

aSRama NOT FOUND

aSRamA NOT FOUND

aSRAma NOT FOUND

aSRAmA NOT FOUND

aSRRama NOT FOUND

aSRRamA NOT FOUND

aSRRAma NOT FOUND

aSRRAmA NOT FOUND

Asrama NOT FOUND

AsramA NOT FOUND

AsrAma NOT FOUND

AsrAmA NOT FOUND

AsRama NOT FOUND

AsRamA NOT FOUND

AsRAma NOT FOUND

AsRAmA NOT FOUND

AsRRama NOT FOUND

AsRRamA NOT FOUND

AsRRAma NOT FOUND

AsRRAmA NOT FOUND

Azrama FOUND

 आश्रम

[L=5987] [p= 124]   .आश्रम¦ m. (-मः)
1 A religious order, of which there are four kinds referable to the different periods of life, 1st, that of the student or Brahmachárí; 2d, that of the householder or Grihast'ha; 3d, that of the anchorite or Vánapárast'ha; and 4th, that of the beggar or Bhikshu: see ब्रह्मचारी, &c.
2 A college, a school.
3 A hermitage, the abode of retired saints or sages.
4 A wood or thicket.
E. आङ् before श्रम to perform religious austerities, affix घञ्।
AzramA NOT FOUND

AzrAma NOT FOUND

AzrAmA NOT FOUND

AzRama NOT FOUND

AzRamA NOT FOUND

AzRAma NOT FOUND

AzRAmA NOT FOUND

AzRRama NOT FOUND

AzRRamA NOT FOUND

AzRRAma NOT FOUND

AzRRAmA NOT FOUND

ASrama NOT FOUND

ASramA NOT FOUND

ASrAma NOT FOUND

ASrAmA NOT FOUND

ASRama NOT FOUND

ASRamA NOT FOUND

ASRAma NOT FOUND

ASRAmA NOT FOUND

ASRRama NOT FOUND

ASRRamA NOT FOUND

ASRRAma NOT FOUND

ASRRAmA NOT FOUND

funderburkjim commented 7 years ago

word_frequency.js

In my fetching version I had not included word_frequency. However, it is now included.

What is returned by getword_phreebie.php

It returns HTML for EVERY key suggestion sent.

What server should the fetching.html test program reside on?

This should be one that @juhnowski can work on. The one in my funderburkjim.github.io repository is not appropriate; I did this just to get our testing past the CORS problem. You can make a 'juhnowski.github.io' repository and put the code in a 'sanskrit-simple-search' folder there. Then

funderburkjim commented 7 years ago

HK or SLP1 as target of word-generation

The word generation logic has to make some assumption about how Sanskrit words are spelled. It probably doesn't matter much which assumption (HK, SLP1, etc.) is made. But once made, all the algorithmic details need to be consistent with the assumption.

Based on my limited understanding of the work @juhnowski and @gasyoun have done on this algorithm, you have been assuming the Sanskrit words are spelled with the HK transliteration. So stick with HK for char tables, etc.

One suggestion on code organization:

Make a separate spellgen.js (or whatever name you think best) file and have this file contain all the code for generating the 'res' array of spelling possibilities. That way, it will be easier for others to follow the algorithm logic without the distraction of the UI logic.

funderburkjim commented 7 years ago

re where should test repository be

Another thought: put 'sanskrit-simple-search' as a folder within the 'sanskrit-lexicon.github.io' repository. Maybe this would be the most convenient, since all interested parties would have access. Then the url for the test code would be

http://sanskrit-lexicon.github.io/sanskrit-simple-search/fetching.html

gasyoun commented 7 years ago

In my fetching version I had not included word_frequency. However, it is now included.

Are you sure? Then why is प्रण above प्राण in


prana
prana NOT FOUND

pranA NOT FOUND

praMa NOT FOUND

praMA NOT FOUND

praNa FOUND

 प्रण

[L=25906] [p= 561]  .प्रण¦ mfn. (-णः-णा-णं) Old, ancient.
E. प्र substituted for पुराण, and न aff.
praNA NOT FOUND

praJa NOT FOUND

praJA NOT FOUND

praGa NOT FOUND

praGA NOT FOUND

prAna NOT FOUND

prAnA NOT FOUND

prAMa NOT FOUND

prAMA NOT FOUND

prANa FOUND

 प्राण

[L=27001] [p= 586]  .प्राण¦ mfn. (-णः-णा-णं) Full, replete, filled. m. (-णः)
1 Air inhaled, inspiration, breath.
2 Air, wind.
3 Life, vitality.
4 A vital organ or part.
5 Strength, power.
6 Myrrh.
7 Poetical talent or inspiration.
8 A name of BRAHMÁ.
9 A title of BRAHMÁ, the supreme spirit.
10 An aspiration in the articulation of letters. m. plu. (-णाः) The five vital airs or modes of inspiration and expiration collectively.
E. प्र before अन to breathe, aff. अच्।
prANA NOT FOUND

prAJa NOT FOUND

prAJA NOT FOUND

prAGa NOT FOUND

prAGA NOT FOUND

pRana NOT FOUND

pRanA NOT FOUND

pRaMa NOT FOUND

pRaMA NOT FOUND

pRaNa NOT FOUND

pRaNA NOT FOUND

pRaJa NOT FOUND

pRaJA NOT FOUND

pRaGa NOT FOUND

pRaGA NOT FOUND

pRAna NOT FOUND

pRAnA NOT FOUND

pRAMa NOT FOUND

pRAMA NOT FOUND

pRANa NOT FOUND

pRANA NOT FOUND

pRAJa NOT FOUND

pRAJA NOT FOUND

pRAGa NOT FOUND

pRAGA NOT FOUND

pRRana NOT FOUND

pRRanA NOT FOUND

pRRaMa NOT FOUND

pRRaMA NOT FOUND

pRRaNa NOT FOUND

pRRaNA NOT FOUND

pRRaJa NOT FOUND

pRRaJA NOT FOUND

pRRaGa NOT FOUND

pRRaGA NOT FOUND

pRRAna NOT FOUND

pRRAnA NOT FOUND

pRRAMa NOT FOUND

pRRAMA NOT FOUND

pRRANa NOT FOUND

pRRANA NOT FOUND

pRRAJa NOT FOUND

pRRAJA NOT FOUND

pRRAGa NOT FOUND

pRRAGA NOT FOUND

phrana NOT FOUND

phranA NOT FOUND

phraMa NOT FOUND

phraMA NOT FOUND

phraNa NOT FOUND

phraNA NOT FOUND

phraJa NOT FOUND

phraJA NOT FOUND

phraGa NOT FOUND

phraGA NOT FOUND

phrAna NOT FOUND

phrAnA NOT FOUND

phrAMa NOT FOUND

phrAMA NOT FOUND

phrANa NOT FOUND

phrANA NOT FOUND

phrAJa NOT FOUND

phrAJA NOT FOUND

phrAGa NOT FOUND

phrAGA NOT FOUND

phRana NOT FOUND

phRanA NOT FOUND

phRaMa NOT FOUND

phRaMA NOT FOUND

phRaNa NOT FOUND

phRaNA NOT FOUND

phRaJa NOT FOUND

phRaJA NOT FOUND

phRaGa NOT FOUND

phRaGA NOT FOUND

phRAna NOT FOUND

phRAnA NOT FOUND

phRAMa NOT FOUND

phRAMA NOT FOUND

phRANa NOT FOUND

phRANA NOT FOUND

phRAJa NOT FOUND

phRAJA NOT FOUND

phRAGa NOT FOUND

phRAGA NOT FOUND

phRRana NOT FOUND

phRRanA NOT FOUND

phRRaMa NOT FOUND

phRRaMA NOT FOUND

phRRaNa NOT FOUND

phRRaNA NOT FOUND

phRRaJa NOT FOUND

phRRaJA NOT FOUND

phRRaGa NOT FOUND

phRRaGA NOT FOUND

phRRAna NOT FOUND

phRRAnA NOT FOUND

phRRAMa NOT FOUND

phRRAMA NOT FOUND

phRRANa NOT FOUND

phRRANA NOT FOUND

phRRAJa NOT FOUND

phRRAJA NOT FOUND

phRRAGa NOT FOUND

phRRAGA NOT FOUND

So

you have been assuming the Sanskrit words are spelled with the HK transliteration

I agree, let HK be the basis.

funderburkjim commented 7 years ago

another thought re which transliteration to target

Although HK and SLP1 are, for the most part, equally good transliterations for Sanskrit, SLP1 has some advantages.

So, if your algorithms are still at an early stage of program representation, it might make sense to switch to SLP1 as the target. But if your algorithms are already heavily dependent on HK, it probably makes sense to continue with HK.
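This sketch follows the "convert first, then generate variations" option raised earlier in the thread: it normalises HK input to SLP1 by longest match. The mapping follows the standard HK and SLP1 schemes, in which every SLP1 unit is a single ASCII letter. Letters spelled identically in both schemes pass through unchanged. `hkToSlp1` is a hypothetical name. The sketch ignores HK's inherent ambiguity where a plain 'a' followed by 'i' or 'u' would be read as 'ai' or 'au'.

```javascript
// HK units whose SLP1 spelling differs; all other letters are identical.
const HK_TO_SLP1 = new Map([
  ["lRR", "X"], ["lR", "x"], ["RR", "F"], ["R", "f"],
  ["ai", "E"], ["au", "O"],
  ["kh", "K"], ["gh", "G"], ["G", "N"],
  ["ch", "C"], ["jh", "J"], ["J", "Y"],
  ["Th", "W"], ["Dh", "Q"], ["T", "w"], ["D", "q"], ["N", "R"],
  ["th", "T"], ["dh", "D"],
  ["ph", "P"], ["bh", "B"],
  ["z", "S"], ["S", "z"],
]);

// Convert HK text to SLP1, always trying the longest HK unit first
// (3 letters for "lRR", then 2, then 1).
function hkToSlp1(text) {
  let out = "";
  let i = 0;
  while (i < text.length) {
    let len = Math.min(3, text.length - i);
    while (len > 1 && !HK_TO_SLP1.has(text.slice(i, i + len))) len--;
    const unit = text.slice(i, i + len);
    out += HK_TO_SLP1.get(unit) ?? unit; // identical letters pass through
    i += unit.length;
  }
  return out;
}

console.log(hkToSlp1("viSNu")); // vizRu
console.log(hkToSlp1("azrama")); // aSrama
```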

funderburkjim commented 7 years ago

is word frequency used?

I am only stating that word_frequency.js is now present in my testing directory, and is loaded.

Whether my changes to the JavaScript interfere with the usage of word_frequency, I don't know.

If the original code needs to parse the results from Cologne server before it uses the word_frequency information, then my changes would be causing a problem.

However, it would seem that the ordering of the keys by word_frequency could occur BEFORE the list of keys is sent to Cologne server.
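This sketch orders the generated keys before they are sent, most frequent first. The layout of word_frequency.js is not described in this thread. The table shape here (a plain object mapping a key to a count) and its counts are made-up assumptions for illustration, and `orderByFrequency` is a hypothetical name.

```javascript
// Hypothetical frequency table with made-up counts: key -> count.
const wordFrequency = { prANa: 120, praNa: 3 };

// Sort a copy of the keys, most frequent first; unknown keys count as 0.
function orderByFrequency(keys, freq) {
  const count = (k) => (Object.prototype.hasOwnProperty.call(freq, k) ? freq[k] : 0);
  return [...keys].sort((a, b) => count(b) - count(a));
}

const generated = ["prana", "praNa", "prAna", "prANa"];
console.log(orderByFrequency(generated, wordFrequency));
// [ 'prANa', 'praNa', 'prana', 'prAna' ]
```

Keys missing from the table sort last. Ties keep their generated order, because Array.prototype.sort has been required to be stable since ES2019.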