sanskrit-lexicon / COLOGNE

Development of http://www.sanskrit-lexicon.uni-koeln.de/

A demo of restful API with flask-restplus #290

Open drdhaval2785 opened 4 years ago

drdhaval2785 commented 4 years ago

@funderburkjim and @gasyoun Please find attached a screencast of demo of flask-restplus APIs

Currently I have implemented three APIs:

  1. /v0.0.1/dicts/{dict}/hw/{hw} - Returns a list of headword, key2, pc and text for the given headword in the given dictionary.
  2. /v0.0.1/dicts/{dict}/lnum/{lnum} - Returns headword, key2, pc and text for the given lnum in the given dictionary.
  3. /v0.0.1/dicts/{dict}/regex/{reg} - Returns a list of headword, key2, pc and text for the headwords in the given dictionary that match the given regular expression.

APIs 1 and 2 can be used for searches based on headword or lnum. API 3 can substitute the logic of 'prefix', 'suffix', 'substring' and 'exact' from the current frontend.
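
To illustrate how API 3 could stand in for the four frontend search modes, here is a minimal sketch. The helper name `mode_to_regex` and the sample headwords are hypothetical, and this is not the code from the demo; it only shows that each mode reduces to anchoring an escaped literal.

```python
import re

def mode_to_regex(mode, text):
    """Hypothetical helper: turn a frontend search mode into a regex string."""
    escaped = re.escape(text)  # treat the user's text as a literal
    patterns = {
        "exact": "^" + escaped + "$",
        "prefix": "^" + escaped,
        "suffix": escaped + "$",
        "substring": escaped,
    }
    return patterns[mode]

# Sample SLP1 headwords, invented for the illustration.
headwords = ["rAma", "rAmAyaRa", "aBirAma", "rAjan"]

def search(mode, text):
    """Filter the sample headwords the way the regex endpoint might."""
    rx = re.compile(mode_to_regex(mode, text))
    return [h for h in headwords if rx.search(h)]

print(search("prefix", "rAm"))
print(search("suffix", "rAma"))
```

The resulting string would then be passed as {reg} in the third URL, subject to whatever URL encoding the server expects.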

api_trial1.mp4.zip

Kindly let me know your feedback regarding this process.

Flask-restplus and Flask require Python 2.7 or Python 3.5. I guess it should not be a problem on Cologne server either. Once the local development is OK, we can upload it to the server.

gasyoun commented 4 years ago

Please find attached a screencast of demo of flask-restplus APIs

Can it be true? Amazing. Even regex! My only request is to make the URLs as short as possible: reg instead of regex, and d instead of dicts (or drop dicts altogether). Otherwise a dream come true.

drdhaval2785 commented 4 years ago

https://restfulapi.net/resource-naming/ mentions that it is ideal to use plurals for collections, with singulars following them for individual resources. An API should be self-explanatory; d is not very explanatory.

gasyoun commented 4 years ago

https://restfulapi.net/resource-naming/ mentions

Use lowercase letters in URIs - but we will make no use of this recommendation, because we plan to use SLP1 in URLs, right? So for something that does not change, I want it to be as short as possible. I would not want the URLs to break in Skype or Facebook if possible. Every letter counts. I would drop dicts altogether, if you ask me.

funderburkjim commented 4 years ago

screencast of demo of flask-restplus APIs

Not able (Windows 10, or MacOS) to view screencast. Error 0xc1010103 on Windows.

Was able to view when opening with Chrome.

funderburkjim commented 4 years ago

should not be a problem on Cologne server

Not clear. Ask webmaster regarding possibility of flask installation at Cologne.

I know it is possible to use the '.cgi' (common-gateway-interface) to have a browser run Python code on server. But not sure about flask.

funderburkjim commented 4 years ago

regex dangerous

It is my impression that allowing users to submit regular expressions is a security risk.

user enters 'pseudo' regex.

However, in our situation we can treat the user input as a 'pseudo-regex' with a very restricted syntax, and then safely construct from it a real regex that actually does the work. For instance, we could require that the pseudo-regex consist only of alphabetical characters, along with '*' (0 or more alphabetical characters) and '+' (1 or more alphabetical characters). Then if X is a sequence of only alphabetical characters:

The idea is to place strong restrictions on the user input while still allowing considerable flexibility, and to construct the regexes carefully from that restricted input rather than depending on a regex library to parse the direct user input. Something similar to this regex construction is what is done in advanced search.
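
A minimal sketch of this pseudo-regex idea, under stated assumptions: 'alphabetical' is taken to mean ASCII letters (as in SLP1 headwords), and the function name `pseudo_to_regex` is hypothetical. It is not the Advanced Search code, only an illustration of validating first and building the real regex second.

```python
import re

# Assumption: a 'letter' is an ASCII letter, which covers SLP1 headwords.
LETTER = "[A-Za-z]"
ALLOWED = re.compile(r"[A-Za-z*+]+")

def pseudo_to_regex(pseudo):
    """Validate the restricted input, then build an anchored regex from it."""
    if not ALLOWED.fullmatch(pseudo):
        raise ValueError("only letters, '*' and '+' are allowed")
    parts = []
    for ch in pseudo:
        if ch == "*":
            parts.append(LETTER + "*")   # zero or more letters
        elif ch == "+":
            parts.append(LETTER + "+")   # one or more letters
        else:
            parts.append(ch)             # a plain letter matches itself
    return re.compile("^" + "".join(parts) + "$")

rx = pseudo_to_regex("r*j")
print(rx.pattern)
print(bool(rx.match("rAj")), bool(rx.match("rAjan")))
```

Because the user never supplies parentheses, brackets or quantifier braces, input such as `(a+)+$` is rejected before any regex is compiled. A cap on the input length would be a natural extra guard.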

Is there a 'search-engine-standard' ?

Clearly, search engines allow quite a variety of user inputs, and surely there are standard approaches to handling user input safely. Maybe there is a useful Python library for generating safe regex strings from arbitrary user input.

pagination needed

There is also a need to prevent, at least on Cologne server, a user from getting too much data back. For instance '*' and '+' would retrieve everything, in effect downloading the entire dictionary xml file.

This can be prevented by including 'pagination' logic. In Advanced Search, pagination is made possible because:

Something similar, but different, is likely possible when querying sqlite files; but I don't know the details.
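
For the sqlite case, one common approach is LIMIT/OFFSET paging. The sketch below is only an illustration: the table name `entries`, its columns, the page size and the sample rows are all hypothetical and do not reflect the actual layout of the Cologne sqlite files. It uses SQLite's GLOB operator, which is case-sensitive (relevant for SLP1, where a and A differ) and understands '*' and '?'.

```python
import sqlite3

PAGE_SIZE = 2  # hypothetical cap, kept tiny for the demonstration

def fetch_page(conn, pattern, page):
    """Return one page of (key, lnum) rows whose key matches a GLOB pattern."""
    offset = page * PAGE_SIZE
    cur = conn.execute(
        "SELECT key, lnum FROM entries WHERE key GLOB ? "
        "ORDER BY lnum LIMIT ? OFFSET ?",
        (pattern, PAGE_SIZE, offset),
    )
    return cur.fetchall()

# In-memory stand-in for a dictionary database (hypothetical schema and data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (key TEXT, lnum INTEGER)")
conn.executemany(
    "INSERT INTO entries VALUES (?, ?)",
    [("rAj", 1), ("rAjan", 2), ("rAma", 3), ("ruj", 4), ("kfzRa", 5)],
)

print(fetch_page(conn, "r*", 0))
print(fetch_page(conn, "r*", 1))
```

With a fixed page size, even a pattern like '*' returns at most one page per request, so a single call cannot pull down the whole dictionary.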

drdhaval2785 commented 4 years ago

Asked the webmaster at Cologne. They have flask for web server. They are willing to add it to dialog server after christmas vacation.

Your concerns about regex are well founded and well taken. Will put some kind of throttling to save server from extreme loads / security threats.

funderburkjim commented 4 years ago

They have flask for web server

Good news. I'm surprised.
They may need to install the extra flask modules you are using.
Also -- what about the python version requirements? Via ssh, we have neither 2.7 nor 3.5 but 2.6.6 and 3.4. Maybe the web server version has newer python.

Will put some kind of throttling to save server from extreme loads / security threats

Agree these are manageable concerns.

gasyoun commented 4 years ago

Your concerns about regex are well founded and well taken. Will put some kind of throttling to save server from extreme loads / security threats.

It's true. Let's ask if @YevgenJohn has an idea how to secure the server. Or allow it at least on the local version for PCs.

They are willing to add it to dialog server after christmas vacation.

Great news.

See http://www.sanskrit-linguistics.org/dcs/index.php?contents=help_query and his Wildcard approach.

Wildcard | Meaning | Examples
-- | -- | --
* | No letter, one letter or more than one letter | r*j: rañj, rāj, ruj and ratnarāj; *vi*rama: amitavikrama, vikrama, savibhrama
? | Exactly one letter | r?j: rāj, ruj; ?vi?rama: avikrama

funderburkjim commented 4 years ago

We could use the '*' and '?' symbols as indicated. This use of these two symbols is part of the glob syntax.

We might also want a '+' wildcard to indicate one or more letters (almost the same as '*'). Using '+' allows some distinctions, but I am not sure it matters much.

Important to restrict the meaning of 'letter' (no letter, one letter, or more than one letter).

gasyoun commented 4 years ago

replace-as-RegExp implementation - yeah, makes sense, Jim.

Important to restrict the meaning of 'letter' (no letter, one letter, or more than one letter).

Agree. Not so many cases actually required.