funderburkjim opened 4 years ago
Here is a screenshot of the problem, on a local XAMPP installation.
When the example search is initiated, a regular expression is constructed based on the user's choice of settings. Among other things, non-ASCII characters are removed from the input; in our case the input is ऐश्वर्य, which consists only of non-ASCII characters, so removing them leaves the empty string. The resulting regexp is [^a-zA-Z0-9]()[^a-zA-Z0-9].
This regexp is used to search every line of query_dump, and every line containing (for example) two adjacent spaces matches it. The result is that almost every line matches!
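The failure mode can be reproduced in a few lines. This is a sketch in Python (the actual site code is PHP, and the variable names here are hypothetical), showing how stripping non-ASCII characters from a Devanagari input produces the degenerate regexp described above:

```python
import re

# Hypothetical reconstruction of the buggy pattern construction.
user_input = "ऐश्वर्य"                      # Devanagari: no ASCII letters at all
stripped = re.sub(r'[^a-zA-Z0-9]', '', user_input)
print(repr(stripped))                        # '' -- everything was removed

pattern = '[^a-zA-Z0-9](' + stripped + ')[^a-zA-Z0-9]'
print(pattern)                               # [^a-zA-Z0-9]()[^a-zA-Z0-9]

# Any line with two adjacent non-alphanumeric characters now matches:
print(bool(re.search(pattern, 'a  b')))      # True (the two spaces match)
```

With the capture group empty, the pattern only requires two adjacent non-alphanumeric characters, which nearly every dictionary line contains.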
Then, since the user requested all matches, the number of matching headwords will be almost every headword in the dictionary -- e.g., for mw, 200,000 or so.
Finally, the program generates HTML for all these headwords (probably several hundred megabytes) and attempts to send it to the user's browser.
The first fix simply checks whether, after removing non-ASCII characters, the user input is the empty string. If so, the program fails immediately.
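A minimal sketch of that guard, again in Python rather than the site's PHP, with a hypothetical function name:

```python
import re

def sanitize_or_fail(user_input):
    """Strip non-ASCII input; fail fast if nothing is left to search on."""
    stripped = re.sub(r'[^a-zA-Z0-9]', '', user_input)
    if not stripped:
        # Refuse to build a regexp around an empty search term.
        raise ValueError('empty search term after removing non-ASCII characters')
    return stripped
```

With this in place, an all-Devanagari input such as ऐश्वर्य is rejected up front instead of producing a match-everything pattern.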
Incidentally, the reason for removing characters from the user's input is to guard against cross-site scripting attacks.
There are almost surely other circumstances, beyond those the first fix addresses, in which the advanced search could produce all or too many matches.
The safest way to deal with this is simply to omit the 'all' option for the number of returned results, which is what the second fix does. Now the maximum number of records returned is 1000. For almost all practical purposes, 1000 records is ample.
As usual, the changes mentioned above were first made in a local copy of the csl-websanlexicon repository and tested on a local server. The repository was then pushed to GitHub and pulled to the sanskrit-lexicon server. One dictionary was tested on the Cologne server, and when all looked well, the changes were installed for all dictionaries.
The bug should be fixed now.
Here's a screenshot after the fix (using the ap90 dictionary this time).
Now, the maximum number of records returned is 1000.
Bad news. So now we will not even know how many entries are found in total.
You could do 'next' to get the next 1000.
That is a pain. And I can't get stats all at once.
True. Why don't you open an issue describing this as an enhancement?
While we don't want to return all dictionary entries, it might be possible to return the total number of matches as a separate, new statistic, independent of the records returned.
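One way to implement that enhancement, sketched in Python (the site itself is PHP; the function name and loop structure here are hypothetical): count every match while retaining only the first 1000 lines for display.

```python
import re

def search_with_total(lines, pattern, limit=1000):
    """Return at most `limit` matching lines plus the total match count."""
    shown, total = [], 0
    for line in lines:
        if re.search(pattern, line):
            total += 1
            if len(shown) < limit:
                shown.append(line)
    return shown, total
```

The UI could then report something like "showing 1000 of N matches" without ever materializing all N records as HTML.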
What are some examples of searches you try that have more than 1000 matches?
ja, as a suffix.
The webmaster pointed out an issue with advanced search display.
To summarize, he mentioned that there are occasional times when abnormal CPU usage occurs, in conjunction with very long-running requests.
He provided an example user URL that caused such an event. It involved the Advanced Search.
After some examination, two changes were made to the advanced search that (a) fix the specific example query, and (b) add a safeguard that should limit CPU usage for other advanced search queries the specific fix doesn't handle. Details are described in further comments.