Shalu411 closed this issue 10 months ago.
The so-called Advanced Search is not particularly advanced. In fact, compared to full-strength search engines based on Lucene, it is quite primitive. However, Cologne does not have a Java server, so implementing a search with Lucene is not possible there, although 'match location highlighting' such as you suggest could clearly be done with Lucene. Since Lucene is unavailable, this feature would need to be implemented via PHP programs. Do you have any code suggestions? It is not clear to me how to accomplish the task.
I do not know how to implement the function. Couldn't http://stackoverflow.com/questions/2010663/lucene-with-php help?
The SO replies suggest using Solr. I have experimented with Solr (on a laptop) and think it is an excellent way to go. HOWEVER, using Solr requires a running Solr server, which in turn requires installation by a sysadmin. Further, Solr is a Java application (like Lucene) and must be installed in a Java servlet 'container' such as Tomcat. Once you have a Solr server installed, you can interact with it from any programming language, since the interface is a RESTful one, i.e. you send URLs with parameters to set up an index, query the index, etc. Since Cologne doesn't support Java servers, this is, alas, not practical there.
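To make the "send URLs with parameters" point concrete, here is a sketch of a Solr query URL that also asks for highlighted snippets. `q`, `hl`, `hl.fl` and `wt` are standard Solr request parameters and 8983 is Solr's default port; the core name `dictionary`, the field name `body` and the function name are hypothetical placeholders, not an actual Cologne setup.

```javascript
// Sketch: build a Solr "select" URL that requests highlighting.
// Core ("dictionary") and field ("body") names are hypothetical.
function buildSolrHighlightUrl(term, options = {}) {
  const base = options.base || "http://localhost:8983/solr"; // default Solr port
  const core = options.core || "dictionary"; // hypothetical core name
  const field = options.field || "body";     // hypothetical text field
  const params = new URLSearchParams({
    q: `${field}:${term}`, // the query itself (term is not Solr-escaped here)
    hl: "true",            // turn highlighting on
    "hl.fl": field,        // field(s) to return highlighted snippets for
    wt: "json",            // response format
  });
  return `${base}/${core}/select?${params.toString()}`;
}

console.log(buildSolrHighlightUrl("siva"));
// http://localhost:8983/solr/dictionary/select?q=body%3Asiva&hl=true&hl.fl=body&wt=json
```

Because the interface is plain HTTP, the same URL could equally be requested from PHP; the catch, as noted above, is only that the Java server has to exist somewhere.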
Here is a trick that can provide some interim help with this problem. Take your example of an Advanced Search in PWG 2013 for records with an exact match of the Sanskrit word with HK spelling 'ziva'. Trick #1: set your output to Roman Unicode and start the search. Trick #2: note that in Roman Unicode the search word is śiva. Now do a Find in the browser for śiva (or just siva), and voilà! The browser highlights all the 'siva' occurrences on the page. If you click on the 12th word, bhīma, the browser highlighting immediately takes your eye to "b) N. des Rudra ĀÇV. GṚHJ. 4, 8, 19. = śiva" (with śiva highlighted). This experiment was run with the Chrome browser. I hope this trick is of some practical use to you.
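Trick #2 depends on knowing that HK 'ziva' is IAST 'śiva'. As a hypothetical illustration (not the transcoder the Cologne site actually uses), a greedy HK-to-IAST mapping for the basic letters can be sketched like this; longest match comes first so that HK `RR` becomes ṝ rather than ṛṛ.

```javascript
// Hypothetical helper: Harvard-Kyoto (HK) -> IAST for the basic letters.
// Only letters that differ between the two schemes are listed; everything
// else (a, i, u, e, o, k, kh, ai, au, ...) is copied unchanged.
const HK_TO_IAST = {
  lRR: "ḹ", RR: "ṝ", lR: "ḷ",           // multi-letter vowels, tried first
  A: "ā", I: "ī", U: "ū", R: "ṛ",
  M: "ṃ", H: "ḥ",                         // anusvara, visarga
  G: "ṅ", J: "ñ", T: "ṭ", D: "ḍ", N: "ṇ",
  z: "ś", S: "ṣ",
};
const MAX_KEY = 3; // longest key in the table

function hkToIast(hk) {
  let out = "";
  let i = 0;
  while (i < hk.length) {
    let matched = false;
    // Greedy: try the longest possible chunk at position i first.
    for (let len = MAX_KEY; len >= 1; len--) {
      const chunk = hk.slice(i, i + len);
      if (chunk.length === len && Object.prototype.hasOwnProperty.call(HK_TO_IAST, chunk)) {
        out += HK_TO_IAST[chunk];
        i += len;
        matched = true;
        break;
      }
    }
    if (!matched) {
      out += hk[i]; // letter spelled the same in HK and IAST
      i += 1;
    }
  }
  return out;
}

console.log(hkToIast("ziva"));  // śiva
console.log(hkToIast("bhIma")); // bhīma
```

Aspirates such as `Th` need no table entry: `T` maps to ṭ and the following `h` is copied, giving ṭh.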
Yes, sure. I generally use the browser's "Find" option a lot. And since the Advanced Search gives all output in one window, all the words, however many there are, appear in one place. I always view the output in Unicode Devanagari only and had never tried searching with Roman diacritics. The trick is surely helping. Thank you.
@funderburkjim Rather than looking for a server-side approach, could we try a browser-side one? JavaScript such as http://www.the-art-of-web.com/javascript/search-highlight/ might be of some use.
@drdhaval2785 Seems interesting, all the more so because it even has a patch for accented characters.
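Accented characters matter because a search for siva should also find śiva, as the browser Find in the trick above does. One common way to get that behaviour (we do not claim it is how the hilitor patch works) is Unicode decomposition followed by stripping combining marks, sketched here with hypothetical function names:

```javascript
// Sketch of diacritic-insensitive ("accent-folded") matching.
function fold(s) {
  // NFD splits "ś" into "s" + U+0301 (combining acute) and "ṛ" into
  // "r" + U+0323 (combining dot below); then drop every mark in the
  // combining-diacritics block U+0300..U+036F and lower-case the result.
  return s.normalize("NFD").replace(/[\u0300-\u036f]/g, "").toLowerCase();
}

function containsFolded(text, term) {
  return fold(text).includes(fold(term));
}

console.log(fold("śiva")); // siva
console.log(containsFolded("N. des Rudra ĀÇV. GṚHJ. 4, 8, 19. = śiva", "siva")); // true
```

Folding changes string offsets (NFD "ś" is two code units), so to put the highlight back into the original text one would have to map positions back, or instead build a pattern that allows optional combining marks after each letter.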
These issues (5,8,9,10,11) will have to remain open.
They are too complicated for me to consider now. :disappointed:
@funderburkjim, @Shalu411 and @gasyoun How do you like http://sanskrit-lexicon.github.io/cologne/highlighter/index.html ?
It highlights 'Siva' in the MW entry for Siva.
It uses http://www.the-art-of-web.com/javascript/search-highlight/ (hilitor-utf8.js, to be precise).
The only lines added to the display HTML are:
<script type="text/javascript" src="hilitor-utf8.js"></script>
<script type="text/javascript">
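var myHilitor2;
document.addEventListener("DOMContentLoaded", function() {
  // Highlight only inside the element with id="data" (the entry display).
  myHilitor2 = new Hilitor2("data");
  // "left": match at word starts, i.e. words beginning with the keyword.
  myHilitor2.setMatchType("left");
  // The keyword to highlight (hardcoded in this demo).
  myHilitor2.apply("Siva");
}, false);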
</script>
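To show what `setMatchType` and `apply` amount to, here is a minimal string-based sketch of a keyword highlighter. The match-type names mirror Hilitor's, but the semantics are our assumption ("left" = word starts with the term, "right" = word ends with it, "open" = anywhere, anything else = whole word), and this is not the code of hilitor-utf8.js. Matches are wrapped in `<span class="highlight">`, the class that the navigation code later in this thread selects.

```javascript
// Escape characters that have a special meaning in regular expressions.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function escapeHtml(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

function highlight(text, term, matchType = "left") {
  if (!term) return escapeHtml(text);
  // Unicode-aware "word character" (letters, combining marks, digits), so
  // IAST diacritics and Devanagari count as parts of words.
  const W = "[\\p{L}\\p{M}\\p{N}]";
  const start = `(?<!${W})`; // not preceded by a word character
  const end = `(?!${W})`;    // not followed by a word character
  const core = escapeRegExp(term);
  let pattern;
  switch (matchType) {
    case "left": pattern = start + core; break;  // prefix match
    case "right": pattern = core + end; break;   // suffix match
    case "open": pattern = core; break;          // substring match
    default: pattern = start + core + end;       // whole-word (exact) match
  }
  const re = new RegExp(pattern, "gu");
  let out = "";
  let last = 0;
  for (const m of text.matchAll(re)) {
    out += escapeHtml(text.slice(last, m.index));
    out += `<span class="highlight">${escapeHtml(m[0])}</span>`;
    last = m.index + m[0].length;
  }
  return out + escapeHtml(text.slice(last));
}

console.log(highlight("Siva and Sivaka", "Siva", "left"));
// <span class="highlight">Siva</span> and <span class="highlight">Siva</span>ka
console.log(highlight("Siva and Sivaka", "Siva", "exact"));
// <span class="highlight">Siva</span> and Sivaka
```

With a fixed "left" setting only prefix-style matches get marked; a suffix search would need "right" and a substring search "open".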
What I would want from @funderburkjim is this: give me the searched keyword as a string, to use instead of "Siva" in the line
myHilitor2.apply("Siva");
Preferably it should be converted to the encoding in which the user wants the output, e.g. SLP1, HK, IAST, etc.
I guess this should be easily doable by the PHP that generates the page.
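Whatever generates the page has to place the searched keyword inside the inline script as a safe JavaScript string. Sketched in JavaScript (the function name `buildHilitorScript` is hypothetical), the usual approach is JSON encoding plus escaping `<`, so that a term such as `</script>` cannot close the tag early:

```javascript
// Hypothetical helper: produce the inline <script> with the keyword filled in.
function buildHilitorScript(keyword) {
  // JSON.stringify yields a valid JS string literal (quotes, backslashes
  // escaped); replacing "<" with \u003c keeps "</script>" out of the output.
  const literal = JSON.stringify(String(keyword)).replace(/</g, "\\u003c");
  return [
    '<script type="text/javascript">',
    "var myHilitor2;",
    'document.addEventListener("DOMContentLoaded", function() {',
    '  myHilitor2 = new Hilitor2("data");',
    '  myHilitor2.setMatchType("left");',
    `  myHilitor2.apply(${literal});`,
    "}, false);",
    "</script>",
  ].join("\n");
}

console.log(buildHilitorScript("śiva"));
```

In PHP, `json_encode($keyword, JSON_HEX_TAG)` should give an equivalent string literal. Converting the keyword to the user's output encoding would happen before this step.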
I guess it's a new issue. @Shalu411's request was related to the visual picture; yours is textual only.
I like the Siva example!
From my reading of @Shalu411's suggestion, this addresses the issue she raised.
@drdhaval2785 Do you have this implemented locally in mw/web/webtc2?
@funderburkjim No, I have not implemented it in mw/web/webtc2. This is just an addition to a locally saved copy of the webpage: I added a .js file and added the script from https://github.com/sanskrit-lexicon/Cologne/issues/5#issuecomment-159498202 to the head of the HTML.
I could not understand the PHP used to display the webpage, so I didn't venture further.
@funderburkjim Time to implement this functionality. It would be a great addition to our repertoire.
Or to add an OCR layer to the files and highlight the word found there (strict correspondence is not needed; even a partial match would be enough). But the Devanagari-only files are too big for Oliver's OCR; it will not manage them. 100+ pages is too much for Sanskrit OCR software.
This again is a case where it would be better to use existing search-engine technology than to reinvent the wheel. I think this facility is called highlighting, and a Google search for 'Elasticsearch highlight' will bring up references.
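For a sense of what Elasticsearch highlighting looks like, here is a sketch of a search request body. `query`, `match`, `highlight`, `fields`, `pre_tags` and `post_tags` are standard Elasticsearch search-API keys (the default tags are `<em>`...`</em>`); the field name `body`, the index name in the comment and the function name are hypothetical.

```javascript
// Sketch: Elasticsearch search body asking for highlighted fragments.
function buildHighlightQuery(term, field = "body") { // "body" is hypothetical
  return {
    query: { match: { [field]: term } },
    highlight: {
      // Wrap hits in the same span class the page's navigation code uses.
      pre_tags: ['<span class="highlight">'],
      post_tags: ["</span>"],
      fields: { [field]: {} }, // highlight this field with default settings
    },
  };
}

// This body would be POSTed to e.g. http://localhost:9200/pwg/_search
console.log(JSON.stringify(buildHighlightQuery("śiva"), null, 2));
```

The response then carries, per hit, a `highlight` object with marked-up fragments, so the client does not need to locate the matches itself.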
I have set up an ECS instance of the Bitnami Elasticsearch, with the intention of trying at some point to fit one or two of our dictionaries into that search-engine framework. It is also easy to install an Elasticsearch instance on a local computer; I've done this and played around at a very preliminary level, using the book ElasticSearch in Action for learning.
Maybe it's time to get more serious with this approach and see whether my intuition pans out regarding the utility of applying search-engine technology to our collection of dictionaries.
@artforlife, have you ever dealt with highlighting?
Text highlighting and PDF highlighting still elude us, I guess. Right, @funderburkjim?
I must have missed this posting.
I like the http://sanskrit-lexicon.github.io/cologne/highlighter/index.html
Is it feasible to implement this in the Advanced Search display?
Yes, it seems feasible. It is just a .js file.
OK, I'll put it on the near-term list to investigate.
This is the relevant part of the code.
<script type="text/javascript">
var myHilitor2;
document.addEventListener("DOMContentLoaded", function() {
myHilitor2 = new Hilitor2("data");
myHilitor2.setMatchType("left");
myHilitor2.apply("Siva");
}, false);
</script>
Instead of "Siva", which is hardcoded for now, we would have to fill in the searched word.
Thanks, it's one of the top 5 UI issues; I agree with Dhaval.
Can this issue be taken up, @funderburkjim? It has been pending for 7 years and would be a good enhancement. There is also a working example at http://sanskrit-lexicon.github.io/cologne/highlighter/index.html.
@artanat please share the code we use at https://samskrtam.ru/parallel-corpus/ for @funderburkjim
let currentIndex = -1; // index of the currently active highlight; -1 = none yet
const nextButton = document.getElementById("nextButton");
const prevButton = document.getElementById("prevButton");
const nearestButton = document.getElementById("nearestButton");
nextButton.addEventListener("click", function() {
  navigateToNextHighlight();
});
nearestButton.addEventListener("click", function() {
  activateNearestVisibleHighlight();
});
prevButton.addEventListener("click", function() {
  navigateToPreviousHighlight();
});
// Keyboard shortcuts: "a" = previous, "s" = nearest, "d" = next.
document.addEventListener("keydown", function(event) {
  // Leave browser shortcuts (Ctrl+S, Cmd+D, ...) alone.
  if (event.ctrlKey || event.metaKey || event.altKey) return;
  // Do not hijack keystrokes while the user is typing, e.g. in the search box.
  const target = event.target;
  if (target && (target.isContentEditable || /^(INPUT|TEXTAREA|SELECT)$/.test(target.tagName))) return;
  const key = (event.key || "").toLowerCase();
  if (key === "a") {
    navigateToPreviousHighlight();
    event.preventDefault(); // prevent the browser's default action
  } else if (key === "s") {
    activateNearestVisibleHighlight();
    event.preventDefault(); // prevent the browser's default action
  } else if (key === "d") {
    navigateToNextHighlight();
    event.preventDefault(); // prevent the browser's default action
  }
});
// Mark element `index` as active and scroll it into view.
function activateHighlight(highlightElements, index) {
  currentIndex = index;
  highlightElements[index].classList.add("active");
  highlightElements[index].scrollIntoView({ behavior: "smooth" });
}
// Move the active highlight by `step` (+1 or -1), wrapping around at both ends.
function moveHighlight(step) {
  const highlightElements = document.querySelectorAll(".highlight");
  const count = highlightElements.length;
  if (count === 0) { // nothing highlighted: avoid "% 0" and undefined elements
    currentIndex = -1;
    return;
  }
  let index;
  if (currentIndex < 0 || currentIndex >= count) {
    index = step > 0 ? 0 : count - 1; // first press: start at the first/last match
  } else {
    highlightElements[currentIndex].classList.remove("active");
    index = (currentIndex + step + count) % count;
  }
  activateHighlight(highlightElements, index);
}
function navigateToPreviousHighlight() {
  moveHighlight(-1);
}
function navigateToNextHighlight() {
  moveHighlight(+1);
}
// Activate the highlight whose top edge is closest to the top of the viewport.
function activateNearestVisibleHighlight() {
  const highlightElements = document.querySelectorAll(".highlight");
  highlightElements.forEach(element => element.classList.remove("active"));
  let minDistance = Infinity;
  let nearestIndex = -1;
  highlightElements.forEach((element, index) => {
    const distance = Math.abs(element.getBoundingClientRect().top);
    if (distance < minDistance) {
      minDistance = distance;
      nearestIndex = index;
    }
  });
  if (nearestIndex !== -1) {
    activateHighlight(highlightElements, nearestIndex);
  }
}
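The index arithmetic behind the previous/next/nearest navigation can be checked outside the browser. Below is a DOM-free sketch with hypothetical function names. One deliberate choice: with nothing active yet, `(-1 - 1 + n) % n` evaluates to `n - 2`, i.e. the second-to-last match, so this sketch special-cases the first key press to start at the last match instead.

```javascript
// stepIndex: move from `current` by `step`, wrapping around `count` items.
// current === -1 means "nothing active yet".
function stepIndex(current, count, step) {
  if (count === 0) return -1;              // no highlights on the page
  if (current < 0 || current >= count) {   // first key press
    return step > 0 ? 0 : count - 1;
  }
  return (((current + step) % count) + count) % count;
}

// nearestIndex: index of the highlight whose top edge is closest to the
// top of the viewport, given each element's getBoundingClientRect().top.
function nearestIndex(tops) {
  let best = -1;
  let bestDistance = Infinity;
  tops.forEach((top, i) => {
    const d = Math.abs(top);
    if (d < bestDistance) {
      bestDistance = d;
      best = i;
    }
  });
  return best;
}

console.log(stepIndex(-1, 5, +1)); // 0
console.log(stepIndex(-1, 5, -1)); // 4
console.log(stepIndex(4, 5, +1));  // 0
console.log(stepIndex(0, 5, -1));  // 4
console.log(nearestIndex([-300, -40, 120, 900])); // 1
```

Keeping the arithmetic in pure functions like these makes it easy to unit-test; the DOM code then only has to add and remove the "active" class.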
@artanat Hi, Anatoly. Thanks for the JS.
Could you provide a stand-alone html demo?
@artanat I see that in Dhaval's comment above there is an example. Maybe that's the demo I was looking for. Let me look at that first.
@drdhaval2785 Your highlighter demo url http://sanskrit-lexicon.github.io/cologne/highlighter/index.html works.
But where is the code? I don't find it at https://github.com/sanskrit-lexicon/COLOGNE/.
https://sanskrit-lexicon.uni-koeln.de/work/hilitdev/mdhl1/web/webtc2/index.php
This is an adaptation of the current normal display of md, using hilitor-utf8.js from Dhaval's demo.
It partially works, e.g. for a text-word search with 'exact' or 'prefix' matching, provided the search term contains only the plain alphabet (no diacritics, no Devanagari). Suffix, infix, and substring searches don't work (no highlighting).
'Sanskrit word' search works (with exact/prefix) if both input and output are slp1 or HK.
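One plausible explanation for the "no diacritics, no Devanagari" limitation (an assumption; we have not traced it in hilitor-utf8.js) is that JavaScript's `\b` only knows ASCII word characters, so it never sees a word boundary before "ś" or anywhere inside Devanagari text. Unicode-aware lookarounds with the `u` flag do:

```javascript
// Diagnosis sketch: ASCII \b versus a Unicode-aware word-start check.
const IAST_TERM = "\u015Biva";          // "śiva"
const DEVA_TERM = "\u0936\u093F\u0935"; // "शिव"

const asciiBoundary = (t) => new RegExp("\\b" + t);
// "Not preceded by a letter, combining mark or digit", in any script.
const unicodeBoundary = (t) => new RegExp("(?<![\\p{L}\\p{M}\\p{N}])" + t, "u");

const iastText = "GṚHJ. 4, 8, 19. = " + IAST_TERM;
const devaText = "= " + DEVA_TERM;

console.log(asciiBoundary("Siva").test("= Siva"));      // true
console.log(asciiBoundary(IAST_TERM).test(iastText));   // false
console.log(asciiBoundary(DEVA_TERM).test(devaText));   // false
console.log(unicodeBoundary(IAST_TERM).test(iastText)); // true
console.log(unicodeBoundary(DEVA_TERM).test(devaText)); // true
```

The terms here contain no regex metacharacters; real search input would need escaping before being placed in a pattern.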
The way it works is (roughly):
Modify the JS to get rid of some or all of the limitations mentioned above.
Request others to experiment and to provide feedback on which types of searches would benefit most from highlighting.
I suspect that some of the limitations are unavoidable in this method. (e.g. searching for an slp1 input, but specifying a "Devanagari" output).
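The SLP1-input/Devanagari-output case is a transliteration problem: the highlighter can only find the term in the script the page is displayed in, so the term would first have to be converted to that script. Whether that is practical depends on the site's own transcoders; as a sketch of what the conversion involves, here is a hypothetical, deliberately partial SLP1-to-Devanagari converter (basic vowels, the 33 consonants, anusvara and visarga; no accents or avagraha):

```javascript
// Build a Map from SLP1 letters to Devanagari code points (same order).
const zip = (keys, codePoints) =>
  new Map([...keys].map((k, i) => [k, String.fromCodePoint(codePoints[i])]));

const CONSONANTS = zip("kKgGNcCjJYwWqQRtTdDnpPbBmyrlvSzsh", [
  0x915, 0x916, 0x917, 0x918, 0x919, 0x91a, 0x91b, 0x91c, 0x91d, 0x91e,
  0x91f, 0x920, 0x921, 0x922, 0x923, 0x924, 0x925, 0x926, 0x927, 0x928,
  0x92a, 0x92b, 0x92c, 0x92d, 0x92e, 0x92f, 0x930, 0x932, 0x935,
  0x936, 0x937, 0x938, 0x939,
]);
// Independent vowels (word-initial or after another vowel).
const VOWELS = zip("aAiIuUfFeEoO", [
  0x905, 0x906, 0x907, 0x908, 0x909, 0x90a, 0x90b, 0x960, 0x90f, 0x910, 0x913, 0x914,
]);
// Dependent vowel signs (matras); the inherent "a" has no sign.
const MATRAS = zip("AiIuUfFeEoO", [
  0x93e, 0x93f, 0x940, 0x941, 0x942, 0x943, 0x944, 0x947, 0x948, 0x94b, 0x94c,
]);
const OTHERS = zip("MH", [0x902, 0x903]); // anusvara, visarga
const VIRAMA = "\u094D";

function slp1ToDeva(s) {
  let out = "";
  for (let i = 0; i < s.length; i++) {
    const ch = s[i];
    if (CONSONANTS.has(ch)) {
      out += CONSONANTS.get(ch);
      const next = s[i + 1];
      if (next === "a") {
        i++;                     // inherent vowel: nothing to write
      } else if (MATRAS.has(next)) {
        out += MATRAS.get(next); // vowel sign attached to the consonant
        i++;
      } else {
        out += VIRAMA;           // no vowel follows: suppress the inherent a
      }
    } else if (VOWELS.has(ch)) {
      out += VOWELS.get(ch);
    } else if (OTHERS.has(ch)) {
      out += OTHERS.get(ch);
    } else {
      out += ch;                 // spaces, digits, punctuation pass through
    }
  }
  return out;
}

console.log(slp1ToDeva("Siva")); // शिव
```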
Looks promising, Jim! Anatoliy's code can additionally jump from one highlight to another.
This demo is also local to the md dictionary:
https://sanskrit-lexicon.uni-koeln.de/work/hilitdev/md_artanat/web/webtc2/index.php
More robust than mdhl1 above.
@artanat What does the '=' button do? Is it functioning properly here? Is it needed in this application?
Request feedback. There are some searches where highlighting fails, but I think these are acceptable imperfections.
What do others think?
Should we accept this version for all cdsl dictionaries?
Note: this highlighting is only in Advanced Search.
In answer to "What does the '=' button do? Is it functioning properly here? Is it needed in this application?": '=' selects the highlight closest to the current scroll position in the document.
@gasyoun @artanat
Request you to provide feedback on the version https://sanskrit-lexicon.uni-koeln.de/work/hilitdev/md_artanat/web/webtc2/index.php mentioned above.
Is it ready for general deployment?
Yes, it works as expected.
@gasyoun Thanks for feedback. @Shalu411 is also reviewing an skd-version.
The highlighting code (based on @artanat's code) has now been pushed to GitHub. The commit link above shows the changes made to the csl-websanlexicon code.
The change has been deployed to all the Advanced Search displays at Cologne by running the redo_cologne_all.sh script in csl-websanlexicon/v02.
Hurray! One of the oldest requests is finally closed.
http://www.sanskrit-lexicon.uni-koeln.de/scans/PWGScan/2013/web/webtc2/index.php
PWG Advanced Search > Sanskrit Word > Exact > "ziva" > 20 outputs > 12th entry भीम
Now I want to know where exactly "ziva" occurs in that article (भीम). I have to scroll and find the word manually. The search does list every entry in the dictionary in which the word is found, and that's fine. But the word is not highlighted within those articles. Sometimes the articles are very long, and it is hard to spot such a small word with bare eyes. If it could be highlighted, that would help much more. Thank you.