sanskrit-lexicon / COLOGNE

Development of http://www.sanskrit-lexicon.uni-koeln.de/

Link to Exact Line, not Page #5

Closed Shalu411 closed 10 months ago

Shalu411 commented 10 years ago

http://www.sanskrit-lexicon.uni-koeln.de/scans/PWGScan/2013/web/webtc2/index.php

PWG Advanced Search > Sanskrit Word > Exact > "ziva" > 20 output > 12th entry भीम

Now I want to know where exactly "ziva" occurs in that article (भीम). I have to scroll and find the word manually. The search lists every entry in which the word is found, and that's fine. But the word is not highlighted within those articles. Some of them are very long, and we cannot find such a small word by eye. If it could be highlighted, that would help much more. Thank you.

funderburkjim commented 10 years ago

The so-called Advanced Search is not particularly advanced. In fact, compared to full-strength search engines based on Lucene, it is quite primitive. The 'match location highlighting' you suggest could clearly be done with Lucene. However, Cologne does not have a Java server, so implementing a search with Lucene is not possible there. Since Lucene is unavailable, this search feature would need to be implemented via PHP programs. Do you have any code suggestions? It is not clear to me how to accomplish the task.

gasyoun commented 10 years ago

I do not know how to implement the function. Can't http://stackoverflow.com/questions/2010663/lucene-with-php help?

funderburkjim commented 10 years ago

The SO replies suggest using SOLR. I have experimented with SOLR (on a laptop) and think it is an excellent way to go. HOWEVER, using SOLR requires the presence of a SOLR server, which in turn requires SYSADMIN installation. Further, SOLR is a Java application (like Lucene) and must be installed in a Java servlet 'container', such as Tomcat. Once you have a SOLR server installed, you can interact with it in any programming language, since the interface is a 'RESTful' one, i.e. you send URLs with parameters to set up an index, query the index, etc. Since Cologne doesn't support Java servers, this is not practical there, alas.

funderburkjim commented 10 years ago

Here is a trick that can provide some interim help with this problem. Take your example of an advanced search in PWG 2013 for records with an exact match of the Sanskrit word with HK spelling 'ziva'. Trick #1: Set your output to Roman Unicode and start the search. Trick #2: Note that in Roman Unicode the search word is śiva. Now do a find in the browser for śiva (or just siva), and voilà: the browser highlights all the 'siva' occurrences on the page. If you click on the 12th word, bhīma, the browser highlighting immediately takes your eye to b) N. des Rudra ĀÇV. GṚHJ. 4, 8, 19. = śiva (with śiva highlighted). This experiment was run with the Chrome browser. I hope this trick is of some practical use to you.
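
The fact that a find for plain 'siva' also lands on 'śiva' suggests that the browser compares text with diacritics folded away. A minimal JavaScript sketch of that kind of folding follows. The helper names are hypothetical and not part of the Cologne code, and the approach is an assumption that only suits Roman output: stripping marks from Devanagari would remove vowel signs.

```javascript
// Hypothetical helper: strip diacritics so "śiva" and "siva" compare equal.
// Only suitable for Roman text; in Devanagari \p{M} would also remove vowel signs.
function foldDiacritics(text) {
  // NFD splits "ś" into "s" plus a combining accent; \p{M} removes the accent.
  return text.normalize("NFD").replace(/\p{M}/gu, "");
}

// Hypothetical helper: offsets (in the folded, lower-cased text) of every
// occurrence of the folded term.
function findFolded(text, term) {
  const haystack = foldDiacritics(text).toLowerCase();
  const needle = foldDiacritics(term).toLowerCase();
  const positions = [];
  if (needle === "") return positions;
  let from = haystack.indexOf(needle);
  while (from !== -1) {
    positions.push(from);
    from = haystack.indexOf(needle, from + 1);
  }
  return positions;
}
```

Calling `findFolded` with the entry text and either `śiva` or `siva` gives the same positions, which is the effect the browser trick relies on.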

Shalu411 commented 10 years ago

Yes, sure. I generally use the browser "Find" option a lot. And since the "Advanced" search gives all output in one window, all the words, however many there are, appear in one place. I always view output in Unicode Devanagari only, and never tried searching with Roman diacritics. The trick surely helps. Thank you.

drdhaval2785 commented 9 years ago

@funderburkjim Rather than looking for a server-side approach, can we try a browser-side one in JavaScript? Would http://www.the-art-of-web.com/javascript/search-highlight/ be of some use?

gasyoun commented 9 years ago

@drdhaval2785 Seems interesting, all the more so since it even has a patch for accented characters.

funderburkjim commented 9 years ago

These issues (5,8,9,10,11) will have to remain open.
They are too complicated for me to consider now. :disappointed:

drdhaval2785 commented 9 years ago

@funderburkjim, @Shalu411 and @gasyoun How do you like http://sanskrit-lexicon.github.io/cologne/highlighter/index.html ?

It highlights the MW entry for Siva.

Uses http://www.the-art-of-web.com/javascript/search-highlight/ - hilitor-utf8.js to be precise.

The only lines added to the display HTML are

    <script type="text/javascript" src="hilitor-utf8.js"></script>
    <script type="text/javascript">

      var myHilitor2;
      document.addEventListener("DOMContentLoaded", function() {
        myHilitor2 = new Hilitor2("data");
        myHilitor2.setMatchType("left");
        myHilitor2.apply("Siva");
      }, false);

    </script>

What I would need from @funderburkjim is the searched keyword as a string, to use instead of "Siva" in the myHilitor2.apply("Siva"); line. Preferably it should be converted to the encoding in which the user wants the output, e.g. SLP1, HK, IAST etc.
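
As an illustration of the conversion step, here is a minimal sketch that turns a Harvard-Kyoto keyword into IAST before it is handed to the highlighter. The function name and the table are hypothetical and deliberately partial; Cologne's own transcoding code is not shown in this thread and may work differently.

```javascript
// Partial Harvard-Kyoto -> IAST table (a sketch, assuming plain HK input).
// Letters that are identical in both schemes are left untouched.
const HK_TO_IAST = {
  A: "ā", I: "ī", U: "ū", R: "ṛ", RR: "ṝ", lR: "ḷ", lRR: "ḹ",
  M: "ṃ", H: "ḥ", G: "ṅ", J: "ñ", T: "ṭ", D: "ḍ", N: "ṇ",
  z: "ś", S: "ṣ",
};

// Hypothetical helper: convert an HK keyword such as "ziva" to IAST.
// Longer tokens come first in the pattern so "RR" is not read as "R" + "R".
// Note: "lR" is always read as the vowel, which is ambiguous in rare cases.
function hkToIast(word) {
  return word.replace(/lRR|RR|lR|[AIURMHGJTDNzS]/g, (token) => HK_TO_IAST[token]);
}
```

With this, `hkToIast("ziva")` yields `śiva`, which could then be passed to `myHilitor2.apply(...)` when the user has chosen Roman output.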

drdhaval2785 commented 9 years ago

I guess this should be easily doable by PHP generating the page.

gasyoun commented 9 years ago

I guess it's a new issue. @Shalu411's request was about the visual picture; yours is textual only.

funderburkjim commented 9 years ago

I like the Siva example!

From my reading of @Shalu411's suggestion, this addresses the issue she raised.

@drdhaval2785 - do you have this implemented locally in mw/web/webtc2 ?

drdhaval2785 commented 9 years ago

@funderburkjim No, I have not implemented it in mw/web/webtc2. This is just an addition to a locally saved copy of the webpage: I added a .js file and added the script from https://github.com/sanskrit-lexicon/Cologne/issues/5#issuecomment-159498202 to the head of the HTML.

I could not understand the PHP files used to display the webpage, so I didn't venture further.

drdhaval2785 commented 8 years ago

@funderburkjim Time to implement this functionality. It would be a great addition to our repertoire.

gasyoun commented 8 years ago

Time to implement this functionality.

Or to add an OCR layer to the files and highlight the word found (not a strict correspondence; even a partial one would be enough). But Devanagari-only files are too big for Oliver's OCR; it will not manage them. 100+ pages is too big for Sanskrit OCR software.

funderburkjim commented 8 years ago

This again is a case where it would be better to use existing search engine technology than reinvent the wheel. I think this facility is called highlighting, and Google searches for 'Elasticsearch highlight' will bring up references.

I have set up an ECS instance of the Bitnami Elasticsearch, with the intention of sometime trying to fit one or two of our dictionaries into that search-engine framework. It is also easy to install an Elasticsearch instance on a local computer. I've done this and played around at a very preliminary level, using the book Elasticsearch in Action for learning.

Maybe it's time to get more serious with this approach, and see if my intuition pans out regarding utility of applying search-engine technology to our collection of dictionaries.

gasyoun commented 5 years ago

@artforlife ever dealt with highlighting?

drdhaval2785 commented 3 years ago

Text highlight and PDF highlight still elude us, I guess. Right, @funderburkjim?

funderburkjim commented 3 years ago

I must have missed this posting.

I like the http://sanskrit-lexicon.github.io/cologne/highlighter/index.html

Is it feasible to implement this in the Advanced Search display?

drdhaval2785 commented 3 years ago

Yes, it seems feasible. It is just a .js file.

funderburkjim commented 3 years ago

OK -- I'll put it on nearterm list to investigate.

drdhaval2785 commented 3 years ago

This is the relevant part of code.

    <script type="text/javascript">

      var myHilitor2;
      document.addEventListener("DOMContentLoaded", function() {
        myHilitor2 = new Hilitor2("data");
        myHilitor2.setMatchType("left");
        myHilitor2.apply("Siva");
      }, false);

    </script>
drdhaval2785 commented 3 years ago

Instead of "Siva" which is hardcoded as of now, we would have to fill the searched word.
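
One way to fill in the searched word on the browser side could look like the sketch below. It assumes the search term is available in the page URL's query string under a parameter called `word`; that parameter name is hypothetical, and the real Advanced Search may pass the term differently (for example, written into the page by PHP). The `Hilitor2` calls are the same ones used in the snippet above.

```javascript
// Hypothetical helper: read the searched word from a URL query string.
// The parameter name "word" is an assumption, not the name Cologne uses.
function searchedWord(queryString, paramName = "word") {
  const value = new URLSearchParams(queryString).get(paramName);
  return value === null ? "" : value.trim();
}

// Browser-only part: runs only where a DOM and hilitor-utf8.js are loaded.
if (typeof document !== "undefined" && typeof Hilitor2 !== "undefined") {
  document.addEventListener("DOMContentLoaded", function () {
    const word = searchedWord(window.location.search);
    if (word !== "") {
      const hilitor = new Hilitor2("data");
      hilitor.setMatchType("left");
      hilitor.apply(word); // replaces the hardcoded "Siva"
    }
  }, false);
}
```

Keeping the lookup in a small pure function makes it easy to swap in another source for the word later without touching the highlighting call.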

gasyoun commented 3 years ago

nearterm list to investigate.

Thanks, it's one of the top 5 UI issues, agree with Dhaval.

drdhaval2785 commented 3 years ago

355 highlights all occurrences of the searched word.

drdhaval2785 commented 2 years ago

Can this issue be taken up, @funderburkjim? It has been pending for 7 years, and would be a good enhancement. It also has a working example at http://sanskrit-lexicon.github.io/cologne/highlighter/index.html .

gasyoun commented 11 months ago

@artanat please share the code we use at https://samskrtam.ru/parallel-corpus/ for @funderburkjim

artanat commented 11 months ago

@artanat please share the code we use at https://samskrtam.ru/parallel-corpus/ for @funderburkjim

let currentIndex = -1;
const nextButton = document.getElementById("nextButton");
const prevButton = document.getElementById("prevButton");
const nearestButton = document.getElementById("nearestButton");

nextButton.addEventListener("click", function() {
  navigateToNextHighlight();
});

nearestButton.addEventListener("click", function() {
  activateNearestVisibleHighlight();
});

prevButton.addEventListener("click", function() {
  navigateToPreviousHighlight();
});

document.addEventListener("keydown", function(event) {
  if (event.keyCode === 65) { // Key code for "a"
    navigateToPreviousHighlight();
    event.preventDefault(); // Prevent the browser's default action
  } else if (event.keyCode === 83) { // Key code for "s"
    activateNearestVisibleHighlight();
    event.preventDefault(); // Prevent the browser's default action
  } else if (event.keyCode === 68) { // Key code for "d"
    navigateToNextHighlight();
    event.preventDefault(); // Prevent the browser's default action
  }
});

function navigateToPreviousHighlight() {
  const highlightElements = document.querySelectorAll(".highlight");
  if (highlightElements.length === 0) return; // Nothing to navigate

  if (currentIndex !== -1) {
    highlightElements[currentIndex].classList.remove("active");
  }

  // Step back with wrap-around; from the initial state (-1) go to the last highlight
  currentIndex = currentIndex <= 0 ? highlightElements.length - 1 : currentIndex - 1;
  highlightElements[currentIndex].classList.add("active");
  highlightElements[currentIndex].scrollIntoView({ behavior: "smooth" });
}

function navigateToNextHighlight() {
  const highlightElements = document.querySelectorAll(".highlight");
  if (highlightElements.length === 0) return; // Nothing to navigate

  if (currentIndex !== -1) {
    highlightElements[currentIndex].classList.remove("active");
  }

  // Step forward with wrap-around
  currentIndex = (currentIndex + 1) % highlightElements.length;
  highlightElements[currentIndex].classList.add("active");
  highlightElements[currentIndex].scrollIntoView({ behavior: "smooth" });
}

function activateNearestVisibleHighlight() {
  const highlightElements = document.querySelectorAll(".highlight");

  highlightElements.forEach(element => element.classList.remove("active"));

  let minDistance = Number.MAX_SAFE_INTEGER;
  let nearestIndex = -1;

  highlightElements.forEach((element, index) => {
    const rect = element.getBoundingClientRect();
    const distance = Math.abs(rect.top);

    if (distance < minDistance) {
      minDistance = distance;
      nearestIndex = index;
    }
  });

  if (nearestIndex !== -1) {
    currentIndex = nearestIndex;
    highlightElements[currentIndex].classList.add("active");
    highlightElements[currentIndex].scrollIntoView({ behavior: "smooth" });
  }
}

funderburkjim commented 11 months ago

@artanat Hi, Anatoly. Thanks for the JS.

Could you provide a stand-alone html demo?

funderburkjim commented 11 months ago

@artanat I see that in Dhaval's comment above there is an example. Maybe that's the demo I was looking for. Let me look at that first.

funderburkjim commented 11 months ago

@drdhaval2785 Your highlighter demo url http://sanskrit-lexicon.github.io/cologne/highlighter/index.html works.

But where is the code? I don't find it at https://github.com/sanskrit-lexicon/COLOGNE/.

funderburkjim commented 10 months ago

mdhl1: first use of highlighter

https://sanskrit-lexicon.uni-koeln.de/work/hilitdev/mdhl1/web/webtc2/index.php

This is an adaptation of the current normal display of md, using hilitor-utf8.js from Dhaval's demo.

It partially works, e.g. for a text word search with 'exact' or 'prefix', and only if the search term contains just the normal alphabet (no diacritics, no Devanagari). Suffix, infix and substring searches don't work (no highlighting).

'Sanskrit word' search works (with exact/prefix) if both input and output are slp1 or HK.

The way it works is (roughly):

probable next step

modify the js to get rid of some or all of the limitations mentioned above.

Request others to experiment and to provide feedback on what are the most important types of searches that would benefit from highlighting.

I suspect that some of the limitations are unavoidable in this method. (e.g. searching for an slp1 input, but specifying a "Devanagari" output).
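
To make the match types concrete, here is a rough sketch of how a Unicode-aware pattern per match type might be built. It is not the code in hilitor-utf8.js; the function names are hypothetical, and 'infix' is left out because its exact meaning is not spelled out here. The sketch assumes the search term is already in the same script as the displayed text.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build a pattern for one match type.
// \b only knows ASCII word characters, so word boundaries are expressed with
// letter (\p{L}) and combining-mark (\p{M}) classes instead.
function matchTypePattern(term, matchType) {
  const W = "[\\p{L}\\p{M}]";
  const t = escapeRegExp(term.normalize("NFC"));
  const notBefore = `(?<!${W})`;
  const notAfter = `(?!${W})`;
  const sources = {
    exact: `${notBefore}${t}${notAfter}`,
    prefix: `${notBefore}${t}${W}*`,
    suffix: `${W}*${t}${notAfter}`,
    substring: `${W}*${t}${W}*`,
  };
  if (!Object.prototype.hasOwnProperty.call(sources, matchType)) {
    throw new Error(`unknown match type: ${matchType}`);
  }
  // "g" finds every occurrence; "u" enables the \p{...} classes.
  return new RegExp(sources[matchType], "gu");
}
```

Each pattern matches the whole word that satisfies the match type, so `"bhīma = śiva, sadāśiva".match(matchTypePattern("śiva", "suffix"))` returns both `śiva` and `sadāśiva`. The look-behind assertion needs a reasonably recent browser.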

artanat commented 10 months ago

@artanat I see that in Dhaval's comment above there is an example. Maybe that's the demo I was looking for. Let me look at that first.

https://samskrtam.ru/parallel-corpus/s/arjuna-617.html

gasyoun commented 10 months ago

Looks promising, Jim! Anatoliy's code can additionally go from one highlighted place to another.

funderburkjim commented 10 months ago

demo based on @artanat

This demo is also local to the md dictionary.

https://sanskrit-lexicon.uni-koeln.de/work/hilitdev/md_artanat/web/webtc2/index.php

More robust than mdhl1 above.

@artanat What does the '=' button do? Is it functioning properly here? Is it needed in this application?


Request feedback. There are some searches where highlighting fails, but I think these are acceptable imperfections.

What do others think?
Should we accept this version for all cdsl dictionaries?

Note: this highlighting is only in Advanced Search.


Sample:

[image]

artanat commented 10 months ago

@artanat What does the '=' button do? Is it functioning properly here? Is it needed in this application?

'=' selects the closest position when scrolling a document.

funderburkjim commented 10 months ago

@gasyoun @artanat

Request you to provide feedback on the version https://sanskrit-lexicon.uni-koeln.de/work/hilitdev/md_artanat/web/webtc2/index.php mentioned above.

Is it ready for general deployment?

gasyoun commented 10 months ago

Is it ready for general deployment?

Yes, it works. As expected.

funderburkjim commented 10 months ago

@gasyoun Thanks for feedback. @Shalu411 is also reviewing an skd-version.

funderburkjim commented 10 months ago

deployed

The highlighting code (based on @artanat's code) is now pushed to GitHub. The commit link above shows the changes made to the csl-websanlexicon code.

The change is deployed in all the advanced search displays at Cologne, by running the redo_cologne_all.sh script in csl-websanlexicon/v02.

drdhaval2785 commented 10 months ago

Hurray. One of the oldest requests coming to a closure.