Pdf occur extended mode - Context resize

vedang / pdf-tools

Emacs support library for PDF files.

https://pdftools.wiki

GNU General Public License v3.0

615 stars 89 forks source link

Pdf occur extended mode - Context resize #15

Open Twix53791 opened 3 years ago

Twix53791 commented 3 years ago

Hello,

Pdf occur is a wonderful tool, but it will be even more amazing if it was possible to resize the surrounding context displayed by the occur search. I am a scholar, and I see two good reasons justifying this feature : 1) Sometimes, it is useful to get a larger context to assess if an occurrence is relevant or not, especially if the word searched is located at the far end or at the start of the line picked by the occur search. 2) A such feature works like a trick allowing a useful proximity search : as it is possible to search into the search results buffer, it is possible to refine the first search with another word, getting as a final result the locations where both words are close together. Right now, we can already perform this use in pdf occur, but with only one line of context, it is poorly useful. With 2-3 lines more, it becomes really valuable.

Pdf occur works very differently thand the emacs occur search, but to give an exemple of what tool I am thinking about, there is this very efficient tool for occur : https://github.com/dgtized/occur-context-resize.el.

Anyway, just an idea! Thanks

vedang commented 3 years ago

Copying my comment from politza/pdf-tools#671, just for completeness sake:

Hello @Twix53791 ,

Unfortunately, this is non-trivial in pdf-occur due to the fact that the underlying text is returned by poppler and requires changes in the epdfinfo server (C) program. I will track this as a feature request and pick it up when time permits.

Thanks, Vedang

tongjie-chen commented 3 years ago

Maybe this is not that impossible. This feature needs to find the region of text and then use that location to parse text before and after. Or make an image preview if it includes equation in any format.

tongjie-chen commented 3 years ago

Modify the function and now you can get the enlarged context.

(setq pdf-occur-need-enlarge t)
(defun pdf-occur-add-matches (filename matches)
  (pdf-occur-assert-occur-buffer-p)
  (when matches
    (let (entries)
      (dolist (match matches)
        (let-alist match
      ;; For more context, enlarge the .edges and get text from the enlarged .edges
          (push (pdf-occur-create-entry filename .page
                    (cons
                     (pdf-info-gettext .page (list 0 (max 0 (- 0.1 (elt (car .edges) 1))) 1 (min 1 (+ 0.1 (elt (car .edges) 3)))) 'word filename)
                     .edges))
                entries)))
      (setq entries (nreverse entries))
      (pdf-occur-insert-entries entries))))

Would be better with a customized switch like pdf-occur-need-enlarge, and a variable of how much to enlarge to replace the 0.1 in this function.

(defun pdf-occur-add-matches (filename matches)
  (pdf-occur-assert-occur-buffer-p)
  (setq pdf-occur-need-enlarge nil)
  (setq pdf-occur-enlarge-extent 0.1)
  (when matches
    (let (entries)
      (dolist (match matches)
        (let-alist match
      ;; For more context, enlarge the .edges and get text from the enlarged .edges
          (push (pdf-occur-create-entry filename .page
                    (cons
                     (if pdf-occur-need-enlarge 
                         (pdf-info-gettext .page (list 0 (max 0 (- pdf-occur-enlarge-extent (elt (car .edges) 1))) 1 (min 1 (+ pdf-occur-enlarge-extent (elt (car .edges) 3)))) 'word filename)
                       .text)
                     .edges))
                entries)))
      (setq entries (nreverse entries))
      (pdf-occur-insert-entries entries))))

The issue now is that it does not highlight the selected text anymore.

tongjie-chen commented 3 years ago

For the highlight, the original .text includes '(start end (face-match)). Need to add that face match back to the newly extracted text.

Twix53791 commented 2 years ago

Hi, Thank you very much for this insight @tongjie-chen ! I would never have found on my own. I took your code and developed it a bit to fit my needs. I was a challenge to someone like me who is not a programmer. I took me a LOT of trials and errors to arrive at understand a bit the lisp language and do something functional. It is, but maybe not really opti... Anyway, to help the newbies like me who would like implement this useful features, I share here what I did

Twix53791 commented 2 years ago

FIRST Some explanations

So, lets just make clear the way this feature works for those like me who got nothing at the start...

The pdf-info-gettext function take as arguments 1) a page (number) 2) four parameters defining an area (list) 3) a type a searching ('word, 'line... cf. the function itself to see) 4) a file. It search, in the file specified, at the page, the words (or lines or...) underneath an area define by the second argument. This second argument is crucial for the purpose of our feature, as it can be enlarge and thus we can select a larger part of text. This area is define by four numbers, which define a rectangle, each one specifying a direction : LEFT, TOP, RIGHT, BOTTOM. Each parameter have to be set between 0 and 1 and there are relative to the page. It means than, for example, the setting 0,0,1,1 select the entire page : the corner top;left at 0:0 and the bottom.right corner at 1:1. But, it has to be understand not like a rectangle of selection as we can do on an image, but rather as a selection we do on text. For example, let's imagine the coordinate of the following word is [0.43;0.35;0.46;0.367] and we select from it the current area, in bold, with the coordinates [0.43;0.35;0.91;0.45], this one start at the point 0.43;0.35, to select de text until the point 0.91;0.45 here**. As we see, the term "here" is actually about 91% right on the page, as "word" was about at 43% (0.43). Again, a word at the middle high of the page will have Y coordinates around 0.5.

The native pdf-occur search grab already the coordinates of the matches, in the (cons .text .edges) we modify, there is these coordinates. So it become possible to extend the selection range relatively and centered on each match of the search, as @tongjie-chen grasped it.

The simple setting

We use the code of @tongjie-chen which extend the selection from the coordinates of the match by [range] :

          (push (pdf-occur-create-entry filename .page
                    (cons
                     (pdf-info-gettext .page (list 0 (max 0 (- pdf-occur-range (elt (car .edges) 1))) 1 (min 1 (+ pdf-occur-range (elt (car .edges) 3)))) 'word filename)
                     .edges))
                entries)))

And so we only need to set the range.

I use a keyboard shortcut which allow me to easily set the value of this range. There is many way to do that ; here my way, a bit laborious, inspired by the occur-resize addon for emacs https://github.com/dgtized/occur-context-resize.el. :

(defvar pdf-occur-range 0) ;; the default extended range is 0
(defvar pdf-occur-range-values '(0 0.02 0.05 0.1 0.15 0.2 0.25 0.3 0.4 0.5 0.75 1)) ;; a set of predefined values
(defvar n 0)
(defun pdf-occur-range-larger ()
    (interactive)
    (setq n (min (- (length pdf-occur-range-values) 1) (1+ n)))
    (setq pdf-occur-range (nth n pdf-occur-range-values))
   (message "Pdf-occur-search-range : %s" pdf-occur-range))
(defun pdf-occur-range-smaller ()
    (interactive)
    (setq n (max 0 (1- n)))
    (setq pdf-occur-range (nth n pdf-occur-range-values))
    (message "Pdf-occur-search-range : %s" pdf-occur-range))
(defun pdf-occur-range-default ()
    (interactive)
    (setq pdf-occur-range 0)
    (message "Pdf-occur-search-range set to %s" pdf-occur-range))
)
(define-key pdf-view-mode-map (kbd "C--") 'pdf-occur-range-smaller)
(define-key pdf-view-mode-map (kbd "C-+") 'pdf-occur-range-larger)
(define-key pdf-view-mode-map (kbd "C-0") 'pdf-occur-range-default)

So now I can easily set the range of my search with Ctrl+ and Ctrl-. I put the code in my init.el file, in the section (use-package pdf-tools... where I have got my personal config for emacs.

To end this setting, it is useful to be able to still use the native occur search of pdf-tools. We can define thant when pdf-occur-range is set to 0, it use the native search function, which is far faster than the pdf-info-gettext function. The final core :

          (cond  
            ((= pdf-occur-range 0)
              (push (pdf-occur-create-entry filename .page (cons .text .edges)) entries))
            (t  
                        (push (pdf-occur-create-entry filename .page
                    (cons
                       (pdf-info-gettext .page (list 0 (max 0 (- pdf-occur-range (elt (car .edges) 1))) 1 (min 1 (+ pdf-occur-range (elt (car .edges) 3)))) 'word filename)
                     .edges))      entries)))`

We have here something functional, but a bit to raw for me ! Indeed, first it cancel the highlight, but also, it is limited to extend on one page, which leads to weird configurations : a match at the last line of the page can be extend to the top, but not the bottom, which is "unbalanced" and leads to biases in the search. Also, the areas of selected text from two close matches cover each other, which can leads to a LOT of garbage text (when the range of the extension is large), many duplicates, and a very, very slow search, like more than one minute for 1000+ occurrences in a text. SO... I left in quest of the perfect function resolving all these issues... and it was a long way !

Twix53791 commented 2 years ago

Here the code :

(defun pdf-occur-add-matches (filename matches)
  (pdf-occur-assert-occur-buffer-p)
  (when matches
  (let ((entries)
        (nthmatch 0)
        (nmatch)
        (pmatch)
        (range pdf-occur-range))
  (dolist (match matches)
  (let-alist match

    (cond  
      ((= pdf-occur-range 0)
        (push (pdf-occur-create-entry filename .page 
          (cons .text .edges)) 
            entries))
    (t  

;; ##########  Pdf-occur-extend search process ! 
;;it is reload at every loop (by dolist) so the values on the variables set here are reset every time, for every match of matches

      (setq-local truncate-lines nil)    
      (cl-incf nthmatch)

;; Setting the next match : page.top-position
      (cond ((= .page maxpages)         ;; maxpages is defined at pdf-occur-start-search
            (setf nmatch (cons .page 
              (- 1 botmargin))))
            ((< nthmatch (length matches))
              (let-alist (nth nthmatch matches)
                (setf nmatch (cons .page 
                  (elt (car .edges) 1)))))
      (t (setf nmatch (cons (+ .page 2) 1))))

;; t : to be sure there is always a value for nmatch, even when length 
;;matches = 1 or we arrive at the last occurrence ; in this two cases, 
;;nmatch values is set to assure than the algorithm will go to the "nmatch far" path

        (unless pmatch  ;; at the first match set pmatch = (cons page . edges bottom of SELECTION pmatch)
          (if (> .page 1)(setf pmatch (cons (- .page 1) 0))
          (setf pmatch (cons .page 0))))         

        (let* ((matchtop (elt (car .edges) 1))
              (matchbott (elt (car .edges) 3))
              (pdistance (max 0.01 (- matchtop (cdr pmatch) (- (car pmatch) .page) )))
              (ndistance (max 0 (- (cdr nmatch) matchbott (- .page (car nmatch)) )))
              (threshold (* 2 (- matchbott matchtop)))
              (topextend (- matchtop (min range pdistance) (max -1 (- (car pmatch) .page)) ))
              (bottomextend)
              (temptext)
              (text)) 

;; pdistance = matchtop - pmatch + (page.matchtop - page.pmatch) on  [0;+infinite[  
;; ndistance = nmatch - matchbott + (nmatch.page - match.page) on [0;+infinite[            
;; manipulate threshold in case of a bug, when pdf-occur-range is large, which leads to 
;; the selection start at n+1 line of the match, make it disappear from the text displayed

          (setf (cdr pmatch) (+ matchbott (min range ndistance) (max (- .page (car nmatch)) -1) ))

;; we set cdr pmatch here, but this value will be used at the next loop, the next matches, by let*
;; the car value will be set below, as it depends of some conditions
;; pmatch set the distance of the bottom selection, which will be used at the next loop to set pdistance. 
;; BUT, the real bottom selection will be BOTTOMEXTEND which equal pmatch minus a threshold ; the threshold by 
;; default is the high of a "common" line. This is useful to avoid the overlap of the current bottom selection 
;; range and the top selection of the next match. 
;; When nmatch.page = .page, pmatch = P = matchbott + min range/ndistance
;; When nmatch.page > .page, pmatch = P - 1. We got this 1 by (max -1 (.page - nmatch.page))

          (if (and (< ndistance (- matchbott matchtop)) )   ;; nmatch is on line n+1
            (setf bottomextend matchbott)
          (setf bottomextend (max 0 (- (cdr pmatch) threshold))))

;; why a threshold of matchbott-matchtop ? because a nmatch situated on the line n+2 will always be 
;;at a distance of > matchbott-matchtop, 
;; which is assumed be the high of a common line of text
;; normally, we put a threshold 2x high of a line to be sure the selection ranges don't overlap 
;; the bottom area stop one line before the beginning of the next selection (which is set by pmatch). 
;; This rule is cancel if the next match is at line n+2, as we want select the line n+1 in this case... 
;;but at the cost to select probably the line n+2 with...                   

          (setf temptext (pdf-info-gettext .page 
                                            (list 0 
                                            (max topmargin (- topextend (- .page (car pmatch))) ) 
                                            1 
                                            (min (- 1 botmargin) (+ bottomextend (- (car nmatch) .page)) )) 
                                            'line filename))

;; temptext is the gettext selected text on the page of the match, will be or not concatenate with the extend on previous/next page                 
;; temptext can't use directly topextend and bottomextend, because there are designed to be correct 
;; both on the page and the previous/next one by adding 1 or -1 and get appropriates coordinates for the extended selections 
;; on previous/next page. We correct this ajustment by adding -1 or 1 to inverse it.

;; Now we determine if we need some extend on previous/next page and we get them

          (let ((top (- matchbott (min range pdistance)))
                  (bott (+ matchbott (min range ndistance))))
            (cond ((and (<= 0 top) (>= 1 bott))
                      (setf (car pmatch) .page)
                      (setf text temptext)
                      )
                      ((and (> 0 top)(< 1 bott))
                        (setf (car pmatch) (+ .page 1))
                        (setf text (cl-concatenate 'string 
                              (pdf-info-gettext (- .page 1) 
                                        (list 0 (max topmargin topextend) 
                                        1 (- 1 botmargin)) 
                                        'line filename) 
                              temptext 
                              (pdf-info-gettext (+ .page 1) 
                                        (list 0 topmargin 
                                        1 (min (- 1 botmargin) bottomextend)) 
                                        'line filename)))
                      )
                      ((< 1 bott)
                        (setf (car pmatch) (+ .page 1))
                        (setf text (cl-concatenate 'string 
                                      temptext 
                                      (pdf-info-gettext (+ .page 1) 
                                                (list 0 topmargin 
                                                1 (min (- 1 botmargin) bottomextend)) 
                                      'line filename)))
                      )
                      (t    
                        (setf (car pmatch) .page)
                        (setf text (cl-concatenate 'string 
                                        (pdf-info-gettext (- .page 1) 
                                                  (list 0 (max topmargin topextend) 
                                                  1 (- 1 botmargin)) 
                                        'line filename) 
                                        temptext)))))

;; ## PUSHING
          (setf text (replace-regexp-in-string "\n" " " text))

;;#Optionnal sidematches highlight

          (when sidematches
          (let (occurrences)
            (dolist (word sidematches)
              (let ((begin (list 0))) 
                (cl-loop
                  (let ((match (cl-search word text 
                                                  :test 'char-equal 
                                                  :start2 (car begin))))
                        (if (eq match nil)(cl-return) 
                          (push (cons match 
                                          (+ match (length word)) )
                                                  occurrences)
                          (push (cdr (car occurrences)) 
                                                  begin))))))
          (when occurrences
            (dolist (el occurrences)
              (add-text-properties (car el) (cdr el) '(comment t face highlight) text)))))

;;# Set matches highlightning
;; cl-search search the text for the "matchword" which is the word searched and that is import from pdf-occur-start-search. Cl-search retrieve only the first entry, I don't know if it can be different. So we loop the search to find all the entries, the loop stop when there is none and the search returns nil. At each time, the search has to start not from the beginning, but from the nth +1 of the previous match to hope find another one. The results are pushed in a list, occurrences, constitute by all the positions (nth of the string text) of the matches. After that, a dolist hightlight all the portions of text included from each position from occurrences to the length of the matchword.

          (let ((occurrences)(begin (list 0)))
            (cl-loop
              (let ((match (cl-search matchword text 
                                              :test 'char-equal 
                                              :start2 (car begin))))
                (if (eq match nil)(cl-return)
                  (push (cons match
                                (+ match (length matchword)) )
                                      occurrences)
                  (push (cdr (car occurrences)) 
                                      begin))))                             
          (setf occurrences (nreverse occurrences))
          (dolist (el occurrences)
            (add-text-properties (car el) (cdr el) '(comment t face match) text)))

          (push (pdf-occur-create-entry filename .page (cons text .edges)) entries) )))))           
          (setq entries (nreverse entries))
          (pdf-occur-insert-entries entries)))                                                  
)

Twix53791 commented 2 years ago

This code needs to small lines set (defun pdf-occur-start-search ... just above the line which calls pdf-occur-add-matches :

           ...
           (setq maxpages (cdr pages))
           (setq matchword string)
           (pdf-occur-add-matches document response)
           ...

The matchword variable allows to export the string of the word searched in pdf-occur-add-matches, to allow this string in the cl-search which gonna search the selected text for the matches and highlight them. The maxpages is an easy way (maybe there is more easy?) to get the max pages of the document, to forbid the extension on a nonexistent page at the end of the document (as the bottom extend search for the next page, there is no page next the last one...

Twix53791 commented 2 years ago

Some explanations...

...for the poor newbies who take, like me, three days to understand how the let function works !

The principle of this code is to exclude of the selection the part of the text already display in the previous selection. It gonna display only one portion of text for each match, which exclude other matches on it, excepting those situated on the same line, which leads to duplicates already in the native pdf-occur-search. It could be remove probably, but it is not so necessary and it has also its advantages.

Everything relies on the setting of the two variables setting the distance between the current match and the next one : The principle is recursive of course : the first match don't have a previous one, so it just gonna check the distance to the next one, to be sure it will not select it in the area of selection it defines, if its too close. The extension of the selection at the bottom, basically, will be the minimum between two values : ndistance, range (pdf-occur-range). The second match is going to check the distance of the next one, as always, but also now the previous one : not the previous match position itself, but the point at which its area of selection has stop, to avoid include in its own some text already displayed. This position, very logically, don't have to be calculate by the current match, but better to be set by the previous one when it has defined its selection. That's what signify the variable pmatch. Its a cons with two values : page.position of the bottom edge of the selection.

When the current match has these two information, the distances to the next and previous ones, it set its own area of selection, in a way depending of some simple parameters : if the range (pdf-occur-range) goes over on the next or previous page, the search gonna select an area of text on the next or previous page, of course if there is no other matches close which forbid this move.

By the way, I add here two variables, topmargin and botmargin which excludes the far top and far bottom of the page, the margins, to avoid selecting for example the page numbers or some recurrent headers or footers common in the documents. This have maybe to be calibrated again on some different documents having distinct margins. I did a small function to help at it :

(defun pdf-occur-set-margins ()
    (interactive)
    (let ((page (read))(step (read))(filename buffer-file-name)(topedge 0)(bottomedge 0))
        (message "PAGE %s" page)
        (message "Syntax (scanned portion of the page . text fins in this portion)")
        (cl-loop
            (setq bottomedge (min 1 (+ step topedge)))
            (setq textfind (pdf-info-gettext page (list 0 topedge 1 bottomedge) 'word filename))
            (message "EDGES REGION  [%s ; %s] : %s" topedge bottomedge textfind)
            (setq topedge (+ step topedge))
        (when (= bottomedge 1) (cl-return))
        )
    )
    (setq topmargin (min 0.5 (max 0 (read))))
    (setq botmargin (min 0.5 (max 0 (read))))
    (message "topmargin set to %s" topmargin)
    (message "botmargin set to %s" botmargin)
)

What about the performances ?

Its slower, of course, than the pdf-tools native search, which is amazingly fast ! Bravo to the developers by the way ! This speed reduction increase with the number of occurrences of the search, basically with the amount of text to retrieve from the document. When the numbers of occurrences is slow (< 100), the difference of speed between the pdf-occur native search and the extended one is unnoticeable. It becomes consequent with some hundreds of occurrences : for example, a search for 1056 occurrences on a document of 576 pages its 3 times slower with the extended search than the native occur. Its raise at x5, x6 times more for a search of 5500 occurrences on the same document. It becomes really slow and unusable after 10 000 occurrences for my own, taking for example one minute and half for 34 000 occurrences on the same documents. There is not a big difference between the speed at different extend ranges. My max is one page range, meaning than in fact two pages of text can be loaded. We can imagine how much text it is when you have a document of 550 pages and 5500 occurrences on it ! Its quite the all document.

Is this slow a problem ? I don't think so. In my perspective, this feature is not intended to search all the occurrence of a term in a document, the native occur search do it very well. It is designed to be a tools on the side of the native occur search, to precise the search after a first overview by the native search. It can be maybe useful, for the future, to add a shortcut to define a filter defining the range of pages searched, to focus the search and accelerate it.

Some bugs ?

Actually, the main problem comes from the way of searching the pdf, with this system of area defined by edges. It is a bit imprecise, and at large range of extension, when the areas selected by the extended search are contiguous, there is a bug at the beginning of the paragraphs, in fact when there is a gap in the text, some blank areas between a title and a text for example, or two paragraphs : the top selection tends to overlap on the next one. It can push it, sadly, and as I define here a prevalence of the bottom selection on the top, meaning than the selection of the previous match comes until the level of the line of the match, which is on the first line of its selection, it can shift it by one line, making the match disappear from the display. I circumvent this by adding a threshold on the bottom extend. But it is calibrate for some texts, maybe not for all the pdfs. So it is not completely sure...

Twix53791 commented 2 years ago

A small feature added : side matches & excluded matches

Pdf-occur allows us to refine a search by searching the buffer of a first one. We can also add filters, but not so handy to use. As I find a way to highlight words in the text of the extended search, I think "why not highlighting more words?!". So I add the possibility to make a search highlighting until ten words in the text searched, in green. The words are store in a variable, and are highlighted for the future search. To reset and void the variable, just do it a new pdf-occur-side-matches without typing anything in the input.

Also, Andreas Politz made a package, tablist, to put filters on the results of pdf-tools-occur search, but the problem is these filters just remove the results filtered of the display, not of the buffer as they are dynamics and applied after the search. So, if we refine the search searching the occur results, we still search into the results hidden by the filter. We can't, with this method, exclude some matches containing some specifics words. So I add the variable excludematch to be able to do that.

I bind the function on a key, this function ask for the words searched, separate by _ here, then split this input in a list containing each words, and then call the pdf-occur search normally, asking for the main word searched. Actually, the sides words are not "searched" on the document, by only in the text displayed by the occur-search in extended mode.

(defvar sidematches nil)

(defvar excludematch nil)

(defun pdf-occur-side-matches () 
    (interactive)
    (setf sidematches (read-string (format "Enter until 10 sidematches :")))
    (if (equal sidematches "")(setf sidematches nil)
        (progn
            (setf sidematches (split-string sidematches "_"))
            (when (> (length sidematches) 10)
                (setf sidematches (cl-subseq sidematches 0 10)))))

    (setf excludematch (read-string (format "Enter until 5 words to exclude :")))
    (if (equal excludematch "")(setf excludematch nil)
        (progn
            (setf excludematch (split-string excludematch "_"))
            (when (> (length excludematch) 5)
                (setf excludematch (cl-subseq excludematch 0 5)))))

    (call-interactively 'pdf-occur)
)

After, put the code bellow in the pdf-occur-add-matches, just before the section ;;#Set matches highlightning, at the Pushing moment (cf. the previous messages)

...
  (dolist (match matches)
    (catch 'excluded
      (let-alist match
...

;;#Optionnal excluded terms

  (when excludematch
    (dolist (word excludematch)
    (when (cl-search word text :test 'char-equal)
      (cl-decf pdf-occur-number-of-matches)
      (throw 'excluded nil))))                       ;; take back to catch removing this entry of the buffer 
                                                                ;; and pdf-occur-number-of-matches

;;#Optionnal sidematches highlight

  (when sidematches
    (dolist (word sidematches)
      (let ((begin (list 0))) 
        (cl-loop
          (let ((match (cl-search word text 
                  :test 'char-equal 
                  :start2 (car begin))))
        (if (eq match nil)(cl-return) 
            (add-text-properties match (+ match (length word)) '(comment t face highlight) text)
            (push (+ match (length word)) 
                    begin)))))))

tongjie-chen commented 2 years ago

Hi @Twix53791. That's really hard work on all of this. I did not set a good example on the code style when I replied.

Long lines obscures readability. It's best to cut with line breaks or with simplified variables.
setq is not good because it makes the variable global. The best practice is to use let to limit the scope to only this function.

By the way, if you want to get help quickly with demo code, elisp-demos https://github.com/xuchunyang/elisp-demos has some really good examples.

tongjie-chen commented 2 years ago

If condition just has two cases, you can use (if pred true-clause false-clause), use (progn ...) to combine multiple expressions in true or false clauses.

You can get help of functions with C-h f.

Twix53791 commented 2 years ago

Hi @Twix53791. That's really hard work on all of this. I did not set a good example on the code style when I replied.
* Long lines obscures readability. It's best to cut with line breaks or with simplified variables.

* `setq` is not good because it makes the variable global. The best practice is to use `let` to limit the scope to only this function.
By the way, if you want to get help quickly with demo code, elisp-demos https://github.com/xuchunyang/elisp-demos has some really good examples.

Thanks ! Thank you for these tips, that's kind of things are not teach usually in the wikis, too obvious for those who know! Nice tool this elisp-demos !

Twix53791 commented 2 years ago

So, if I can take a bit more of your time @tongjie-chen, as no wiki/book is more efficient than the living explanation give by a human in a conversation, about the setq and let, you mean than is preferable to define the variables in let and to add eventually some let inside let instead of setq ? Like that (I review the code of the "side matches") :

(when sidematches            ;;sidematches is a list of strings, each one is a word to search in text
  (let (occurrences)
    (dolist (word sidematches)
        (let ((begin (list 0)))      
;;will be the beginning of cl-search, incremented on each occurrence of word, 
;;but reset for each new word as the search have to initiate again from the beginning of text for each new word
        (cl-loop
            (let ((match (cl-search word text 
                        :test 'char-equal 
                        :start2 (car begin))))
            (if (eq match nil)(cl-return)
                    (push (cons match 
                        (+ match (length word)) )
                        occurrences)
                    (push (cdr (car occurrences)) begin))))))
    (when occurrences
      (dolist (el occurrences)
        (add-text-properties (car el) (cdr el) '(comment t face highlight) text)))))

So here, to not use (setq begin (cdr (car occurrences))) on begin, I put rather (push (cdr (car occurrences)) begin), and make begin a list. Is there really no way to just set a variable define by let inside the let? I means, if its possible for list variables using push, it should have a way to set a new value for the other kind of data is'nt it?

Twix53791 commented 2 years ago

Ah, and maybe another quite "general" question : is that better to multiply variables to store one time calculations/grasp of values or to include several times the same calculus/seeking for a value into the code? For example, I used in the main code it did the variables matchtop and matchbott to store (elt (car .edges) 1) and (elt (car .edges) 3) : my idea was to make the code more readable and also, I was thinking than the action of "asking" the values, calling .edges, will be execute one time and not each time of use of it. Am I correct and is it a good practice? Thank you. PS: Or I create the variable pdistance for "(max 0.01 (- matchtop (cdr pmatch) (- (car pmatch) .page)" to not repeating this calculus several times.

tongjie-chen commented 2 years ago

Ah, and maybe another quite "general" question : is that better to multiply variables to store one time calculations/grasp of values or to include several times the same calculus/seeking for a value into the code? For example, I used in the main code it did the variables matchtop and matchbott to store (elt (car .edges) 1) and (elt (car .edges) 3) : my idea was to make the code more readable and also, I was thinking than the action of "asking" the values, calling .edges, will be execute one time and not each time of use of it. Am I correct and is it a good practice? Thank you. PS: Or I create the variable pdistance for "(max 0.01 (- matchtop (cdr pmatch) (- (car pmatch) .page)" to not repeating this calculus several times.

I am not expert in Elisp. ~~But I think all the numbers are created when they are calculated and occupy memory space~~ (Or maybe I was wrong, the calculation is done in CPU cache). It does not matter much if you put a name to it (if you don't make many of them). The garbage collector will take care of garbage numbers in the middle steps. For readability, naming those complex calculation is a good practice.

tongjie-chen commented 2 years ago

I did not see why you use (let ((occurrences)) to declare a nil variable and not put it inside. If the variables you want to declare have dependence, use let*, such as

(let* ((x 1)
      (y (* x 10)))
  (message "%s, %s" x y))

After you write down code, for better indention, you can select the code and then press C-M-\ or M-x indent-region.

For if clauses, you can enclose all true or false clauses inside a (progn) if that is multi line, such as, from the info [[info:elisp#Conditionals][info:elisp#Conditionals]] (put this link to org-mode to open).

 (if CONDITION (progn A B C) nil)

Twix53791 commented 2 years ago

After you write down code, for better indention, you can select the code and then press C-M-\ or M-x indent-region.

Yes, when I copy the code here the indentation is messy... I ll try you tip...

Twix53791 commented 2 years ago

For if clauses, you can enclose all true or false clauses inside a (progn) if that is multi line, such as, from the info

I understand that progn function. I did'nt still use it because I have only several evaluations in the false-clause and it works without. Do you think it will be better to put these false-clause into a progn? Why? Why don't let them just like that, one after one?

Twix53791 commented 2 years ago

I did not see why you use (let ((occurrences)) to declare a nil variable and not put it inside. If the variables you want to declare have dependence, use let*, such as

Maybe my syntax was wrong? I put now (let (occurrences) I don't see how to do differently : occurrences is a list which have to store all the "push" did by dolist, and as dolist execute itself several times, as much as there are words in sidematches (until 10), the variable occurrences can't be set inside dolist, it has to be outside and it is has to be set as we push something inside, isn't it? I could use setq but you told me is better to use let.

tongjie-chen commented 2 years ago

For if clauses, you can enclose all true or false clauses inside a (progn) if that is multi line, such as, from the info

I understand that progn function. I did'nt still use it because I have only several evaluations in the false-clause and it works without. Do you think it will be better to put these false-clause into a progn? Why? Why don't let them just like that, one after one?

I guess with progn it's best for readability. You can refer more to other people's code and see.

tongjie-chen commented 2 years ago

I did not see why you use (let ((occurrences)) to declare a nil variable and not put it inside. If the variables you want to declare have dependence, use let*, such as

Maybe my syntax was wrong? I put now (let (occurrences) I don't see how to do differently : occurrences is a list which have to store all the "push" did by dolist, and as dolist execute itself several times, as much as there are words in sidematches (until 10), the variable occurrences can't be set inside dolist, it has to be outside and it is has to be set as we push something inside, isn't it? I could use setq but you told me is better to use let.

If you want to make a blank list or vector and then fill the value, you probably have not mastered functional programming. In this case, it's best to use map than dotimes. https://en.wikipedia.org/wiki/Map_(higher-order_function)

Twix53791 commented 2 years ago

If you want to make a blank list or vector and then fill the value, you probably have not mastered functional programming. In this case, it's best to use map than dotimes. https://en.wikipedia.org/wiki/Map_(higher-order_function)

No, I don't know anything about functional programming ! I understand, I fact I could use mapcar for example to get my list as it returns the results of the operations it does on every element of the list. Life of a newbie in programming is like, every 5 minutes, "aaah! But I can do that?! So much better!". Anyway, I just realize another of my obvioux mistakes : in fact I just don't need either my empty list or a map function, as in fact I can just execute the add-text-properties inside the loop which find every occurrence of the word in the text...