Unicode-only view of the buffer

Mesabloo commented 1 year ago

Sometimes, I want to copy-paste some Isabelle code for my friends (or basically anywhere which does not interpret the Isabelle symbols) and the code becomes very unreadable (because of those Isabelle-specific sequences). I am wondering whether it would be possible to provide a view of the current buffer where selection and copy/paste would yield Unicode characters instead of the Isabelle (unreadable) sequences.

m-fleury commented 1 year ago

Okay this provides how to replace symbols by the utf8 (the org specific part must be ignored):

https://emacs.stackexchange.com/questions/30410/export-pretty-symbols-in-org-source-blocks

So, here is how to do that:

Copy the selected region (or the entire buffer if none is selected) to a new buffer, see https://stackoverflow.com/questions/13281026/how-to-use-interactive-r-function-in-this-situation
Run the replacement (see the link above)

Mesabloo commented 1 year ago

Here's an extract of the original buffer:

Original Isabelle buffer

``` theory Refinement_Types imports Main begin text \ We present here a formalisation of refinement types based on ``Refinement Types'' by Ranjit Jhala, Niki Vazou (2020). We will follow the article and implement (most) of the \ sub-languages presented. \ section \The Refinement Logic\ text \ We start by defining the Refinement Logic (RL), which is the basis used for decidable predicates. \ locale refinement_logic begin text \ Predicate \\<^bold>\\ are drawn from the quantifier-free fragment of linear arithmetic and uninterpreted functions (also known as \QF-UFLIA\). Note that the constructor for the ternary operation \if \<^bold>\\<^sub> then \<^bold>\\<^sub>1 else \<^bold>\\<^sub>2\ can be replaced by \(\<^bold>\ \ x = \<^bold>\\<^sub>1) \ (\\<^bold>\ \ x = \<^bold>\\<^sub>2)\. We don't provide custom notations for any of the datatypes underneath, because they would merely only clash with HOL's, or be too cumbersome to write (portray writing \\\<^sub>C\ for the conjunction of constraints\) and have close to no benefit. \ datatype ('v, 'ops) predicate = Var 'v | True | False | Integer int | Operation \('v, 'ops) predicate\ 'ops \('v, 'ops) predicate\ | Conjunction \('v, 'ops) predicate\ \('v, 'ops) predicate\ | Disjunction \('v, 'ops) predicate\ \('v, 'ops) predicate\ | Negation \('v, 'ops) predicate\ | IfThenElse \('v, 'ops) predicate\ \('v, 'ops) predicate\ \('v, 'ops) predicate\ | Function 'v \('v, 'ops) predicate list\ context begin text \ Examples at the end of \

\2.1.1\. \ ```

After running the command below, I get this result:

“Unicodified” Isabelle buffer

``` theory Refinement_Types imports Main begin text ‹ We present here a formalisation of refinement types based on ``Refinement Types'' by Ranjit Jhala, Niki Vazou (2020). We will follow the article and implement (most) of the λ sub-languages presented. › section ‹The Refinement Logic› text ‹ We start by defining the Refinement Logic (RL), which is the basis used for decidable predicates. › locale refinement_logic begin text ‹ Predicate ‹❙𝔭› are drawn from the quantifier-free fragment of linear arithmetic and uninterpreted functions (also known as ‹QF-UFLIA›). Note that the constructor for the ternary operation ‹if ❙𝔭⇩ then ❙𝔭⇩1 else ❙𝔭⇩2› can be replaced by ‹(❙𝔭 ⇒ x = ❙𝔭⇩1) ∧ (¬❙𝔭 ⇒ x = ❙𝔭⇩2)›. We don't provide custom notations for any of the datatypes underneath, because they would merely only clash with HOL's, or be too cumbersome to write (portray writing ‹∧⇩C› for the conjunction of constraints…) and have close to no benefit. › datatype ('v, 'ops) predicate = Var 'v | True | False | Integer int | Operation ‹('v, 'ops) predicate› 'ops ‹('v, 'ops) predicate› | Conjunction ‹('v, 'ops) predicate› ‹('v, 'ops) predicate› | Disjunction ‹('v, 'ops) predicate› ‹('v, 'ops) predicate› | Negation ‹('v, 'ops) predicate› | IfThenElse ‹('v, 'ops) predicate› ‹('v, 'ops) predicate› ‹('v, 'ops) predicate› | Function 'v ‹('v, 'ops) predicate list› context begin text ‹ Examples at the end of §‹2.1.1›. › ```

As far as I can tell, this produces the desired result. I also made it so that subscript/superscripts (blocks) are shown using the same arrows as in JEdit, so ⇩, ⇧, ⇘, ⇙, ⇗, ⇖. For bold numbers, I also replaced them with some from the Unicode range “Mathematical Alphanumeric Symbols” (1D400—1D7FF). So e.g. \<zero> would get translated to 𝟎.

Here's the code for the function:

(defun isar-unicodify-region-or-buffer ()
  "Open a view of the active region (or the whole buffer if none) using true Unicode tokens."
  (interactive)
  (let* ((range (if (use-region-p) (list (region-beginning) (region-end)) (list nil nil)))
         (start (car range))
         (end (cadr range))
         ;; Register all the existing symbols to be replaced.
         (symbs (append isar-extended-symbols-tokens isar-symbols-tokens isar-modifier-symbols-tokens))
         (buf (generate-new-buffer (format "%s-unicode" (buffer-name)))))
    ;; Copy the selected region (or the whole buffer) into `buf'.
    (insert-into-buffer buf start end)
    ;; Switch to the (soon to be) beautiful buffer.
    (switch-to-buffer buf)
    ;; Now replace Isabelle sequences in `buf' by actual Unicode symbols.
    (setq case-fold-search nil)
    (mapc
     (lambda (x)
       (goto-char (point-max))
       (while (re-search-backward (regexp-quote (format isar-token-format (car x))) nil t)
         (replace-match (cadr x) nil t)))
     symbs)
    ;; And this should be it!
  ))

Note that I am not using prettify-symbols-alist, as I noticed that it was always set to nil (meaning that we actually don't use it?). The code itself can probably be simplified. Also, some faces still persist (I believe only the faces not from the LSP semantic highlighting) in the Unicode buffer, but it's not really a big deal.

Also, I created isar-modifier-symbols-tokens as

(defcustom isar-modifier-symbols-tokens
  '(("^sub" "⇩")
    ("^sup" "⇧")
    ("^bsub" "⇘")
    ("^esub" "⇙")
    ("^bsup" "⇗")
    ("^esup" "⇖")
    ("zero" "𝟎")
    ("one" "𝟏")
    ("two" "𝟐")
    ("three" "𝟑")
    ("four" "𝟒")
    ("five" "𝟓")
    ("six" "𝟔")
    ("seven" "𝟕")
    ("eight" "𝟖")
    ("nine" "𝟗"))
  "Some various symbols used for Unicode printing. Most should be either invisible or styled (e.g. bold)."
  :type 'unicode-tokens-token-symbol-map
  :group 'isabelle-tokens
  :set 'isar-set-and-restart-tokens)

I don't know if it would be better to just inline it in the function above or not, as it should only be used there.

Mesabloo commented 1 year ago

There is perhaps one thing that I don't like here: the effect of (setq case-fold-search nil) persists after the function ended. Now let's say that some user has this variable set to t by default (and likes it that way for some reason), then we broke their setup. However, I do not know of a way to restore the old value at the end of the scope without making a temporary buffer local variable to store the old result then restore it by hand (which seems a bit dirty).

EDIT: ignore my comment, the fix is simply to move (case-fold-search nil) in the let* block and remove (setq case-fold-search nil).

m-fleury / isabelle-emacs

Unicode-only view of the buffer #83