greghendershott / racket-mode

Emacs major and minor modes for Racket: edit, REPL, check-syntax, debug, profile, packages, and more.

https://www.racket-mode.com/

GNU General Public License v3.0

Using racket-hash-lang-mode with org-mode source blocks #692

Open bremner opened 8 months ago

bremner commented 8 months ago

I'd like to be able to set a variable to tell racket-hash-lang-mode what the buffer syntax is. My use case is editing source blocks in org-mode where the #lang is implicit. Source might look like the following

#+begin_src smol :shebang "#lang smol/fun" :tangle lecture1/smol1.rkt
  (defvar x 10)
  (deffun (f y) (+ x y))
  (f 3)
#+end_src

I can configure this to translate smol to racket-hash-lang-mode when running org-edit-special (and it works without a file, thanks for that). I'm just experimenting with racket-hash-lang-mode, so I have a hard time seeing what is lost by not actually knowing the #lang, but I guess it must be something, right?

greghendershott commented 8 months ago

I'm just experimenting with racket-hash-lang-mode, so I have a hard time seeing what is lost by not actually knowing the #lang, but I guess it must be something, right?

In racket-hash-lang-mode the #lang line says where to find live Racket code supplied by the language -- which does quite a few things:

- classify strings, comments, other (largely replacing the Emacs "char syntax" and parse-partial-sexp machinery)
- color the above, i.e. font-lock
- navigate, i.e. supplying a forward-sexp-function that uses the lang's concept of "grouping"
- indent, i.e. supplying an indent-line-function and indent-region-function
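
As a rough illustration of the navigate and indent items, the hooks named above are ordinary buffer-local Emacs variables that any major mode can set. This is only a sketch, not racket-mode's code: `my-demo-nav-mode` and its two helper functions are hypothetical names, and the helpers are trivial stand-ins for the work the real mode delegates to the Racket back end.

```elisp
;; Hypothetical stand-ins; the real mode would ask the Racket back end.
(defun my-demo-forward-sexp (&optional arg)
  "Move over ARG \"groups\"; in this sketch a group is simply a word."
  (forward-word (or arg 1)))

(defun my-demo-indent-line ()
  "Indent the current line; in this sketch always to column 0."
  (indent-line-to 0))

(define-derived-mode my-demo-nav-mode prog-mode "DemoNav"
  "Hypothetical mode showing where navigation and indent hooks plug in."
  ;; `forward-sexp' consults this variable before its default behavior.
  (setq-local forward-sexp-function #'my-demo-forward-sexp)
  ;; Indentation commands such as TAB call this function.
  (setq-local indent-line-function #'my-demo-indent-line))
```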

I don't use org babel stuff enough to know off the top of my head exactly what you're expecting, and how that would work. I'd be happy to take a look.

Already racket-repl-mode needs the idea of the #lang not being found in the buffer text. So there's some precedent and support for something like that.

greghendershott commented 7 months ago

I don't use org babel stuff enough to know off the top of my head exactly what you're expecting, and how that would work. I'd be happy to take a look.

I looked at this again, trying to learn more about org source block handling (with which I have almost no hands-on experience). I read the org-source docs, and skimmed some of the source code.

Maybe I'm wrong, but:

- Something like :shebang "#lang smol/fun" in your example seems like the right idea -- but unfortunately :shebang only applies when tangling.
- There's also :prologue -- but that only applies when executing.
- AFAICT there's no property that causes something to be prepended when editing (and removed when done editing).

So status quo I think your only choice is to include the #lang line as the first line of each org src block contents? :disappointed:

Maybe we could stipulate some new header argument property like :racket-hash-lang and somehow connect that to racket-hash-lang-mode. But I'm not sure if the properties are intended to be extensible by third parties.

greghendershott commented 7 months ago

Maybe we could stipulate some new header argument property like :racket-hash-lang and somehow connect that to racket-hash-lang-mode. But I'm not sure if the properties are intended to be extensible by third parties.

Actually from looking at some examples like ob-c and ob-clojure, it seems the header argument properties are open. OK to add more, ad hoc, per lang, it seems?

Also org-edit-src-code funcalls a lang-specific "edit prep function" with the props in a babel-info data structure:

    (let ((edit-prep-func (intern (concat "org-babel-edit-prep:" lang))))
      (when (fboundp edit-prep-func)
        (funcall edit-prep-func babel-info)))

So conceivably I could define an org-babel-edit-prep:racket-hash-lang to look for a new property, called say ":hash-lang". Or maybe just reuse :shebang here.

And either way, do something with the value:

- Maybe just prepend a #lang line in the new edit buffer (but how to delete it later when saving back, idk).
- Or maybe add a text property (invisible), and use roughly the same approach that racket-repl-mode uses to work with a lang when the buffer lacks any #lang xxx text.

greghendershott commented 7 months ago

I pushed to a topic branch a simple commit ddd0f46. It just adds to racket-hash-lang.el:

(defun org-babel-edit-prep:racket-hash-lang (babel-info)
  "Recreate back-end hash-lang object using :shebang property.

`org-edit-src-code' calls us AFTER the `racket-hash-lang-mode'
buffer is created. So if there is a :shebang property with
\"#lang foo\", we need to recreate the back end object using the
option where we can supply this."
  (pcase babel-info
    (`(,_racket-hash-lang ,_contents ,props . ,_)
     (when-let (shebang (cdr (assq :shebang props)))
       ;; re-create back end hash-lang object
       (racket--hash-lang-delete)
       (setq-local racket--hash-lang-id
                   (racket--cmd/await
                    nil
                    `(hash-lang
                      create
                      ,(cl-incf racket--hash-lang-next-id)
                      ,shebang
                      ,(buffer-substring-no-properties (point-min) (point-max)))))))))

Given an example file issue-692.org:

#+begin_src racket-hash-lang :shebang "#lang rhombus"
  "string"
  ~string
  1 + 1
#+end_src

I did C-c ' and "it works".

I'm not sure it's quite that simple...?

greghendershott commented 7 months ago

p.s. This also seems to support "DRY" patterns to avoid repeating a property like :shebang on every block.

Like a property for the whole org file:

#+PROPERTY: header-args:racket-hash-lang :shebang "#lang rhombus"

#+begin_src racket-hash-lang
  ~string
  1 + 1
#+end_src

As well as a property drawer for just a section within the org file.

As well as setting the Emacs Lisp variable org-babel-default-header-args:racket-hash-lang for system-wide, multiple org files.

(Again I have almost no experience with this stuff. I just read the docs and tried a quick example to confirm.)
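
For the third option, a minimal sketch of the Emacs Lisp variable might look like the following. It assumes the blocks use `racket-hash-lang` as their language name, as in the examples above, and that `:shebang` is the property being reused; neither is a settled design at this point in the thread.

```elisp
;; Assumption: source blocks are written as #+begin_src racket-hash-lang.
;; Org looks for a variable named org-babel-default-header-args:LANG
;; and applies its header arguments to every block of that language.
(defvar org-babel-default-header-args:racket-hash-lang
  '((:shebang . "#lang rhombus"))
  "Default header arguments for racket-hash-lang source blocks.")
```

A header argument given on an individual block should still take precedence over this default.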

bremner commented 7 months ago

Greg Hendershott writes:

I pushed to a topic branch a simple commit ddd0f46. It just adds to racket-hash-lang.el:
(defun org-babel-edit-prep:racket-hash-lang (babel-info)
  "Recreate back-end hash-lang object using :shebang property.

I tried copying this function to my init.el and, as you say, it seems to work for editing. At least indentation seems to work. As for syntax highlighting, as you mention in the documentation, only strings, numbers and symbols are distinguished, so it is pretty muted (without additional configuration).

Tangling does not work yet, but I suspect this is unrelated to the original question; I had the same error when enabling racket-hash-lang mode globally.

When I try to tangle (C-c C-v C-t) with

#+begin_src racket-hash-lang :shebang "#lang rhombus" :tangle foo.rkt
  "string"
  ~string
  1 + 1
#+end_src

I get

mapc: Buffer is read-only: #<killed buffer>

I could not (easily) get a backtrace, sorry. Do you want a separate issue for this, or just leave all the org-src+hash-lang mode discussion here?

greghendershott commented 7 months ago

I think this is because the racket-hash-lang-mode sets the buffer read-only, until the back end hash-lang object becomes ready. Some comments from its code:

  ;; Create back end hash-lang object.
  ;;
  ;; On the one hand, `racket--cmd/await' would be simpler to use
  ;; here. On the other hand, when the back end isn't running, there's
  ;; a delay for that to start, during which the buffer isn't
  ;; displayed and Emacs seems frozen. On the third hand, if we use
  ;; `racket--cmd/async' naively the buffer could try to interact with
  ;; a back end object that doesn't yet exist, and error.
  ;;
  ;; Warm bowl of porridge: Make buffer read-only and use async
  ;; command to create hash-lang object. Only when the response
  ;; arrives, i.e. the back end object is ready, enable read/write and
  ;; set various hook functions that depend on `racket--hash-lang-id'.
  ;;
  ;; Also, handle the back end returning nil for the create -- meaning
  ;; there's no sufficiently new syntax-color-lib -- by downgrading to
  ;; plain `prog-mode'.

This works fine when the user interactively starts racket-hash-lang-mode, including via the org edit source command. If they are too quick, they just get a "sorry not ready" message.

However org-babel-tangle creates a buffer, calls racket-hash-lang-mode, then immediately tries to use the buffer. Quite reasonably; normally an Emacs mode is ready to use when the mode init returns. (I think org-babel-tangle would be fine if racket-hash-lang-mode blocked and didn't return until ready. But I'm not sure how to detect being called by that and blocking only in that case -- or if that's even the best strategy.)

I'm not sure how to balance all the competing needs here but I'll give it a think...

p.s. When tracing through the code, I noticed that :shebang has the side-effect of giving created files executable mode -- not just automatically adding that text as the first line. So will probably want to revisit that, too.
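
To make the "block until ready" idea above concrete, here is a generic sketch of waiting for an asynchronous result with a deadline. It is not how racket-mode handles this: `my-demo-wait-until` is a hypothetical helper, and whether blocking is the right strategy at all is exactly the open question.

```elisp
(defun my-demo-wait-until (pred timeout)
  "Call PRED repeatedly until it returns non-nil or TIMEOUT seconds pass.
Return the last value of PRED, so nil means the deadline was reached."
  (let ((deadline (+ (float-time) timeout))
        (ready nil))
    (while (and (not (setq ready (funcall pred)))
                (< (float-time) deadline))
      ;; Let Emacs handle process output and timers while we wait,
      ;; so an async response can arrive and flip PRED.
      (accept-process-output nil 0.05))
    ready))
```

A mode could use something like `(my-demo-wait-until (lambda () some-ready-flag) 5)` only when it detects a non-interactive caller, where `some-ready-flag` is again a placeholder.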

greghendershott commented 7 months ago

Do you want a separate issue for this, or just leave all the org-src+hash-lang mode discussion here?

I think for now one issue makes sense. Seems like some overlap among the three things -- edit, tangle, execute. At least wrt decisions like using :shebang and/or some new property, etc.

greghendershott commented 7 months ago

Update: I understand more about how things work. I think I see how to make both org-edit-src-block and org-babel-tangle work.

However I don't see how to make formatting work for the source block itself in the original .org buffer. What org does in that case is, create a hidden buffer using the lang mode -- e.g. named " *org-src-fontification:racket-hash-lang-mode*", copy the contents to that buffer, ensure font lock, and copy the faces back to the org buffer. That hidden buffer has no access to the org source block information, including the header argument properties like the lang. As a result, racket-hash-lang-mode can't know the actual lang. If there's no explicit #lang x in the source, it can't format for the hash-lang. :disappointed:
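
The shape of that fontification step can be sketched as follows. This is a simplified illustration, not org's actual code, and `my-demo-fontify-string` is a hypothetical name. The point is that the mode function is called with nothing but the text: no header arguments and no reference to the org buffer.

```elisp
(defun my-demo-fontify-string (code mode)
  "Return CODE with the face properties that major mode MODE assigns.
MODE sees only the text of CODE, never the block's header arguments."
  (with-temp-buffer
    (insert code)
    ;; The mode is enabled in a scratch buffer with no org context.
    (funcall mode)
    (font-lock-ensure)
    ;; `buffer-string' keeps text properties, including faces.
    (buffer-string)))
```

For example, `(my-demo-fontify-string "(defun f ())" #'emacs-lisp-mode)` returns the same text with faces attached. A mode that needs a #lang it cannot find in the text has nowhere else to look.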

I've looked for kludgy ways for that special buffer to discover the corresponding source block and buffer, and the metadata like the lang. So far I'm stumped. Even if I had some kludge, it might be fragile.

To go with the grain of Emacs and org-mode expectations, each lang gets its own major mode, which knows about formatting the lang. Whereas the concept of racket-hash-lang-mode acting for various langs, doesn't really fit. This is an example.

A better fit would be for each hash-lang to have its own Emacs major mode. This could be a small major mode using define-derived-mode around racket-hash-lang-mode. It could set some Emacs var to hold the lang, for the base racket-hash-lang-mode to use.

Maybe I could make a little Emacs macro to do the define-derived-mode, as well as the couple of org-babel settings.
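
A minimal sketch of what such a macro could look like is below. Every name starting with `my-demo-` is hypothetical, and `prog-mode` stands in for `racket-hash-lang-mode` so the example runs without racket-mode installed. It only shows the define-derived-mode plus org setting idea, not the eventual implementation.

```elisp
(require 'org-src)

(defvar-local my-demo-hash-lang nil
  "Hypothetical variable naming the lang for the current buffer.")

(defmacro my-demo-define-lang-mode (lang parent)
  "Define a major mode for LANG derived from PARENT and tell Org about it.
LANG is an unquoted symbol.  All names defined here are hypothetical."
  (let* ((name (symbol-name lang))
         (base (intern (format "my-demo-%s" name)))
         (mode (intern (format "my-demo-%s-mode" name))))
    `(progn
       (define-derived-mode ,mode ,parent ,(format "Demo:%s" name)
         ,(format "Hypothetical major mode for #lang %s." name)
         ;; The base mode could read this instead of a #lang line.
         (setq-local my-demo-hash-lang ',lang))
       ;; `org-src-lang-modes' maps a block language name to a mode
       ;; symbol written without its -mode suffix.
       (add-to-list 'org-src-lang-modes '(,name . ,base)))))

;; `prog-mode' is a stand-in parent to keep this self-contained.
(my-demo-define-lang-mode smol prog-mode)
```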

greghendershott commented 7 months ago

p.s. In my previous comment I'm referring to the formatting that happens when org-src-fontify-natively is non nil. I'm not sure if you have that enabled, or if you do, whether you really care about the formatting working -- but it bugs me that it doesn't format appropriately, like it does for other modes.

greghendershott commented 7 months ago

OK this took a while but I think I understand the problem space now.

I have a solution that I believe is generally correct. (Caveat: Although I've tried to think of all the scenarios and edge cases, I've probably overlooked some.)

With commit 4491cc0 you get a racket-define-hash-lang macro.

In your original example, you can (racket-define-hash-lang smol ".smol") and all of {format, edit, tangle, execute} should "just work" when you use smol as the language for org source blocks.

Copy of the commit message:

racket-hash-lang: org source block {format edit tangle execute}

Closes issue #692.

As far as I can tell, org source blocks and org-babel are designed
around the assumption that each language will have its own major mode.
Otherwise, the source block language isn't available in all scenarios.

Therefore go with the flow: Even though racket-hash-lang-mode can
handle all hash-langs, people will need to derive from it a new major
mode for each lang they want to use with org source blocks.

A new racket-define-hash-lang macro makes this easier, as well as
handling related configuration like auto-mode-alist,
org-src-lang-modes, and org-babel-tangle-lang-exts.

With this we (intend to) fully support org source block
formatting, editing, and tangling.

When it comes to executing, we supply a basic org-babel-execute:<lang>
function that knows how to run all hash-langs. However it only
supports the :result-type output -- not values. And it does not
support input :vars. In both cases, the syntax and semantics will of
course vary among languages. However a user could define an
org-babel-expand-body:<lang> to support :vars for a given lang. (But I
don't yet have any idea how :result-type value would work.)

One issue that comes up for all four scenarios is what to do about
lang lines -- a Racket program must start with exactly one.

1. format: We use the back end hash-lang option to set the lang
separately (as we also use for the REPL).

2. edit: The user need not include one. We add one automatically (when
they C-c ' to edit in the dedicated edit buffer), to keep things like
racket-xp-mode happy. And we subtract it when writing back to the org
buffer.

3. execute: We add one if the block lacks one.

4. tangle: It's up to the user to start the /first/ block (for each
lang) with one, but not the remainder.
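
The add-then-subtract handling described for the edit scenario can be sketched as two string helpers. These are hypothetical functions for illustration, not the code in the commit. In particular, a real implementation would need to remember whether it added the line, so that a #lang line the user typed themselves is not stripped on write-back.

```elisp
(defun my-demo-ensure-lang-line (lang body)
  "Return BODY, prefixed with a \"#lang LANG\" line if it lacks one."
  (if (string-prefix-p "#lang " body)
      body
    (concat "#lang " lang "\n" body)))

(defun my-demo-strip-lang-line (body)
  "Return BODY without a leading #lang line, if it has one."
  ;; \\` anchors the match at the very start of the string.
  (if (string-match "\\`#lang [^\n]*\n?" body)
      (substring body (match-end 0))
    body))
```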

What do you think about the idea?

If you're able to try the commit from that branch, how does it work for you?

bremner commented 6 months ago

Greg Hendershott writes:

OK this took a while but I think I understand the problem space now.

I have a solution that I believe is generally correct. (Caveat: Although I've tried to think of all the scenarios and edge cases, I've probably overlooked some.)

With commit 4491cc0 you get a racket-define-hash-lang macro.

Apologies for the slow reply. I think the macro is actually called racket-declare-hash-lang-for-org-babel ?

In your original example, you can (racket-define-hash-lang smol ".smol") and all of {format, edit, tangle, execute} should "just work" when you use smol as the language for org source blocks.

I tried this with

(when (require 'racket-hash-lang nil t)
  (racket-declare-hash-lang-for-org-babel datalog ".dlog"))

and

#+begin_src datalog :tangle lecture15/borrows1.dlog
  borrows(java, cpp).
  borrows(cpp, c).
  borrows(c, bcpl).
  borrows(pascal, algol).

  descends(A, B) :- borrows(A, B).
  descends(A, B) :- borrows(A, Z), descends(Z, B).
#+end_src

Editing seems OK, but

- somehow the tangled file does not have a "#lang" line. Is that expected?
- Would it make sense to update auto-mode-alist to open files with the given suffix in the newly created mode? Maybe that is mission creep, I'm not sure.

greghendershott commented 6 months ago

I think the macro is actually called racket-declare-hash-lang-for-org-babel ?

Yes, sorry about the name change. Originally I thought this might be relevant beyond org-babel, but then realized not, and decided to make the name more specific to reflect that.

Would it make sense to update auto-mode-alist to open files with the given suffix in the newly created mode? Maybe that is mission creep, I'm not sure.

Similarly, originally I had the macro do exactly that, but took it out.

For a few real-world langs -- like racket, scribble, and rhombus -- there are langs people might prefer to use with a dedicated "classic" mode (like racket-mode, scribble-mode, or rhombus-mode) as opposed to racket-hash-lang-mode. So I think it is mission creep.

somehow the tangled file does not have a "#lang" line. Is that expected?

Yes. I thought I had it in the doc string, but, you need to do the #lang line explicitly in the first block. org-tangle is just concatenating all those.

bremner commented 6 months ago

Greg Hendershott writes:

Would it make sense to update auto-mode-alist to open files with the given suffix in the newly created mode? Maybe that is mission creep, I'm not sure.

Similarly, originally I had the macro do exactly that, but took it out.

For a few real-world langs -- like racket, scribble, and rhombus -- there are langs people might prefer to use with a dedicated "classic" mode (like racket-mode, scribble-mode, or rhombus-mode) as opposed to racket-hash-lang-mode. So I think it is mission creep.

That makes sense, but then I wonder (lazily without looking at the code) what the file extension is needed for?

somehow the tangled file does not have a "#lang" line. Is that expected?

Yes. I thought I had it in the doc string, but, you need to do the #lang line explicitly in the first block. org-tangle is just concatenating all those.

It also works to add :shebang to any block. As you observed in some previous email that has side effects with respect to permissions, but I just override that globally with something like

#+PROPERTY: header-args :tangle-mode (identity #o644)

I have not noticed any problems with abusing :shebang in this way. I use it quite extensively for tangling racket files (with classic racket-mode).

greghendershott commented 6 months ago

That makes sense, but then I wonder (lazily without looking at the code) what the file extension is needed for?

It's used to add the lang to org-babel-tangle-lang-exts.

IIUC so in the scenario where you tangle foo.org but don't supply a filename it will just do foo.<ext>?
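
For reference, that registration amounts to something like the following. It is shown for the datalog example from earlier in the thread, with `"dlog"` assumed as the extension; the macro does the equivalent for you.

```elisp
(require 'ob-tangle)

;; Assumption: "datalog" is the block language and "dlog" its extension.
;; Entries are (LANG . EXT), with EXT written without a leading dot.
(add-to-list 'org-babel-tangle-lang-exts '("datalog" . "dlog"))
```

With that entry in place, a datalog block with `:tangle yes` in foo.org should tangle to foo.dlog rather than foo.datalog.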

It also works to add :shebang to any block. As you observed in some previous email that has side effects with respect to permissions, but I just override that globally with something like

#+PROPERTY: header-args :tangle-mode (identity #o644)

I have not noticed any problems with abusing :shebang in this way. I use it quite extensively for tangling racket files (with classic racket-mode).

Although it's been a few weeks now, I recall from code spelunking that :shebang was something that org-babel-tangle knew how to handle specially. IOW there wasn't any obvious way for me to do a similar "first block only" behavior automatically from the src block language property, or any other property.

Given that, it's going to be up to the user to have the tangled output start with a #lang line... somehow -- either via the :shebang property, or, by including the #lang line literally in the first block. (I feel like the latter is simpler for me to document, which is what I did -- but I didn't mean to imply the former can't work or that you shouldn't prefer it.)

bremner commented 6 months ago

Greg Hendershott writes:

That makes sense, but then I wonder (lazily without looking at the code) what the file extension is needed for?

It's used to add the lang to org-babel-tangle-lang-exts.

IIUC so in the scenario where you tangle foo.org but don't supply a filename it will just do foo.<ext>?

I didn't know about it, but apparently this is a thing some people like to do.

Given that, it's going to be up to the user to have the tangled output start with a #lang line... somehow -- either via the :shebang property, or, by including the #lang line literally in the first block. (I feel like the latter is simpler for me to document, which is what I did -- but I didn't mean to imply the former can't work or that you shouldn't prefer it.)

Maybe I will send a doc patch later. It's obviously not a blocker.

I do notice some strange behaviour within a source block when not in an indirect buffer. I haven't got a good reproducer for that yet.

greghendershott commented 6 months ago

I pushed another commit with some doc prose edits, to the issue-692 branch.

The doc string now:

(defmacro racket-declare-hash-lang-for-org-babel (lang ext)
  "Arrange for a Racket hash-lang to work with org-babel.

LANG should be an unquoted symbol, same as you would use in a
Racket =#lang= line.

EXT should be a string with the file extension for LANG, /not/
including any dot.

Examples:

  (racket-declare-hash-lang-for-org-babel rhombus \"rhm\")
  (racket-declare-hash-lang-for-org-babel scribble/manual \"scrbl\")

This macro will:

0. Define a major mode derived from `racket-hash-lang-mode' named
   `racket-hash-lang:LANG-mode'.

1. Add the language to `org-src-lang-modes' and
   `org-babel-tangle-lang-exts'.

2. Define an org-babel-edit-prep:LANG function.

3. Define an org-babel-execute:LANG function, which delegates to
   `racket--hash-lang-org-babel-execute'. See its doc string for
   more information -- including why this macro /cannot/ also
   define an org-babel-expand-body:LANG function.

4. Allow a buffer to omit the explicit #lang line, when it is
   created by `org-mode' for user editing or formatting of a
   source code block whose language property is LANG.

Discussion:

A valid Racket program consists of one outermost module per
source file, using one lang. Typically this is expressed using a
=#lang= line -- which must occur exactly once at the start of the
file. In such a buffer, `racket-hash-lang-mode' \"just works\".

When using multiple `org-mode' source blocks of the same lang,
the situation is trickier:

- Although you could start /every/ block with a lang line, that's
  tedious, and org-tangle will concatenate them into an invalid
  program.

- On the other hand, if you start only the /first/ block with a
  lang line, then various org-babel features won't work properly
  with the subsequent blocks. Basically this is because org
  creates a hidden buffer using `racket-hash-lang-mode', but the
  source block's lang property value is not available to that
  buffer, so it can't know what lang line to add automatically.

- Similarly, if you use the :shebang property to tangle
  correctly, that property value is not available in the hidden
  buffers created by org mode.

TL;DR: Org assumes that each lang will have a major mode that
knows enough to do what is required. To accommodate this it is
simplest to define a distinct major mode for each org source
block language."

Unfortunately I think that prose is still not great about explaining that shebang is another good/sufficient way to make tangling work.

Most of the discussion (attempts to) explain that org-mode creates hidden buffers, and those buffers get no access to any of these src block properties (source lang, shebang, whatever). That's what pushes us to derive a major mode for each source lang.

(The whole situation is kind of confusing. I want to make sure I understand it, and also try to make users not need to understand it more than necessary. So the macro tries to help do that, but the doc string still needs to explain the situation just in case... ugh.)

bremner commented 6 months ago

Greg Hendershott writes:

I pushed another commit with some doc prose edits, to the issue-692 branch.

The doc string now:
(defmacro racket-declare-hash-lang-for-org-babel (lang ext)
  "Arrange for a Racket hash-lang to work with org-babel.

LANG should be an unquoted symbol, same as you would use in a
Racket =#lang= line.

Maybe it would be enough to mention that this "work with org-babel" is really about source block editing, and for tangling the user should use the same method as works for them with racket-mode classic?

At some point a FAQ entry (or similar) about tangling might be appropriate, but it doesn't seem to be racket-hash-lang-mode specific?