noctuid / things.el

Extensions to thingatpt.el

Collaboration with Objed Author #7

Open noctuid opened 5 years ago

noctuid commented 5 years ago

Discussion started here.

The bounds functions in things.el only return (thing-name . (beg . end)). Inner and outer adjustments are only two possibilities. There is also the concept of "inside" and "around" from targets.vim, and since there hypothetically could be more, things.el allows arbitrary bounds adjustment after obtaining the "whole" bounds. The input to the provided library functions is either a thing, a thing with modifiers, or a list of such things. One example function call: (things-bounds '(things-paren :adjustment 'inner)). Modifiers will also allow building on top of existing text objects to create new text objects. For example, something like '(things-comma :constraint things-paren :ignore things-paren) would correspond to an argument text object (text in between two commas or a comma and a paren; supports nested parens with arguments of their own).
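To make the return shape and the "adjust afterwards" idea concrete, here is a minimal sketch built only on thingatpt.el. The `sketch-` functions are hypothetical and are not part of things.el; the inner adjustment assumes single-character delimiters.

```elisp
(require 'thingatpt)

;; Hypothetical helpers: the `sketch-' names are not part of things.el.
(defun sketch-bounds (thing)
  "Return (THING . (BEG . END)) for THING at point, or nil if there is none."
  (let ((bounds (bounds-of-thing-at-point thing)))
    (when bounds
      (cons thing bounds))))

(defun sketch-adjust-inner (thing-bounds)
  "Return THING-BOUNDS shrunk by one delimiter character on each side.
Assumes the thing is wrapped in single-character delimiters."
  (let ((beg (cadr thing-bounds))
        (end (cddr thing-bounds)))
    (cons (car thing-bounds)
          (cons (1+ beg) (1- end)))))

;; The adjustment only needs the whole bounds, not the buffer:
;; (sketch-adjust-inner '(things-paren . (1 . 6)))
;; evaluates to (things-paren 2 . 5), i.e. (things-paren . (2 . 5)).
```

Because the adjustment is a plain function of the whole bounds, other adjustments could be added later without touching the code that finds the thing.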

Support for "composite" things was a big reason for creating things.el. I started a similar project in 2016 (see targets.el) with the goal of having nicer functionality like remote selection of text objects with avy. Building it on top of evil turned out to be a mistake (as it looks like you correctly determined beforehand). Another big reason for the new things.el is that targets.el's implementation of "composite things" (e.g. select inside either parens, curly braces, square brackets, etc.) was ugly and buggy because of the level at which it was implemented. Targets.vim also recently implemented support for "multi-things," and according to the developer it was a massive refactor that resulted in ~50% of the code changing.

Because I've had very little time to work on the project and have been worried about building a fundamentally flawed implementation again, things.el has been more of an exploration of how to best design the simplest specification for defining text objects that can support as much useful functionality as possible. I probably won't prioritize stabilizing/testing/fixing bugs in the things.el code until I'm more sure that the things.el spec makes sense.

Some example features:

Trying to handle nestable text objects in targets.el is another big reason I decided that a complete spec that handles all edge cases like this is necessary. Things.el is built on top of thingatpt.el. thingatpt.el only has forward-op for searching, so like you mention, if you go to the end of a paren thing, for example, you may have skipped over inner paren things. To still allow seeking between (and selection with avy of) a nestable thing at any level, things.el introduces new rules (e.g. you additionally implement things-seek-op that will seek to the next paren at any level).
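The skipping problem can be shown with builtin functions alone. The sketch below is only an illustration: `sketch-seek-open-paren` is a hypothetical helper, not the actual things-seek-op contract, and it ignores strings and comments.

```elisp
;; Hypothetical helper; not the real `things-seek-op' contract.
(defun sketch-seek-open-paren ()
  "Move point to the next \"(\" at any nesting level.
Return the new position, or nil (leaving point unchanged) if there is none."
  (let ((start (point)))
    (when (eq (char-after) ?\()
      (forward-char 1))            ; do not re-find the paren we are on
    (if (search-forward "(" nil t)
        (goto-char (match-beginning 0))
      (goto-char start)
      nil)))

;; In a buffer containing "(a (b) c) (d)" with point at 1:
;; `forward-sexp' jumps to 10, past the inner "(b)", whereas repeated
;; calls to `sketch-seek-open-paren' visit 4, then 11, then return nil.
```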

Locally I've been working on a contract with detailed information on how to implement text objects (including nestable ones) (there is still a prior draft in the targets.el readme, but it is pretty out of date). The idea is if you implement X, Y, Z (e.g. similar to your :try-next and :try-prev), things.el will then provide all its functionality for your text object. If you look at various evil text object packages, they often don't implement the features listed above at all or they reinvent the wheel for how to do them because evil doesn't sanely handle them or require implementing text objects in a way where it can infer how to do these operations. I additionally want to provide dead-simple definers for groups of similar text objects (e.g. regexps, regexp pairs, separators, text with a specific face or faces, etc.); there are currently a few example definers (e.g. (things-define-pair 'things-paren "(" ")") to create a nestable paren thing).

What would be ideal to me is for things.el (or some similar package; doesn't have to be mine) to be the de facto standard for text object creation. Then there wouldn't be different text object packages with a different feature set and with lock-in to some editing style. Users of evil, objed, vanilla Emacs, or whatever would then be able to benefit from these text objects.

I'm still exploring potential difficulties with certain desirable text objects. Right now I'm considering text objects composed of discontiguous regions and "smarter" text objects, e.g. the example I gave before of being able to bound operations to the correct context. An open paren in a string or comment shouldn't be matched with a close paren in a different string or comment in a "smart paren" text object. Similarly, an open paren not in a string or comment should not be matched with a close paren in a string or comment at all. I have a PoC implementation, but it is horribly inefficient (hopefully this is solvable because the idea for the method itself is very simple).
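One way to express the "same context" rule is with the builtin syntax-ppss. This is a rough sketch under that assumption, not the PoC mentioned above; the `sketch-` names are hypothetical.

```elisp
;; Hypothetical helpers; this is not the PoC from things.el.
(defun sketch-context-at (pos)
  "Return the start of the string or comment containing POS, or nil."
  (save-excursion
    ;; Element 8 of the parse state is the start position of the
    ;; enclosing string or comment, or nil when POS is in plain code.
    (nth 8 (syntax-ppss pos))))

(defun sketch-same-context-p (open close)
  "Return non-nil if OPEN and CLOSE are in the same string/comment context.
Two positions in plain code also count as sharing a context."
  (eql (sketch-context-at open) (sketch-context-at close)))
```

A "smart paren" thing could then reject any candidate pair for which `sketch-same-context-p` returns nil. Calling syntax-ppss for every candidate is not free, so this does not address the efficiency concern.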

Eventually maybe objed could use things.el as a library. Even if not, I'm sure we could learn something useful from discussion. There are likely features in objed or edge cases you've run into implementing text objects that haven't crossed my mind at all. I could evaluate whether things.el could potentially support objed's features and how much change would be required for that if you think we might be able to collaborate there. Alternatively, maybe there is something you could learn from my experiments with text object systems. For example, we could talk about any particularly annoying edge cases that require consideration.

Things' readme is really vague at this point. If you're interested, I can update things' readme with the provided helpers/functionality and a draft of the more detailed "specification" I've been working on (probably by this weekend or next week). This would make it a lot more clear how things.el works and what it actually does.

clemera commented 5 years ago

> The bounds functions in things.el only return (thing-name . (beg . end)). Inner and outer adjustments are only two possibilities. There is also the concept of "inside" and "around" from targets.vim, and since there hypothetically could be more, things.el allows arbitrary bounds adjustment after obtaining the "whole" bounds. The input to the provided library functions is either a thing, a thing with modifiers, or a list of such things. One example function call: (things-bounds '(things-paren :adjustment 'inner)).

With objed I noticed that the code for getting the inner object is very tied to getting the whole object. So returning them in one go is often more efficient. I think ideally the object would return all data it can obtain and then the user could access this data using query functions.

> Then there wouldn't be different text object packages with a different feature set and with lock-in to some editing style. Users of evil, objed, vanilla Emacs, or whatever would then be able to benefit from these text objects.

This would be great! Even packages like expand-region could make use of this. Do you know about thing-at-point+ and find-where? I haven't looked at them in detail but we might find some inspiration there, too.

> Eventually maybe objed could use things.el as a library. Even if not, I'm sure we could learn something useful from discussion. There are likely features in objed or edge cases you've run into implementing text objects that haven't crossed my mind at all. I could evaluate whether things.el could potentially support objed's features and how much change would be required for that if you think we might be able to collaborate there.

I hadn't thought a lot about objed's object implementation beforehand, because at the time I was mainly interested in getting a working implementation of the editing style I had in mind. To allow for later changes to the object format, I use access functions like objed-beg and objed-end, so changing to another format (hopefully) wouldn't be too much work. One potential problem, though, is that objed is on GNU ELPA and I'm not able to depend on libraries which are not on GNU ELPA. Even if this never happens, I'm interested in discussion, too.

noctuid commented 5 years ago

> With objed I noticed that the code for getting the inner object is very tied to getting the whole object. So returning them in one go is often more efficient. I think ideally the object would return all data it can obtain and then the user could access this data using query functions.

For text objects bounded by a regexp, it may be possible to just get the inner bounds using the match data (is this what you mean?). For the things I've implemented, the time to compute adjusted bounds afterwards is insignificant. Since there are potentially types of adjustments I haven't thought of, I thought it would be best to do it afterwards and just let the user implement whatever arbitrary adjustments they want. The thing name and the bounds are enough to get any type of adjusted bounds afterwards.

Consider the following examples where ~ and | are the mark and point:

;; "a" thing
{ foo: ~{ bar: "baz" }| }

;; "around" thing
{ foo: ~{ bar: "baz" } |}

;; "inner" thing
{ foo: {~ bar: "baz" |} }

;; "inside" thing
{ foo: { ~bar: "baz"| }}

inner and around are like vim's behavior for text objects that have no surrounding delimiters or anything (like a word):

;; "inner" / "inside" word
~foo| bar
;; "a" / "around" word
~foo |bar

Evil also has the concept of a linewise selection, so that is another adjustment that things.el supports.

There are also other conceivable adjustments too:

;; select sentence excluding quotation marks
"~/This is an italicized sentence in quotes./|"

;; select sentence excluding markup and quotation marks
"/~This is an italicized sentence in quotes.|/"

Instead of trying to predict all possible adjustments, I thought it would be best to just let the text object implementor optionally add whatever adjustments they think are useful. I think inner is definitely the most useful one, and I will probably make it mandatory for it to be implemented.

I could potentially change the return format to something more extensible should it become useful (e.g. '(:thing thing-name :beg 1 :end 2 ...)).
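If callers only go through accessor functions, such a format change stays local. A minimal sketch, assuming hypothetical accessors (`sketch-thing-beg` and `sketch-thing-end` are not things.el functions) that accept either shape:

```elisp
;; Hypothetical accessors that accept both return formats:
;;   (thing-name . (beg . end))         the current cons format
;;   (:thing thing-name :beg 1 :end 2)  the extensible plist format
(defun sketch-thing-beg (result)
  "Return the beginning position stored in RESULT."
  (if (keywordp (car result))
      (plist-get result :beg)
    (cadr result)))

(defun sketch-thing-end (result)
  "Return the end position stored in RESULT."
  (if (keywordp (car result))
      (plist-get result :end)
    (cddr result)))
```

This is the same idea as objed's objed-beg and objed-end: code that never destructures the result directly does not care which format is in use.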

> This would be great! Even packages like expand-region could make use of this.

Yes, expand-region and also link-hint, for example, could potentially be built on top of this. There could also be something like a simple layer for defining various mark- commands for something closer to vanilla Emacs.

> Do you know about thing-at-point+ and find-where? I haven't looked at them in detail but we might find some inspiration there, too.

I've looked briefly at thing-at-point+ and think I have one or two potential things in my local todo.org about it. I'd not looked at find-where before. Like the description says, I think most cases can be handled without having to move one character at a time and check (all cases I've looked at can), but the idea may be useful. My main inspirations so far have been vim plugins and thingatpt.el itself.

> I use access functions like objed-beg and objed-end so changing to another format wouldn't (hopefully) be too much work.

Right now I have various functions for accessing/deconstructing things and returned bounds. For example, there is things-evil-range to convert bounds returned by things.el to something evil can understand. If we decide it's a good idea, I could potentially provide other helpers to make it smoother to use things.el.

> One potential problem though is that objed is on GNU ELPA and I'm not able to depend on libraries which are not on GNU ELPA.

I have avoided ELPA in the past, but since this library will likely never have any dependencies outside of what is builtin and could hypothetically serve as a complete replacement for thingatpt.el, putting it in ELPA is something I would definitely consider.

Thanks for the response. :) Once I update the readme, I'll probably pick your brain about some things some more.

clemera commented 5 years ago

> For text objects bounded by a regexp, it may be possible to just get the inner bounds using the match data (is this what you mean?).

I meant that, with the way I define objects in objed, computing the inner bounds often depends on the object bounds, so getting the inner bounds separately would internally mean getting the whole object again and then computing the inner bounds from that.

It seems the way you want to define things is more general, like providing "(" and ")" and then implementing the recognition of comment/string cases internally. In objed the objects are responsible for that logic, which means they are more flexible, but they are more work to implement and may contain repetitive code. Having both options would probably be best, because I think there are many objects which cannot be handled generally.

Here is an example of how you can define a python block object with objed:

(objed-define-object python block
  :try-next
  (python-nav-forward-block)
  :try-prev
  (python-nav-backward-block)
  :beg
  (python-nav-beginning-of-block)
  (objed--skip-ws t (line-beginning-position))
  (point)
  :ibeg
  (forward-line 1)
  (objed--skip-ws)
  (point)
  :iend
  (python-nav-end-of-block)
  (point)
  :end
  (forward-line 1)
  (point))

How would you define a similar thing? The code for object positions is executed in the order they are provided, so it basically moves point from the beginning to the end of the object, returning the positions on the go. This is often convenient because Emacs often provides move-to-beginning and move-to-end commands for various structures.

> Thanks for the response. :) Once I update the readme, I'll probably pick your brain about some things some more.

Thank you, too! I'm glad if I can be of any help!

noctuid commented 5 years ago

I had some time to get some basic documentation to a readable state. I don't expect you to read through it all, but if you are interested, any feedback on constraints and on how things are implemented would be nice. I don't have any more questions for now, but if you're interested, I can let you know when there is something easily testable.

> because I think there are many objects which can not be handled generally.

Could you give an example of this?

You could handle comment/string cases internally with things.el if you wanted to, but another downside would be that the behavior couldn't be customized afterwards.

> How would you define a similar thing? The code for object positions is executed in the order they are provided so it basically moves point from beginning to the end of the object returning the positions on the go which is often convenient, because Emacs often provides move to beginning, move to end commands for various structures.

It's done the same as how thingatpt does it. You'd implement forward-op to go to the next thing end with a positive count or previous thing beginning with a negative count. That is all that is necessary for basic functionality (bounds, seeking, motions, etc.). For builtin things, things.el can work with little to no configuration. This is explained in a lot more detail in the readme now. Implementing extra adjustments (like :ibeg vs :beg) is explained here. Various helpers for adjustments are provided (e.g. shrink-by-regexp, things-shrink-by-newlines, and things-shrink-by-space-then-newlines).
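For readers unfamiliar with the thingatpt.el convention, here is a small self-contained illustration using a toy "run of digits" thing. The thing name and function are made up for the example and do not come from things.el; only the forward-op property and the thingatpt functions are builtin.

```elisp
(require 'thingatpt)

;; Toy example thing; `sketch-digits' is not defined by things.el.
(defun sketch-forward-digits (&optional count)
  "Move to the end of the next digit run, COUNT times.
With a negative COUNT, move to the beginning of the previous run."
  (let ((count (or count 1)))
    (if (> count 0)
        (dotimes (_ count)
          (skip-chars-forward "^0-9")   ; skip to the next digit run
          (skip-chars-forward "0-9"))   ; then to its end
      (dotimes (_ (- count))
        (skip-chars-backward "^0-9")    ; skip back to the previous run
        (skip-chars-backward "0-9")))))  ; then to its beginning

;; Registering the function as `forward-op' is all thingatpt.el needs.
(put 'sketch-digits 'forward-op #'sketch-forward-digits)

;; In a buffer containing "ab 1234 cd" with point inside the digits,
;; (bounds-of-thing-at-point 'sketch-digits) returns (4 . 8) and
;; (thing-at-point 'sketch-digits) returns "1234".
```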

clemera commented 5 years ago

This looks very good to me.

> Could you give an example of this?

Sorry, I can't. It makes no sense to me now, either :D. I guess I misunderstood how you define your things; now that I have read your docs I have a better picture.

One thing that came to my mind when reading is mode-specific things. With objed I noticed that it is sometimes useful to be able to automatically dispatch to a mode-specific version of a text object/thing. Often that's not necessary, but I have used it for defining different inner defun implementations, for example.
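One possible shape for such a dispatch is a lookup table keyed by major mode. This is only a sketch: the table, the function, and the mode-specific thing names are all hypothetical and belong to neither objed nor things.el.

```elisp
;; Hypothetical registry; none of these names exist in objed or things.el.
(defvar sketch-mode-things
  '((defun . ((python-mode . sketch-python-defun)
              (emacs-lisp-mode . sketch-elisp-defun))))
  "Alist mapping a generic thing to (MAJOR-MODE . SPECIFIC-THING) pairs.")

(defun sketch-resolve-thing (thing)
  "Return the variant of THING registered for the current major mode.
Fall back to THING itself when no variant matches."
  (let ((variants (cdr (assq thing sketch-mode-things)))
        (found nil))
    (while (and variants (not found))
      ;; `derived-mode-p' also matches modes derived from the listed one.
      (when (derived-mode-p (caar variants))
        (setq found (cdar variants)))
      (setq variants (cdr variants)))
    (or found thing)))
```

A caller would resolve the thing once and then use the normal bounds and seeking functions on the result, so the mode-specific logic stays out of the generic code.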

Other than that I don't have much to add; it looks like you've already put a lot of thought into it. I haven't had much time to work on objed for a while, but I will certainly try to experiment with things as soon as I get back to it. I guess further questions will come up when I try to convert some of my text objects to things.