Local File Transformations

bbb651 commented 1 month ago

Check for existing issues

[X] Completed

Describe the feature

I'll start by saying this is a very ambitious idea that I've had for a long time. It falls into the category of "how would my hypothetical ideal text editor work" and adds quite a lot of complexity, but at the same time it's a feature that seems really useful and that I haven't seen anywhere else, so I wanted to get it out there. This is also turning out to be the most detailed issue I've ever written.

Introduction

The core idea is to let the user install/configure Local File Transformations (LFTs), which change the file that is displayed and edited relative to how it's stored on disk and seen by development tools.

Currently, the way files are saved and opened looks something like this (on the side of the machine where the files are, in the case of remote editing):

flowchart LR
    subgraph Document Sharing
        Remote
        Collaboration
    end
    subgraph Development Tools
        Tree-Sitter
        Lsp
        Tasks
    end
    Remote <-.-> Doc
    Collaboration <-.-> Doc
    Doc[Document Model] -->|save| Fs[Filesystem]
    Fs -.->|open| Doc
    Fs -.-> Tree-Sitter
    Fs -.-> Lsp
    Fs -.-> Tasks

With this feature, it might look like this:

LFT - Local File Transformation
ILFT - Inverse Local File Transformation

flowchart LR
    subgraph Document Sharing
        Remote
        Collaboration
    end
    subgraph Development Tools
        Tree-Sitter
        Lsp
        Tasks
    end
    Remote <-.-> Buffer
    Collaboration <-.-> Buffer
    Buffer[Transformed Buffer] -->|edit + ILFT| Doc
    Doc[Document Model] -->|save| Fs[Filesystem]
    Fs -->|open| Doc
    Doc -->|"LFT (after open)"| Buffer
    Fs -.-> Tree-Sitter
    Fs -.-> Lsp
    Fs -.-> Tasks

Implementation

LFTs are bijections that operate on the document.

To provide a good experience, LFTs should have the following properties:

Locality - inverse LFTs are applied frequently, every time part of the document changes, so they should only operate on the parts that changed.
Low Latency - this especially applies to inverse LFTs.
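To make "bijection" concrete, here is a minimal sketch of what an LFT could look like as an interface: a forward function (disk to editor) and an inverse (editor to disk), plus a per-sample round-trip check. The names `Lft`, `round_trips` and the toy `CrlfAsLf` transform are all hypothetical, not a proposed Zed API.

```python
from typing import Protocol


class Lft(Protocol):
    """Hypothetical interface for a Local File Transformation."""

    def forward(self, stored: str) -> str:
        """LFT: on-disk text -> text shown in the editor."""
        ...

    def inverse(self, displayed: str) -> str:
        """Inverse LFT: text shown in the editor -> on-disk text."""
        ...


def round_trips(lft: Lft, stored: str) -> bool:
    """Check the bijection property on one sample: inverse(forward(x)) == x."""
    return lft.inverse(lft.forward(stored)) == stored


class CrlfAsLf:
    """Toy LFT (hypothetical): show CRLF line endings as LF while editing."""

    def forward(self, stored: str) -> str:
        return stored.replace("\r\n", "\n")

    def inverse(self, displayed: str) -> str:
        return displayed.replace("\n", "\r\n")
```

The toy transform round-trips `"a\r\nb\r\n"` but not `"a\nb"`, because both `"a\r\nb"` and `"a\nb"` display as `"a\nb"`. So in practice an LFT is only a bijection on some set of files, and it would need to refuse (or not be enabled for) files outside that set.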

LFTs increase latency in two ways:

Opening latency - This increases the time taken to open/preview a file. This can be mostly mitigated by opening the file without LFTs as read-only until the LFTs finish applying, although the file might briefly flash when the transformed version replaces it.
Edit/Save latency - This increases the time it takes for an edit to reach the original file, which adds latency to tree-sitter, lsps and watch compilation. This makes inverse LFTs a bottleneck for how fast the editor feels.
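Locality is what keeps edit/save latency down: ideally only the edited region goes through the inverse LFT. A minimal sketch, assuming a line-preserving LFT (line i on screen corresponds to line i on disk), using the quoteless-keys case from the JSON example as the inverse; `apply_edit` and `quote_bare_key` are hypothetical names.

```python
import re
from typing import Callable, List

# Hypothetical per-line inverse LFT: displayed line -> stored line.
LineInverse = Callable[[str], str]

# Assumption: at most one key per line, and the key starts the line.
BARE_KEY = re.compile(r'^(\s*)([A-Za-z_$][\w$]*)(\s*):')


def quote_bare_key(line: str) -> str:
    """Example inverse LFT: `foo: 1` -> `"foo": 1`; already-quoted keys are untouched."""
    return BARE_KEY.sub(r'\1"\2"\3:', line, count=1)


def apply_edit(stored: List[str], displayed: List[str], start: int, end: int,
               new_lines: List[str], inverse_line: LineInverse) -> None:
    """Replace displayed[start:end] with new_lines and patch the stored copy.

    Only the edited lines go through the inverse LFT; every other stored
    line is reused as-is, so the cost scales with the edit, not the file.
    """
    displayed[start:end] = new_lines
    stored[start:end] = [inverse_line(line) for line in new_lines]
```

Non-line-preserving transforms (like the indentation-based scope example) can't use a simple line splice like this and would need some other way to find the affected region.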

Examples

JSDoc Type Hints

This was the motivating example that gave me this idea about a year ago, when Svelte switched to JSDoc. The main drawback of JSDoc is its verbose and inconvenient syntax, but for certain use cases like libraries, the reduced complexity of skipping a build step and having the source be easier to access is very appealing.

LFTs solve this well:

const foo: number = 1;

<===>

/** @type {number} */
const foo = 1;

This is a case where having access to the tree-sitter tree is very beneficial.
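For the simplest declarations, both directions can be sketched with regular expressions. This is a hypothetical, deliberately narrow sketch: it assumes the `/** @type {T} */` comment sits alone on the line directly above a single `const`/`let`/`var` declaration, that T contains no `}`, and that the inline type contains no `=` (so no arrow function types). Anything outside that shape is left untouched, which is exactly why a tree-sitter-based version would be much more robust.

```python
import re

# Assumption: comment alone on the line above the declaration, T has no `}`.
JSDOC_DECL = re.compile(
    r"^(?P<indent>[ \t]*)/\*\* @type \{(?P<type>[^}\n]*)\} \*/\n"
    r"(?P=indent)(?P<kw>const|let|var) (?P<name>[A-Za-z_$][\w$]*) =",
    re.MULTILINE,
)
# Assumption: the inline type contains no `=`.
TS_DECL = re.compile(
    r"^(?P<indent>[ \t]*)(?P<kw>const|let|var) (?P<name>[A-Za-z_$][\w$]*)"
    r": (?P<type>[^=\n]+?) =",
    re.MULTILINE,
)


def to_display(stored: str) -> str:
    """LFT: JSDoc-annotated JavaScript on disk -> inline type annotation."""
    return JSDOC_DECL.sub(
        lambda m: f"{m['indent']}{m['kw']} {m['name']}: {m['type']} =", stored
    )


def to_disk(displayed: str) -> str:
    """Inverse LFT: inline type annotation -> JSDoc comment above the declaration."""
    return TS_DECL.sub(
        lambda m: (
            f"{m['indent']}/** @type {{{m['type']}}} */\n"
            f"{m['indent']}{m['kw']} {m['name']} ="
        ),
        displayed,
    )
```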

Indentation

If you think this example is silly and this is what tabs are for, I totally agree.

In an ideal world everyone would agree on indentation and no one would care about this example, and I think people have mostly stopped caring. For me, the most annoying case is when a project uses spaces for indentation, which, unlike tabs, can't be resized when rendering, and I don't agree with the number of spaces (i.e. it's not 4). It's annoying to code with an indentation you're not used to, and if you change it you might accidentally commit indentation changes.

This is as simple as it can get:

const foo = {
  bar: true,
  baz: 2,
};

<===>

const foo = {
    bar: true,
    baz: 2,
};

This can also be solved with git attributes, as noted below.
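A minimal sketch of this LFT as a single function used in both directions: `reindent(text, 2, 4)` is the forward LFT and `reindent(text, 4, 2)` is the inverse. The function name is hypothetical, and tabs in the indentation are not handled. Lines whose indentation isn't a whole number of levels are rejected instead of rounded, since rounding would silently break the bijection.

```python
def reindent(text: str, old_width: int, new_width: int) -> str:
    """Rescale leading-space indentation from old_width to new_width per level.

    Raises ValueError for a line whose indentation is not a multiple of
    old_width, because rounding it would make the transform lossy.
    """
    out = []
    for number, line in enumerate(text.splitlines(keepends=True), start=1):
        body = line.lstrip(" ")
        width = len(line) - len(body)
        if width % old_width:
            raise ValueError(
                f"line {number}: {width} spaces is not a multiple of {old_width}"
            )
        out.append(" " * (width // old_width * new_width) + body)
    return "".join(out)
```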

Json Trailing Commas, Quoteless Keys & Comments

Trailing commas make a lot of sense for editing: they let you easily reorder lines and copy things around without ending up with missing commas or disallowed trailing commas, and they might even slightly improve readability. There are formats like jsonc and json5 which allow them, but unfortunately most things only accept strict json, which forbids trailing commas. Similarly, the requirement to quote all keys, even when unnecessary, makes editing cumbersome and reduces readability.

This can also apply to programming languages that don't support trailing commas in function arguments (looking at you, lua).
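The trailing-comma part on its own can be sketched as a pair of string-aware scanners (hypothetical names). The forward LFT adds a comma before a `}` or `]` that sits on its own line, and the inverse drops any comma whose next non-whitespace character closes a container. This assumes double-quoted strings only and no comments; quoteless keys and comments would need their own handling.

```python
def strip_trailing_commas(relaxed: str) -> str:
    """Inverse LFT: drop any comma whose next non-whitespace char is `}` or `]`."""
    out = []
    in_string = escaped = False
    for i, ch in enumerate(relaxed):
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == ",":
            j = i + 1
            while j < len(relaxed) and relaxed[j] in " \t\r\n":
                j += 1
            if j < len(relaxed) and relaxed[j] in "}]":
                continue  # trailing comma: leave it out of the strict output
        out.append(ch)
    return "".join(out)


def add_trailing_commas(strict: str) -> str:
    """LFT: add a trailing comma before a `}` or `]` that sits on its own line."""
    out = []
    in_string = escaped = False
    for ch in strict:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "}]":
            k = len(out) - 1
            while k >= 0 and out[k] in " \t\r\n":
                k -= 1
            # Only multi-line containers get a comma; `{}`, `[]`, `[1, 2]` don't.
            if k >= 0 and out[k] not in ",{[" and "\n" in out[k + 1:]:
                out.insert(k + 1, ",")
        out.append(ch)
    return "".join(out)
```

Since strict json never contains trailing commas, `strip_trailing_commas(add_trailing_commas(x)) == x` for any valid input, which is the round-trip property LFTs need.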

Comments are problematic on a fundamental level: they need to be retained somewhere. Some schemas permit special keys that are ignored when parsing, and comments can map onto these pretty well. Here's an example that allows _comment on the root object:

// Comment above the object
{
    // Comment above a key
    foo: "bar",
    object: {
        // Comment inside another object
    },
    // Comment below all keys
}
// Comment below the object

<===>

{
    "_comment": {
        "this": "Comment above the object",
        "foo": "Comment above a key",
        "object.": "Comment inside another object",
        "": "Comment below all keys",
        "super.": "Comment below the object"
    },
    "foo": "bar",
    "object": {}
}

Indentation Based Scope

Let's have Python-like Rust syntax, why not! (This is inspired by a procedural macro that I saw years ago but cannot find for the life of me. I've gone through multiple pages of Google results and had Claude gaslight me with a non-existent piston-indentation crate.)

fn main() {
    println!("Hello world!");
}

<===>

fn main():
    println!("Hello world!");

This is an intentionally hard example, because it's inherently non-local: to know how many tabs to insert, you need to keep track of the current scope level. This turns out not to really be an issue, because LFTs are always applied to the whole document, and inverse LFTs don't have the same problem because they only need to see the relative change in indentation. The inverse example, Python with curly braces:

if __name__ == "__main__":
    print("Hello world!")

<===>

if __name__ == "__main__" {
    print("Hello world!")
}

This one does suffer from non-locality. This is a case where having access to the tree-sitter tree can really be beneficial.
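For the Rust direction, a line-based sketch shows that the inverse really only needs relative indentation: a stack of open block indents is enough to know where each `}` goes. This is hypothetical and assumes every block opener ends its line with ` {`, every `}` stands alone on its line (no `};` or `} else {`), and no non-comment line naturally ends with `:`. Blank lines are held back so a closing brace lands before them, not after.

```python
def braces_to_colons(stored: str) -> str:
    """LFT: `fn main() {` -> `fn main():`; lone `}` lines are dropped."""
    out = []
    for line in stored.split("\n"):
        body = line.strip()
        if body == "}":
            continue  # closing braces are implied by the dedent
        if body.endswith(" {") and not body.startswith("//"):
            line = line.rstrip()[:-2] + ":"
        out.append(line)
    return "\n".join(out)


def colons_to_braces(shown: str) -> str:
    """Inverse LFT: re-insert `{`/`}` from the relative change in indentation."""
    out, pending_blank = [], []
    open_blocks = []  # indentation widths of the currently open blocks
    for line in shown.split("\n"):
        body = line.strip()
        if not body:
            pending_blank.append(line)  # emit after any closing braces
            continue
        width = len(line) - len(line.lstrip(" "))
        while open_blocks and width <= open_blocks[-1]:
            out.append(" " * open_blocks.pop() + "}")
        out.extend(pending_blank)
        pending_blank.clear()
        if body.endswith(":") and not body.startswith("//"):
            out.append(line.rstrip()[:-1] + " {")
            open_blocks.append(width)
        else:
            out.append(line)
    while open_blocks:
        out.append(" " * open_blocks.pop() + "}")
    out.extend(pending_blank)
    return "\n".join(out)
```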

Hex Editor

Of course Zed will have a native hex editor at some point, but this is more customizable and might be useful for advanced use cases (e.g. support for niche vim operations), and it's a good example. This could be done with xxd and xxd -r.
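The same idea without shelling out, as a hypothetical sketch. The format here (offset, colon, space-separated hex bytes, 16 per line) is a simplification and is not xxd-compatible; real xxd output also has an ASCII column. The inverse ignores the offsets and just concatenates the hex bytes, so inserting or deleting bytes in the middle still works even though the offsets on later lines go stale until the next forward pass.

```python
def to_hex_view(data: bytes, width: int = 16) -> str:
    """LFT: binary file -> lines of `offset: hex bytes`."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        lines.append(f"{offset:08x}: {chunk.hex(' ')}")
    return "\n".join(lines)


def from_hex_view(text: str) -> bytes:
    """Inverse LFT: hex view -> bytes; offsets are ignored, only hex is read."""
    out = bytearray()
    for line in text.splitlines():
        if not line.strip():
            continue
        hex_part = line.split(": ", 1)[-1]  # tolerate a deleted offset prefix
        out += bytes.fromhex(hex_part)
    return bytes(out)
```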

Alternative: Git Attributes

I recently read the git documentation on git attributes, and they provide a similar feature to LFTs, with the main difference being that the transformations happen between the git repository and the filesystem, instead of between the filesystem and the editor. This makes it much easier to implement, because the files only need to be transformed when interacting with git, which happens much less frequently than on every edit. The fundamental limitation of this approach is that your development tools, tree-sitter and lsps still operate on the transformed files, meaning you cannot make any syntax-incompatible changes, which limits you to formatting only (you might be able to in very niche cases, e.g. loading javascript files as typescript and aliasing node to ts-node).
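Git runs a filter driver's `clean` command when staging a file and its `smudge` command when checking it out, piping the content through stdin/stdout. A minimal sketch of such a driver for the indentation case, assuming the repository is indented with 2 spaces and you want tabs in the working tree. The script name `tabfilter.py` and driver name `tabs` are made up; it would be wired up with `git config filter.tabs.clean "python3 tabfilter.py clean"`, `git config filter.tabs.smudge "python3 tabfilter.py smudge"` and a `*.js filter=tabs` line in `.gitattributes`.

```python
import sys

INDENT = "  "  # assumption: the repository uses 2-space indentation


def smudge(text: str) -> str:
    """Repository -> working tree: leading groups of 2 spaces become tabs."""
    out = []
    for line in text.splitlines(keepends=True):
        rest = line.lstrip(" ")
        tabs, spaces = divmod(len(line) - len(rest), len(INDENT))
        out.append("\t" * tabs + " " * spaces + rest)
    return "".join(out)


def clean(text: str) -> str:
    """Working tree -> repository: each leading tab becomes 2 spaces."""
    out = []
    for line in text.splitlines(keepends=True):
        rest = line.lstrip("\t")
        out.append(INDENT * (len(line) - len(rest)) + rest)
    return "".join(out)


if __name__ == "__main__" and len(sys.argv) > 1:
    # git pipes the file through stdin/stdout; argv[1] picks the direction.
    data = sys.stdin.read()
    sys.stdout.write(smudge(data) if sys.argv[1] == "smudge" else clean(data))
```

Note this is not a bijection for repository files that already contain tab indentation, since `clean` turns those tabs into spaces.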

Alternative: FUSE Filesystems

This has the advantage that many FUSE filesystems already exist, but it has many downsides: it's linux only (maybe also macOS?), FUSE filesystems are relatively hard to write, and while you might be able to work around syntax-incompatible changes by placing the virtual filesystem in a different location and having the development tools use the original files, tree-sitter/lsp won't work.

Additional Idea: Virtual Filesystem

There is another feature that is very adjacent to this: virtual filesystems. These are commonly seen on linux with FUSE, and vscode has the FileSystemProvider api, which is cross-platform and lets extensions register their own filesystems, such as Microsoft's remote development extension.

Virtual filesystems generally fall into 2 types:

Filesystems that are backed by a remote machine/resource, such as sshfs
Filesystems that are backed by a file/folder you provide them, such as gocryptfs

Vscode's FileSystemProvider api is focused on the former, with the filesystem provider having no good way to remember what file/folder it's backed by, and paths being provided as URIs that match the protocol you declared.

There is a lot of overlap between LFTs and file/folder-backed filesystems: if you allow LFTs to optionally transform not just file -> file but also file/folder -> file/folder, they work as transparent file/folder-backed virtual filesystems. There are many examples of useful virtual filesystems, e.g. directly editing tarred and/or compressed files, that would be really useful to add to Zed on their own regardless of this feature. A cool example I thought of was recreating oil.nvim, a vim plugin that lets you edit folders like text files to create/delete/rename files in the folder.
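A folder -> file LFT in the spirit of oil.nvim could be sketched as: render the entry names as lines, and on save diff the edited lines against the original entries to get filesystem operations. This is a hypothetical sketch working on plain name lists; actually applying the operations (e.g. with pathlib) is left out, and renames show up as a delete plus a create, since detecting them would need a stable ID per entry.

```python
from typing import Iterable, List, Tuple


def listing_to_text(entries: Iterable[str]) -> str:
    """Forward: one entry name per line, sorted (directories end with '/')."""
    return "\n".join(sorted(entries)) + "\n"


def text_to_ops(before: Iterable[str], edited_text: str) -> List[Tuple[str, str]]:
    """Inverse: diff the edited buffer against the original entries."""
    old = set(before)
    new = {line.strip() for line in edited_text.splitlines() if line.strip()}
    ops = [("delete", name) for name in sorted(old - new)]
    ops += [("create", name) for name in sorted(new - old)]
    return ops
```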

Open Questions

[ ] How do tree-sitter/lsps work with the transformed file? This seems like the toughest problem to solve in this entire endeavor, and I haven't put much thought into it so far. If the LFTs are simple text -> text transformations, they don't provide enough context to work with, so this likely necessitates doing the modifications at the tree-sitter tree level. But then how do we map the valid untransformed tree to the potentially invalid transformed tree? Can we use the tree to approximately map cursor positions for the lsp? This is probably impossible to solve in the general case...
[ ] Should find in folder use transformed files? Probably not, since it can be an absurd number of files to run through LFTs.
[ ] How does this interact with tree-sitter? Should LFTs and inverse LFTs have access to the tree-sitter tree? Probably.
[ ] How does this interact with lsps? Should LFTs and inverse LFTs have access to the lsp? Probably not.
[ ] What format should LFTs be written in? Potential options are: shell commands (more compatible with existing tools and git attributes), wasm (makes it viable to distribute LFTs through/as extensions), both?
[ ] Should you be able to layer multiple LFTs together? Probably.
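On the last two questions, a hypothetical sketch of how layering could work regardless of the LFT format: forward transforms run in order and inverses run in reverse order (like function composition). The `shell` helper wraps a command that reads the document on stdin and writes the result to stdout, using the standard `subprocess.run`; `LftPair`, `shell` and `layer` are made-up names.

```python
import subprocess
from dataclasses import dataclass
from typing import Callable, List, Sequence

Transform = Callable[[str], str]


@dataclass
class LftPair:
    forward: Transform  # on-disk text -> displayed text
    inverse: Transform  # displayed text -> on-disk text


def shell(cmd: Sequence[str]) -> Transform:
    """Wrap a command that reads the document on stdin and writes it to stdout."""
    def run(text: str) -> str:
        return subprocess.run(list(cmd), input=text, capture_output=True,
                              text=True, check=True).stdout
    return run


def layer(lfts: List[LftPair]) -> LftPair:
    """Compose LFTs: forward runs first-to-last, inverse runs last-to-first."""
    def forward(text: str) -> str:
        for lft in lfts:
            text = lft.forward(text)
        return text

    def inverse(text: str) -> str:
        for lft in reversed(lfts):
            text = lft.inverse(text)
        return text

    return LftPair(forward, inverse)
```

For example, `LftPair(shell(["xxd"]), shell(["xxd", "-r"]))` would be the hex editor example as a shell-command LFT (only for UTF-8 files here, since `text=True` decodes the output). Running the inverses in the same order as the forwards would generally not round-trip, which is why `layer` reverses them.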

someone13574 commented 1 month ago

This. ~I really love the in-file rust type annotations from vscode's rust-analyzer extension and it is one of the major things blocking me from (fully) transitioning.~

bbb651 commented 1 month ago

This isn't related to that, what you are talking about is called inlay hints, they are entirely controlled by the language server and they already exist on Zed, there's a button to toggle them on the toolbar (you can also use the command palette, or their shortcut which is either ctrl + ; or ctrl + : I don't remember).

mrnugget commented 1 month ago

Have you ever looked at our DisplayMap? I think it's very similar to what you have in mind:

https://github.com/zed-industries/zed/blob/d21598efe91e041f259c63b16dfdbafacd2ca05d/crates/editor/src/display_map.rs#L1-L18

Folds, wrapping, tabs, highlighting, inlay hints — they are all modeled as a series of optional transformations on the buffer that change how it's displayed.
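Each layer in such a stack has to map positions both ways, which is also the cursor-mapping problem from the open questions. As an illustration only (a standalone sketch, not Zed's DisplayMap code), here is what that mapping looks like for a tab-expansion layer; the function names and the default tab size of 4 are assumptions.

```python
def to_display_column(line: str, column: int, tab_size: int = 4) -> int:
    """Map a character offset in a buffer line to a display column."""
    display = 0
    for ch in line[:column]:
        if ch == "\t":
            display += tab_size - display % tab_size  # jump to the next tab stop
        else:
            display += 1
    return display


def to_buffer_column(line: str, display_column: int, tab_size: int = 4) -> int:
    """Inverse mapping; a display column inside a tab snaps to the tab itself."""
    display = 0
    for index, ch in enumerate(line):
        width = tab_size - display % tab_size if ch == "\t" else 1
        if display + width > display_column:
            return index
        display += width
    return len(line)
```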
