griimick / vscode-vhs

Visual Studio Code extension providing syntax support for VHS .tape files.
https://marketplace.visualstudio.com/items/griimick.vhs
MIT License

Generate TextMate grammar from Tree-sitter grammar #2

Open griimick opened 1 year ago

griimick commented 1 year ago

Maintenance effort would drop drastically if we could generate the TextMate grammar that VS Code uses from the official VHS tree-sitter grammar.

Resources

  1. Tree-sitter Parser and Grammar
  2. Writing a TextMate Grammar: Some Lessons Learned by Matt Neuburg
  3. VSCode Syntax Highlight guide
  4. VSCode Language Extension Overview
  5. Lexer and Parser Generators in Scheme

uncenter commented 11 months ago

I spent a bit of time searching for a proper converter between the two. It looks like a hard task, since there isn't a single one. I do have a few ideas for automatically updating parts of it, though, and I'll create a draft PR in a moment if you want to give some thoughts.

griimick commented 11 months ago

Thanks for your interest. Yes, it will be a bit tricky to write a generic tree-sitter to TextMate grammar converter. Anyway, feel free to open a draft PR if you have something to share.

uncenter commented 11 months ago

I'm just gonna give my thoughts here before I commit to any coding.

The https://github.com/charmbracelet/tree-sitter-vhs/blob/main/src/grammar.json file is easy to parse and contains a lot of information that we can scrape.

For example, the rules.setting section:

"setting": {
  "type": "CHOICE",
  "members": [
    {
      "type": "SEQ",
      "members": [
        {
          "type": "STRING",
          "value": "Shell"
        },
        {
          "type": "SYMBOL",
          "name": "string"
        }
      ]
    },
    {
      "type": "SEQ",
      "members": [
        {
          "type": "STRING",
          "value": "FontFamily"
        },
        {
          "type": "SYMBOL",
          "name": "string"
        }
      ]
    },
    {
      "type": "SEQ",
      "members": [
        {
          "type": "STRING",
          "value": "FontSize"
        },
        {
          "type": "SYMBOL",
          "name": "float"
        }
      ]
    },
    {
      "type": "SEQ",
      "members": [
        {
          "type": "STRING",
          "value": "Framerate"
        },
        {
          "type": "SYMBOL",
          "name": "integer"
        }
      ]
    },
    {
      "type": "SEQ",
      "members": [
        {
          "type": "STRING",
          "value": "PlaybackSpeed"
        },
        {
          "type": "SYMBOL",
          "name": "float"
        }
      ]
    },
    {
      "type": "SEQ",
      "members": [
        {
          "type": "STRING",
          "value": "Height"
        },
        {
          "type": "SYMBOL",
          "name": "integer"
        }
      ]
    },
    {
      "type": "SEQ",
      "members": [
        {
          "type": "STRING",
          "value": "LetterSpacing"
        },
        {
          "type": "SYMBOL",
          "name": "float"
        }
      ]
    },
    {
      "type": "SEQ",
      "members": [
        {
          "type": "STRING",
          "value": "TypingSpeed"
        },
        {
          "type": "SYMBOL",
          "name": "time"
        }
      ]
    },
    {
      "type": "SEQ",
      "members": [
        {
          "type": "STRING",
          "value": "LineHeight"
        },
        {
          "type": "SYMBOL",
          "name": "float"
        }
      ]
    },
    {
      "type": "SEQ",
      "members": [
        {
          "type": "STRING",
          "value": "Padding"
        },
        {
          "type": "SYMBOL",
          "name": "float"
        }
      ]
    },
    {
      "type": "SEQ",
      "members": [
        {
          "type": "STRING",
          "value": "Theme"
        },
        {
          "type": "CHOICE",
          "members": [
            {
              "type": "SYMBOL",
              "name": "json"
            },
            {
              "type": "SYMBOL",
              "name": "string"
            }
          ]
        }
      ]
    },
    {
      "type": "SEQ",
      "members": [
        {
          "type": "STRING",
          "value": "LoopOffset"
        },
        {
          "type": "SEQ",
          "members": [
            {
              "type": "SYMBOL",
              "name": "float"
            },
            {
              "type": "CHOICE",
              "members": [
                {
                  "type": "STRING",
                  "value": "%"
                },
                {
                  "type": "BLANK"
                }
              ]
            }
          ]
        }
      ]
    },
    {
      "type": "SEQ",
      "members": [
        {
          "type": "STRING",
          "value": "Width"
        },
        {
          "type": "SYMBOL",
          "name": "integer"
        }
      ]
    },
    {
      "type": "SEQ",
      "members": [
        {
          "type": "STRING",
          "value": "BorderRadius"
        },
        {
          "type": "SYMBOL",
          "name": "integer"
        }
      ]
    },
    {
      "type": "SEQ",
      "members": [
        {
          "type": "STRING",
          "value": "Margin"
        },
        {
          "type": "SYMBOL",
          "name": "integer"
        }
      ]
    },
    {
      "type": "SEQ",
      "members": [
        {
          "type": "STRING",
          "value": "MarginFill"
        },
        {
          "type": "SYMBOL",
          "name": "string"
        }
      ]
    },
    {
      "type": "SEQ",
      "members": [
        {
          "type": "STRING",
          "value": "WindowBar"
        },
        {
          "type": "SYMBOL",
          "name": "string"
        }
      ]
    },
    {
      "type": "SEQ",
      "members": [
        {
          "type": "STRING",
          "value": "WindowBarSize"
        },
        {
          "type": "SYMBOL",
          "name": "integer"
        }
      ]
    }
  ]
}

Super easy to extract from:

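#!/usr/bin/env node

// Assumed to be a local copy of tree-sitter-vhs's src/grammar.json.
const data = require('./tree-sitter.json');

// Each member of rules.setting is a SEQ whose first member is the setting name.
const settings = [];
for (const setting of data.rules.setting.members) {
    settings.push(setting.members[0].value);
}
console.log(settings);

Output:
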
[
  'Shell',         'FontFamily',
  'FontSize',      'Framerate',
  'PlaybackSpeed', 'Height',
  'LetterSpacing', 'TypingSpeed',
  'LineHeight',    'Padding',
  'Theme',         'LoopOffset',
  'Width',         'BorderRadius',
  'Margin',        'MarginFill',
  'WindowBar',     'WindowBarSize'
]

We don't have to write something like this for every little bit, but it could be a good way to easily update some parts. A workflow that runs once a week could check if anything has changed and update it automatically.
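
As a rough illustration of how the extracted names could feed the TextMate side, the sketch below turns them into a single `match` rule. The helper names (`escapeRegex`, `keywordPattern`) and the scope `keyword.other.setting.vhs` are made up for this example and are not taken from the extension. The rule object is a trimmed stand-in for `data.rules.setting`, so the snippet runs without downloading grammar.json.

```javascript
// Hypothetical helper: escape characters that are special inside a regex.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: build one TextMate-style rule from keyword literals.
function keywordPattern(words, scopeName) {
  // Sort longest first (ties alphabetical) so the output is deterministic
  // and a longer name is always listed before its own prefix.
  const sorted = [...words].sort(
    (a, b) => b.length - a.length || a.localeCompare(b)
  );
  return {
    name: scopeName, // assumed scope name, for illustration only
    match: `\\b(${sorted.map(escapeRegex).join('|')})\\b`,
  };
}

// Trimmed stand-in for data.rules.setting from grammar.json.
const settingRule = {
  type: 'CHOICE',
  members: [
    { type: 'SEQ', members: [{ type: 'STRING', value: 'WindowBar' }, { type: 'SYMBOL', name: 'string' }] },
    { type: 'SEQ', members: [{ type: 'STRING', value: 'WindowBarSize' }, { type: 'SYMBOL', name: 'integer' }] },
    { type: 'SEQ', members: [{ type: 'STRING', value: 'Shell' }, { type: 'SYMBOL', name: 'string' }] },
  ],
};

const names = settingRule.members.map((seq) => seq.members[0].value);
const pattern = keywordPattern(names, 'keyword.other.setting.vhs');
console.log(JSON.stringify(pattern, null, 2));
```

For the three names above, the generated `match` is `\b(WindowBarSize|WindowBar|Shell)\b`. A scheduled job would only need to regenerate this string and compare it with what is committed.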

griimick commented 11 months ago

The file https://github.com/charmbracelet/tree-sitter-vhs/blob/main/src/grammar.json gets generated from https://github.com/charmbracelet/tree-sitter-vhs/blob/main/grammar.js. I think we can use the latter to generate the TextMate grammar.

uncenter commented 11 months ago

> The file charmbracelet/tree-sitter-vhs@main/src/grammar.json gets generated from charmbracelet/tree-sitter-vhs@main/grammar.js. I think we can use the latter to generate the TextMate grammar.

I noticed, but it seems harder to scrape/generate it from a JS file... I'll take another look.

griimick commented 11 months ago

You don't have to scrape it. Think of how this file must be getting used by tree-sitter itself to generate the resulting JSON.

Can we override the global functions used in the grammar.js file, like project, seq, choice, repeat, etc., and use the same file to generate a TextMate grammar instead of a tree-sitter grammar?
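
A minimal sketch of that idea, under some assumptions: stand-in versions of `seq`, `choice`, `repeat`, `repeat1` and `optional` are passed to the grammar definition, and instead of building parser rules they only record the string literals each rule mentions. `collectLiterals` is a hypothetical name. The grammar excerpt is wrapped in a function so the snippet is self-contained; with the real grammar.js, the stand-ins would presumably have to be installed as globals before the file is loaded, which is not shown here.

```javascript
// Hypothetical helper: evaluate a tree-sitter style grammar definition with
// stand-in DSL functions and return the string literals used by each rule.
function collectLiterals(defineGrammar) {
  let current = null; // literals of the rule currently being evaluated

  // Every combinator does the same thing here: remember string arguments.
  const record = (...args) => {
    for (const arg of args) {
      if (typeof arg === 'string') current.add(arg);
    }
    return null; // nested results carry no information in this sketch
  };
  const dsl = { seq: record, choice: record, repeat: record, repeat1: record, optional: record };

  // `$` answers every rule reference ($.string, $.time, ...) with a placeholder.
  const $ = new Proxy({}, { get: (_, name) => ({ ref: String(name) }) });

  const result = {};
  const grammar = ({ rules }) => {
    for (const [name, rule] of Object.entries(rules)) {
      current = new Set();
      rule($); // rules that return a bare RegExp record nothing
      result[name] = [...current];
    }
  };

  defineGrammar({ grammar, ...dsl });
  return result;
}

// Excerpt of the VHS grammar, taking the DSL as an argument instead of globals.
const excerpt = ({ grammar, seq, choice, optional }) => grammar({
  name: 'vhs',
  rules: {
    hide: $ => seq('Hide'),
    sleep: $ => seq('Sleep', $.time),
    setting: $ => choice(
      seq('Shell', $.string),
      seq('Theme', choice($.json, $.string)),
      seq('LoopOffset', seq($.float, optional('%'))),
    ),
    integer: $ => /\d+/,
  },
});

console.log(collectLiterals(excerpt));
```

The result groups literals by rule name, but nothing in it says which TextMate scope each group should get, so that mapping would still have to be written by hand.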

uncenter commented 11 months ago

> You don't have to scrape it. Think of how this file must be getting used by tree-sitter itself to generate the resulting JSON.
>
> Can we override the global functions used in the grammar.js file, like project, seq, choice, repeat, etc., and use the same file to generate a TextMate grammar instead of a tree-sitter grammar?

Totally, that's why I said scrape/generate. The only issue I'm noticing is just naming certain patterns and rulesets. I'll give it a go tonight and see what gives.

uncenter commented 11 months ago

I'm gonna be honest, this is pretty difficult. A lot of it has to be hard-coded into the functions, and it might honestly be easier to just do it by hand.

module.exports = grammar({
  name: 'vhs',
  rules: {
    program: $ => repeat(choice($.command, $.comment)),
    command: $ => choice(
      $.control,
      $.alt,
      $.hide,
      $.show,
      $.output,
      $.sleep,
      $.type,
      $.backspace,
      $.down,
      $.enter,
      $.escape,
      $.left,
      $.right,
      $.set,
      $.space,
      $.tab,
      $.up,
      $.pageup,
      $.pagedown,
    ),

    control: $ =>   /Ctrl\+[A-Z]/,
    alt: $ =>       /Alt\+[A-Z]/,
    hide: $ =>      seq('Hide'),
    show: $ =>      seq('Show'),
    output: $ =>    seq('Output',    $.path),
    set: $ =>       seq('Set',       $.setting),
    sleep: $ =>     seq('Sleep',     $.time),
    type: $ =>      seq('Type',      optional($.speed), repeat1($.string)),
    backspace: $ => seq('Backspace', optional($.speed), optional($.integer)),
    down: $ =>      seq('Down',      optional($.speed), optional($.integer)),
    enter: $ =>     seq('Enter',     optional($.speed), optional($.integer)),
    escape: $ =>    seq('Escape',    optional($.speed), optional($.integer)),
    left: $ =>      seq('Left',      optional($.speed), optional($.integer)),
    right: $ =>     seq('Right',     optional($.speed), optional($.integer)),
    space: $ =>     seq('Space',     optional($.speed), optional($.integer)),
    tab: $ =>       seq('Tab',       optional($.speed), optional($.integer)),
    up: $ =>        seq('Up',        optional($.speed), optional($.integer)),
    pageup: $ =>    seq('PageUp',    optional($.speed), optional($.integer)),
    pagedown: $ =>  seq('PageDown',  optional($.speed), optional($.integer)),

    setting: $ => choice(
      seq('Shell',         $.string),
      seq('FontFamily',    $.string),
      seq('FontSize',      $.float),
      seq('Framerate',     $.integer),
      seq('PlaybackSpeed', $.float),
      seq('Height',        $.integer),
      seq('LetterSpacing', $.float),
      seq('TypingSpeed',   $.time),
      seq('LineHeight',    $.float),
      seq('Padding',       $.float),
      seq('Theme',         choice($.json, $.string)),
      seq('LoopOffset',    seq($.float, optional('%'))),
      seq('Width',         $.integer),
      seq('BorderRadius',  $.integer),
      seq('Margin',        $.integer),
      seq('MarginFill',    $.string),
      seq('WindowBar',     $.string),
      seq('WindowBarSize', $.integer),
    ),

    string: $ =>  choice(/"[^"]*"/, /'[^']*'/, /`[^`]*`/),
    comment: $ => /#.*/,
    float: $ =>   /\d*\.?\d+/,
    integer: $ => /\d+/,
    json: $ =>    /\{.*\}/,
    path: $ =>    /[\.\-\/A-Za-z0-9%]+/,
    speed: $ =>   seq('@', $.time),
    time: $ =>    /\d*\.?\d+m?s?/,
  }
});

There are all of the types (string, comment, float, integer, json, path, speed, time) to handle, along with other properties like setting. We would also have to ignore things like rules.program that just aren't relevant, and more (let alone the workarounds I had to use to get it to run properly). It honestly just seems easier to parse select bits of the JSON or do it by hand. LMK your thoughts.

griimick commented 11 months ago

No worries. Thanks for looking into this @uncenter and I really appreciate you spending time on this.

I know this is a bit tricky. We can definitely do this by parsing the JSON and we are already maintaining this repo by hand.

I see this as a coding exercise and want to solve it by writing a good-enough parser. Let me look into this and come up with a small writeup on how this can be achieved, maybe with some example code. If it looks achievable, maybe you can pick up from there.

This can become a good learning experience for both of us, if you are up for it :)

uncenter commented 11 months ago

Totally! I would love to figure this out, I'm just totally stumped/lost.

griimick commented 11 months ago

That's a good place to be when solving a problem. Give me some time and I will share my thoughts here.

uncenter commented 11 months ago

Do you have a quicker way of contacting you than this issue, like Discord or Matrix?

griimick commented 11 months ago

I gave it a shot here: https://github.com/griimick/vscode-vhs/blob/treesitter-textmate/generate.js

I found out that the tokens generated by the tree-sitter grammar are less detailed than those of the TextMate grammar in this repo. Tree-sitter tokens also do not map directly to the highlight definitions.

Also, TextMate uses Ruby-style regexes, and I don't think JS regexes can always be converted to them, as the two flavors are not fully compatible.
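
To make the regex concern concrete, here is a small sketch of the only part that is mechanical: copying the source of a JS `RegExp` into a TextMate `match` string. The helper name `toTextMateMatch` and the `meta.*.vhs` scopes are placeholders invented for this example. The helper refuses patterns that carry JS flags, because a `match` string has no slot for them. It does not check whether the pattern means the same thing in both regex flavors; that review would still be manual.

```javascript
// Hypothetical helper: copy a JS RegExp into a TextMate "match" string.
function toTextMateMatch(regex) {
  if (!(regex instanceof RegExp)) {
    throw new TypeError('expected a RegExp');
  }
  if (regex.flags !== '') {
    // A "match" value is a bare pattern string, so JS flags would be lost.
    throw new Error(`flags "${regex.flags}" need manual handling: /${regex.source}/`);
  }
  // No flavor check happens here; the source is copied as-is.
  return regex.source;
}

// A few terminal rules copied from the grammar.js shown earlier in the thread.
const terminals = {
  comment: /#.*/,
  integer: /\d+/,
  float: /\d*\.?\d+/,
  time: /\d*\.?\d+m?s?/,
};

const patterns = Object.entries(terminals).map(([rule, regex]) => ({
  name: `meta.${rule}.vhs`, // placeholder scope, not taken from the extension
  match: toTextMateMatch(regex),
}));

console.log(JSON.stringify(patterns, null, 2));
```

These four terminals use only basic constructs, so a plain copy is plausible for them. Anything more exotic would need to be compared against the regex engine the editor actually uses.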

Knowing all this, I am inclined to maintain the rules manually now. If someone still wants to give it a shot, feel free.

uncenter commented 11 months ago

Exactly my thinking. At least we tried 😅...