trishume / syntect

Rust library for syntax highlighting using Sublime Text syntax definitions.
https://docs.rs/syntect
MIT License
1.88k stars 131 forks source link

Panics when a regex (from a `.sublime-syntax` file) includes unescaped backslash #547

Open Andrew15-5 opened 1 month ago

Andrew15-5 commented 1 month ago

Hello, I discovered a panic in Typst: https://github.com/typst/typst/issues/4421. I was able to recreate the panic and the use case of the library in Typst:

use syntect::{
    easy::HighlightLines,
    highlighting::ThemeSet,
    parsing::{SyntaxDefinition, SyntaxSetBuilder},
};

/// Content of `a.sublime-syntax` file.
static A_SUBLIME_SYNTAX: &str = r#"
%YAML 1.2
---
name: lang
file_extensions:
  - a
scope: source
contexts:
  main:
    - match: '\'
"#;

fn main() {
    let text = "text";
    let file_extension = "a";
    let theme = &ThemeSet::load_defaults().themes["base16-ocean.dark"];
    let syntax_set = {
        let mut out = SyntaxSetBuilder::new();
        out.add(SyntaxDefinition::load_from_str(A_SUBLIME_SYNTAX, false, None).unwrap());
        out.build()
    };
    let syntax = syntax_set.find_syntax_by_token(file_extension).unwrap();
    let _ = HighlightLines::new(syntax, theme).highlight_line(text, &syntax_set);
}
[dependencies]
syntect = "5.2.0"

[patch.crates-io]
syntect = { git = "https://github.com/trishume/syntect" }
thread 'main' panicked at /home/user/.local/share/cargo/git/checkouts/syntect-e33b39f181e4f0f4/d023aaa/src/parsing/regex.rs:70:53:
regex string should be pre-tested: Error(OnigError(-104), end pattern at escape)

https://github.com/trishume/syntect/blob/d023aaa509d9e5058d55f9aa787c88f9a74bb180/src/parsing/regex.rs#L70

There should be some way to get Result::Err from some function call to be able to show in stderr what exactly went wrong. This error is generally "invalid regex" or more specifically "invalid regex: backslash doesn't escape anything".

I was able to remove panic by validating the regex before searching:

https://github.com/trishume/syntect/blob/d023aaa509d9e5058d55f9aa787c88f9a74bb180/src/parsing/regex.rs#L64-L65

    pub fn search(
        &self,
        text: &str,
        begin: usize,
        end: usize,
        region: Option<&mut Region>,
    ) -> bool {
        if Regex::try_compile(self.regex_str()).is_none() {
            return false;
        }
        self.regex()
            .search(text, begin, end, region.map(|r| &mut r.region))
    }

But this wouldn't give away any Error. I also don't see any way to use Regex::try_compile() directly in Typst/MRA code, therefore some other way of validating all regexes in a .sublime-syntax file should be added.

keith-hall commented 1 month ago

Thanks for reporting. Interestingly, it is trying to compile the regex pattern when it loads the yaml file at https://github.com/trishume/syntect/blob/master/src/parsing/yaml_load.rs#L473. But, that spurious trailing slash is unfortunately being eaten in the line above. I think it can be fixed like:

diff --git a/src/parsing/syntax_definition.rs b/src/parsing/syntax_definition.rs
index d73c3e7..2bdee51 100644
--- a/src/parsing/syntax_definition.rs
+++ b/src/parsing/syntax_definition.rs
@@ -241,6 +241,9 @@ where

         last_was_escape = c == '\\' && !last_was_escape;
     }
+    if last_was_escape {
+        reg_str.push('\\')
+    }
     reg_str
 }
Andrew15-5 commented 1 month ago

But, that spurious trailing slash is unfortunately being eaten in the line above.

Indeed, removing the substitute_backrefs_in_regex() gives:

error: failed to parse syntax file `a.sublime-syntax` (Error while compiling regex '\': Parsing error at position 0: Backslash without following character)
  ┌─ a.typ:1:19
  │
1 │ #set raw(syntaxes: "a.sublime-syntax")
  │                    ^^^^^^^^^^^^^^^^^^

Or applying the patch gives the same error (duh).