trishume / syntect

Rust library for syntax highlighting using Sublime Text syntax definitions.
https://docs.rs/syntect
MIT License
1.85k stars, 130 forks

Unclear licensing & provenance for bundled assets #301

Open quasicomputational opened 3 years ago

quasicomputational commented 3 years ago

AFAICT, syntect doesn't come with any information about the licensing of the bundled themes and syntax definitions, and the provenance & attribution are also unclear to end users, requiring digging in git history and chasing submodule references to find out where things come from.

I'm not sure what the best way to fix this would be, and there are at least three perspectives to look at this from:

  1. syntect as source-code distribution, which will always include the pre-built assets;
  2. syntect as library, which may or may not include the assets in the artifact depending on flags;
  3. the output of things like css_for_theme or otherwise serialising a theme, which will be subject only to that specific theme's license.

This is a bit of a nuisance; sorry about that.

slimsag commented 3 years ago

Over at Sourcegraph we've been using Syntect for several years and I can say the following (but I am not a lawyer):

I say "Common License" above because it's a really common, MIT-like license which appears to be used in almost all TextMate language grammars. It also is commercially compatible, since Sublime distributes them with its product Sublime Text.

If you go outside sublimehq/Packages and pull in more third-party syntax definitions, you will find, as I have, that this license is very common: https://github.com/slimsag/Packages#license

> the output of things like css_for_theme or otherwise serialising a theme, which will be subject only to that specific theme's license.

Generally speaking (again, I am not a lawyer and this isn't legal advice), the output of a program that transforms a work falls under derivative-work law and is not usually subject to the same license as the program that produced it or played a part in producing it. This is why, for example, models made in Blender 3D are not GPL-licensed, and images produced in Photoshop are not owned by Adobe.

I would conclude that:

  1. In general you can assume any output produced by Syntect is licensed under the same terms as the input file itself (i.e. if you're highlighting code, the result is mostly identical to the code itself and therefore under the same license).
  2. Syntect itself is mostly MIT-licensed, and the rest, including all assets/themes/syntaxes, is MIT-compatible.

decathorpe commented 1 month ago

I'm deep into a rabbit hole and just found this issue. When trying to add a package for the two-face crate to Fedora Linux (which provides additional themes and syntax highlighting grammars for syntect), I noticed that when syntect was initially packaged, we didn't account for the bundled themes and syntax highlighting grammars.

I can't find any way to regenerate the asset bundles from scratch. Is this documented somewhere? Shipping binary blobs that we have no way to verify or recreate when necessary is a recipe for disaster (see the recent XZ backdoor). I'm not saying that anything nefarious is going on here, just that it doesn't look good.

Comparing the list of included "default" grammars with other projects, it looks like the bundled grammars are from https://github.com/sublimehq/Packages at some point in time, but I can't find a reference to which point in time. Red Hat has reviewed the license that is attached to first-party Sublime grammars, and has determined that while it's a non-standard license, it's very permissive and safe to use and redistribute, but some grammars have other licenses. For example, the Rust grammar is MIT-licensed - this is not a problem in itself, because the syntect crate itself is MIT-licensed.

The list of "default" themes is unknown to me, and I can't tell where they are included from. The "InspiredGithub" theme links to a GitHub project that is MIT-licensed (which is fine), but I can't determine any origin for the other included themes (base16-*, Solarized*). It would be great if their origin (and their respective licenses) could be documented.

Heck, I even had to write Rust code to dump the list of both default grammars and themes from the built-in binary blobs, because they're not documented anywhere ...

// Cargo.toml: dependencies.syntect = { version = "5", features = ["default-syntaxes", "default-themes"] }

fn main() {
    let defaults = syntect::parsing::SyntaxSet::load_defaults_newlines();
    let mut syntaxes = defaults.syntaxes().iter().map(|s| s.name.clone()).collect::<Vec<_>>();
    syntaxes.sort();
    println!("Syntaxes: {:#?}", syntaxes);

    let defaults = syntect::highlighting::ThemeSet::load_defaults();
    let mut themes = defaults.themes.keys().collect::<Vec<_>>();
    themes.sort();
    println!("Themes: {:#?}", themes);
}

keith-hall commented 1 month ago

> it looks like the bundled grammars are from https://github.com/sublimehq/Packages at some point in time, but I can't find a reference to which point in time.

That can easily be ascertained by looking at the submodule reference. GitHub shows both the date and the commit hash.

> The list of "default" themes is unknown to me, and I can't tell where they are included from. The "InspiredGithub" theme links to a GitHub project that is MIT-licensed (which is fine), but I can't determine any origin for the other included themes (base16-*, Solarized*).

The same answer kind of applies here - just with the additional step of browsing the submodules to see what .sublime-syntax and .tmTheme files they contain...
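
For anyone who wants to script that lookup rather than click through GitHub, here is a minimal sketch (not part of syntect) that lists submodule paths and upstream URLs from the text of a `.gitmodules` file, using only the standard library. The sample entries, the `Submodule` struct, and `parse_gitmodules` are hypothetical names made up for illustration; only the sublimehq/Packages URL comes from this thread. Note that `.gitmodules` records just the path and URL; the pinned commit is stored separately by git and is what `git submodule status` prints.

```rust
/// Illustrative `.gitmodules` content. The paths and the second URL are
/// made up for this sketch; they are NOT copied from the syntect repository.
const SAMPLE: &str = r#"
[submodule "example/Packages"]
    path = example/Packages
    url = https://github.com/sublimehq/Packages
[submodule "example/some-theme"]
    path = example/some-theme
    url = https://example.com/some-theme.git
"#;

/// One entry from a `.gitmodules` file (hypothetical helper type).
#[derive(Debug, Default, Clone, PartialEq)]
struct Submodule {
    name: String,
    path: String,
    url: String,
}

/// Parse the common `[submodule "name"]` / `key = value` layout.
/// This is a simplification: it does not handle comments, quoting,
/// or other git-config syntax.
fn parse_gitmodules(text: &str) -> Vec<Submodule> {
    let mut modules = Vec::new();
    let mut current: Option<Submodule> = None;

    for line in text.lines().map(str::trim) {
        if let Some(rest) = line.strip_prefix("[submodule \"") {
            // A new section starts: store the previous one, if any.
            if let Some(done) = current.take() {
                modules.push(done);
            }
            let name = rest.trim_end_matches(']').trim_end_matches('"');
            current = Some(Submodule {
                name: name.to_string(),
                ..Default::default()
            });
        } else if let (Some(module), Some((key, value))) =
            (current.as_mut(), line.split_once('='))
        {
            match key.trim() {
                "path" => module.path = value.trim().to_string(),
                "url" => module.url = value.trim().to_string(),
                _ => {} // ignore other keys such as `branch`
            }
        }
    }
    // Store the last section.
    if let Some(done) = current {
        modules.push(done);
    }
    modules
}

fn main() {
    // In a real checkout you would read the file instead, e.g.
    // std::fs::read_to_string(".gitmodules").
    for module in parse_gitmodules(SAMPLE) {
        println!("{} <- {}", module.path, module.url);
    }
}
```

The resulting path/URL pairs are a starting point for an attribution list; the license of each upstream still has to be checked by hand.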

> I can't find any way to regenerate the asset bundles from scratch.

It doesn't seem to be explicitly documented outside of the file that creates the binary blobs (https://github.com/trishume/syntect/blob/d023aaa509d9e5058d55f9aa787c88f9a74bb180/examples/gendata.rs#L1-L7). But the Makefile is always a good place to look, and it is run from CI.

decathorpe commented 1 month ago

Thank you for the pointers! Neither "test-data" (data is ... not (only?) used for tests?) nor "examples" (it's the code that actually generates the blobs?) are places that I would have expected ... the Makefile is indeed helpful.