Open quasicomputational opened 4 years ago
Over at Sourcegraph we've been using Syntect for several years and I can say the following (but I am not a lawyer):
I say "Common License" above because it's a really common, MIT-like license which appears to be used in almost all TextMate language grammars. It also is commercially compatible, since Sublime distributes them with its product Sublime Text.
If you go outside sublimehq/Packages and pull in more third-party syntax definitions, as I have, you will find that this license is very common: https://github.com/slimsag/Packages#license
the output of things like css_for_theme or otherwise serialising a theme, which will be subject only to that specific theme's license.
Generally speaking (again, I am not a lawyer and this isn't legal advice), the output of programs which transform a work falls under derivative work laws and is not usually subject to the same license as the actual thing that produced it or played a part in producing it. This is why, for example, models made in Blender 3D are not GPL-licensed, and images produced in Photoshop are not owned by Adobe.
I would conclude that:
I'm deep into a rabbit hole and just found this issue. When trying to add a package for the two-face crate (which provides additional themes and syntax highlighting grammars for syntect) to Fedora Linux, I noticed that when syntect was initially packaged, we didn't account for the bundled themes and syntax highlighting grammars.
I can't find any way to regenerate the asset bundles from scratch. Is this documented somewhere? Shipping binary blobs that we have no way to verify or recreate when necessary is a recipe for disaster (see the recent XZ backdoor). I'm not saying that anything nefarious is going on here, just that it doesn't look good.
Comparing the list of included "default" grammars with other projects, it looks like the bundled grammars are from https://github.com/sublimehq/Packages at some point in time, but I can't find a reference to which point in time. Red Hat has reviewed the license that is attached to first-party Sublime grammars, and has determined that while it's a non-standard license, it's very permissive and safe to use and redistribute, but some grammars have other licenses. For example, the Rust grammar is MIT-licensed - this is not a problem in itself, because the syntect crate itself is MIT-licensed.
The list of "default" themes is unknown to me, and I can't tell where they are included from. The "InspiredGithub" theme links to a GitHub project that is MIT-licensed (which is fine), but I can't determine any origin for the other included themes (base16-*, Solarized*). It would be great if their origin (and their respective licenses) could be documented.
Heck, I even had to write Rust code to dump the list of both default grammars and themes from the built-in binary blobs, because they're not documented anywhere ...
// Cargo.toml: dependencies.syntect = { version = "5", features = ["default-syntaxes", "default-themes"] }
fn main() {
    // Names of all syntax definitions in the bundled default set, sorted.
    let defaults = syntect::parsing::SyntaxSet::load_defaults_newlines();
    let mut syntaxes = defaults.syntaxes().iter().map(|s| s.name.clone()).collect::<Vec<_>>();
    syntaxes.sort();
    println!("Syntaxes: {:#?}", syntaxes);

    // Names of all themes in the bundled default set, sorted.
    let defaults = syntect::highlighting::ThemeSet::load_defaults();
    let mut themes = defaults.themes.keys().collect::<Vec<_>>();
    themes.sort();
    println!("Themes: {:#?}", themes);
}
> it looks like the bundled grammars are from https://github.com/sublimehq/Packages at some point in time, but I can't find a reference to which point in time.

That can easily be ascertained by looking at the submodule reference. GitHub shows both the date and the commit hash.
> The list of "default" themes is unknown to me, and I can't tell where they are included from. The "InspiredGithub" theme links to a GitHub project that is MIT-licensed (which is fine), but I can't determine any origin for the other included themes (base16-*, Solarized*).

The same answer kind of applies here - just with the additional step of browsing the submodules to see what .sublime-syntax and .tmTheme files they contain...
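
The browsing step described above could be scripted rather than done by hand. The following is a minimal sketch using only the Rust standard library. It assumes a local syntect checkout with its submodules initialised; the function name `find_assets` and the command-line argument are hypothetical, and it says nothing about which of the listed files actually end up in the bundled blobs.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively collects files under `root` whose extension is
/// `sublime-syntax` or `tmTheme`, returned in sorted order.
fn find_assets(root: &Path) -> io::Result<Vec<PathBuf>> {
    let mut found = Vec::new();
    let mut pending = vec![root.to_path_buf()];
    while let Some(dir) = pending.pop() {
        for entry in fs::read_dir(&dir)? {
            let entry = entry?;
            let path = entry.path();
            if entry.file_type()?.is_dir() {
                // Descend into subdirectories (symlinks are not followed).
                pending.push(path);
            } else if matches!(
                path.extension().and_then(|e| e.to_str()),
                Some("sublime-syntax") | Some("tmTheme")
            ) {
                found.push(path);
            }
        }
    }
    found.sort();
    Ok(found)
}

fn main() -> io::Result<()> {
    // Hypothetical usage: pass the checkout root as the first argument,
    // or run from inside the checkout to scan the current directory.
    let root = std::env::args().nth(1).unwrap_or_else(|| ".".to_string());
    for path in find_assets(Path::new(&root))? {
        println!("{}", path.display());
    }
    Ok(())
}
```

Each printed path shows which submodule directory a grammar or theme lives in, which is the starting point for looking up its upstream repository and license.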
> I can't find any way to regenerate the asset bundles from scratch.

It doesn't seem to be explicitly documented outside of the file which creates the binary blobs: https://github.com/trishume/syntect/blob/d023aaa509d9e5058d55f9aa787c88f9a74bb180/examples/gendata.rs#L1-L7. But the Makefile is always a good place to look, and it is run from CI.
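
Once the blobs have been regenerated, a packager might want to compare them against the shipped copies. The sketch below does a plain byte-for-byte comparison with the Rust standard library. It assumes the dump format is reproducible across builds, which this thread does not confirm, so a mismatch is not by itself evidence of tampering. The function name `first_difference` and both file paths are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Returns the offset of the first differing byte between two files,
/// or `None` if their contents are identical. If one file is a prefix
/// of the other, the offset is the length of the shorter file.
fn first_difference(a: &Path, b: &Path) -> io::Result<Option<usize>> {
    let left = fs::read(a)?;
    let right = fs::read(b)?;
    if left == right {
        return Ok(None);
    }
    let common = left
        .iter()
        .zip(right.iter())
        .take_while(|(x, y)| x == y)
        .count();
    Ok(Some(common))
}

fn main() -> io::Result<()> {
    // Hypothetical usage: <shipped blob> <regenerated blob>
    let args: Vec<String> = std::env::args().collect();
    if args.len() != 3 {
        eprintln!("usage: {} <shipped> <regenerated>", args[0]);
        std::process::exit(2);
    }
    match first_difference(Path::new(&args[1]), Path::new(&args[2]))? {
        None => println!("identical"),
        Some(offset) => println!("files differ at byte offset {}", offset),
    }
    Ok(())
}
```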
Thank you for the pointers! Neither "test-data" (data is ... not (only?) used for tests?) nor "examples" (it's the code that actually generates the blobs?) are places that I would have expected ... the Makefile is indeed helpful.
AFAICT, syntect doesn't come with any information about the licensing of the bundled themes and syntax definitions, and the provenance & attribution are also unclear to end users, requiring digging in git history and chasing submodule references to find out where things come from.
I'm not sure what the best way to fix this would be, and there are at least three perspectives to look at this from:
the output of things like css_for_theme or otherwise serialising a theme, which will be subject only to that specific theme's license.
This is a bit of a nuisance; sorry about that.