Prism4j

Simplified Java clone of prism-js. No rendering, no themes, no hooks, no plugins, but it still parses languages. The primary aim of this library is to provide a tokenization strategy for arbitrary syntaxes for later processing. Works on Android (great with Markwon, a markdown display library).

Core

The core module prism4j is a lightweight module that comes with the API only (no language definitions).

prism4j

implementation "io.noties:prism4j:${prism_version}"

final Prism4j prism4j = new Prism4j(new MyGrammarLocator());
final Prism4j.Grammar grammar = prism4j.grammar("json");
// grammar is null when the locator does not know the language
if (grammar != null) {
    final List<Prism4j.Node> nodes = prism4j.tokenize(code, grammar);
    final AbsVisitor visitor = new AbsVisitor() {
        @Override
        protected void visitText(@NonNull Prism4j.Text text) {
            // raw text
            text.literal();
        }

        @Override
        protected void visitSyntax(@NonNull Prism4j.Syntax syntax) {
            // type of the syntax token
            syntax.type();
            // descend into nested tokens
            visit(syntax.children());
        }
    };
    visitor.visit(nodes);
}
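
As a rough illustration of what a visitor can do with the node tree, the following sketch collects every piece of text together with the type of the token it belongs to. To stay self-contained it uses small stand-in Node, Text and Syntax types rather than the real Prism4j classes; these stand-ins and the NodeFlattener name are hypothetical and only mirror the literal(), type() and children() calls shown above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for the library's node types (NOT the real Prism4j classes).
interface Node {}

final class Text implements Node {
    private final String literal;
    Text(String literal) { this.literal = literal; }
    String literal() { return literal; }
}

final class Syntax implements Node {
    private final String type;
    private final List<Node> children;
    Syntax(String type, Node... children) {
        this.type = type;
        this.children = Arrays.asList(children);
    }
    String type() { return type; }
    List<Node> children() { return children; }
}

final class NodeFlattener {

    // Walks the tree depth-first and emits "type:text" for text inside a syntax node,
    // or "text:..." for raw text that is not part of any token.
    static List<String> flatten(List<? extends Node> nodes) {
        final List<String> out = new ArrayList<>();
        collect(nodes, "text", out);
        return out;
    }

    private static void collect(List<? extends Node> nodes, String currentType, List<String> out) {
        for (Node node : nodes) {
            if (node instanceof Text) {
                out.add(currentType + ":" + ((Text) node).literal());
            } else if (node instanceof Syntax) {
                final Syntax syntax = (Syntax) node;
                // children are labelled with the type of the nearest enclosing syntax node
                collect(syntax.children(), syntax.type(), out);
            }
        }
    }

    public static void main(String[] args) {
        final List<Node> nodes = Arrays.asList(
                new Syntax("punctuation", new Text("{")),
                new Syntax("property", new Text("\"a\"")),
                new Syntax("operator", new Text(":")),
                new Text(" "),
                new Syntax("number", new Text("1")),
                new Syntax("punctuation", new Text("}")));
        System.out.println(flatten(nodes));
    }
}
```

With the real library the same idea would live inside visitText and visitSyntax of an AbsVisitor subclass.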

Where MyGrammarLocator can be as simple as:

public class MyGrammarLocator implements GrammarLocator {

    @Nullable
    @Override
    public Prism4j.Grammar grammar(@NonNull Prism4j prism4j, @NonNull String language) {
        switch (language) {

            case "json":
                return Prism_json.create(prism4j);

            // everything else is omitted

            default:
                return null;
        }
    }
}

And a language definition:

import static java.util.regex.Pattern.CASE_INSENSITIVE;
import static java.util.regex.Pattern.compile;
import static io.noties.prism4j.Prism4j.grammar;
import static io.noties.prism4j.Prism4j.pattern;
import static io.noties.prism4j.Prism4j.token;

@Aliases("jsonp")
public class Prism_json {

  @NonNull
  public static Prism4j.Grammar create(@NonNull Prism4j prism4j) {
    return grammar(
      "json",
      token("property", pattern(compile("\"(?:\\\\.|[^\\\\\"\\r\\n])*\"(?=\\s*:)", CASE_INSENSITIVE))),
      token("string", pattern(compile("\"(?:\\\\.|[^\\\\\"\\r\\n])*\"(?!\\s*:)"), false, true)),
      token("number", pattern(compile("\\b0x[\\dA-Fa-f]+\\b|(?:\\b\\d+\\.?\\d*|\\B\\.\\d+)(?:[Ee][+-]?\\d+)?"))),
      token("punctuation", pattern(compile("[{}\\[\\]);,]"))),
      token("operator", pattern(compile(":"))),
      token("boolean", pattern(compile("\\b(?:true|false)\\b", CASE_INSENSITIVE))),
      token("null", pattern(compile("\\bnull\\b", CASE_INSENSITIVE)))
    );
  }
}
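
The property and string tokens above differ only in their trailing lookahead: (?=\s*:) requires a colon after the quoted text, while (?!\s*:) forbids one. The sketch below exercises those two regular expressions with plain java.util.regex, without Prism4j and without the CASE_INSENSITIVE flag; the JsonPatternDemo class and its firstMatch helper are made up for the illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class JsonPatternDemo {

    // Same regular expressions as the "property" and "string" tokens of Prism_json (flags omitted).
    static final Pattern PROPERTY = Pattern.compile("\"(?:\\\\.|[^\\\\\"\\r\\n])*\"(?=\\s*:)");
    static final Pattern STRING = Pattern.compile("\"(?:\\\\.|[^\\\\\"\\r\\n])*\"(?!\\s*:)");

    // Returns the first match of the pattern in the input, or null when there is none.
    static String firstMatch(Pattern pattern, String input) {
        final Matcher matcher = pattern.matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        final String key = "\"name\": 1";    // quoted text followed by a colon
        final String value = "[\"prism\"]";  // quoted text not followed by a colon

        System.out.println(firstMatch(PROPERTY, key));   // matches the key
        System.out.println(firstMatch(STRING, key));     // no match
        System.out.println(firstMatch(STRING, value));   // matches the value
        System.out.println(firstMatch(PROPERTY, value)); // no match
    }
}
```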

Bundler

To simplify adding language definitions to your project, there is a special module called prism4j-bundler that automatically adds the requested languages.

prism4j-bundler

annotationProcessor "io.noties:prism4j-bundler:${prism_version}"

Please note that the bundler can only add languages that have already been ported; see the ./languages folder for the current list. Please see the Contributing section if you wish to port a language.

@PrismBundle(
    includes = { "clike", "java", "c" },
    grammarLocatorClassName = ".MyGrammarLocator"
)
public class MyClass {}

You can have multiple language bundles: just annotate different classes in your project. There are no special requirements for a class to be annotated (it can be any class in your project).

!important

NB: the generated GrammarLocator creates languages only when they are requested (lazy loading). Make sure this works for you, either by keeping it as is or by manually triggering language creation via prism4j.grammar("my-language"); at a convenient point at runtime.
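
To make the lazy-loading note concrete, here is a minimal sketch of a locator that builds each grammar on first request and serves it from a cache afterwards. It does not reproduce the generated GrammarLocator; the LazyLocator class, its String-valued "grammars" and the creation counter are assumptions chosen so the example runs without Prism4j.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical illustration of lazy grammar creation; a "grammar" is just a String here.
final class LazyLocator {

    private final Map<String, Supplier<String>> factories = new HashMap<>();
    private final Map<String, String> cache = new HashMap<>();
    private int created; // how many grammars have actually been built

    void register(String language, Supplier<String> factory) {
        factories.put(language, factory);
    }

    // Builds the grammar on first request, returns the cached instance afterwards,
    // and returns null for languages that were never registered.
    String grammar(String language) {
        final String cached = cache.get(language);
        if (cached != null) {
            return cached;
        }
        final Supplier<String> factory = factories.get(language);
        if (factory == null) {
            return null;
        }
        final String grammar = factory.get();
        cache.put(language, grammar);
        created++;
        return grammar;
    }

    int created() {
        return created;
    }

    public static void main(String[] args) {
        final LazyLocator locator = new LazyLocator();
        locator.register("json", () -> "json-grammar");
        locator.register("java", () -> "java-grammar");

        System.out.println(locator.created()); // nothing has been built yet
        locator.grammar("json");               // warm up one language ahead of time
        locator.grammar("json");               // second call is served from the cache
        System.out.println(locator.created()); // only "json" has been built
    }
}
```

Requesting a grammar early, for example during application start-up, corresponds to the manual triggering mentioned in the note.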

Contributing

If you want to contribute to this project, porting grammar definitions would be the best start. But before you begin, please create an issue with the language-support tag so others can see that a language is being worked on. This issue will also be a great place to discuss anything that comes up along the way.

Language definitions are in the /languages folder (go down the io.noties.prism4j.languages package to find the files). A new file should follow a simple naming convention: Prism_{real_language_name}.java. So, a definition for json would be Prism_json.java.

To provide the bundler with meta-information about a language, the @Aliases, @Extend and @Modify annotations can be used:

@Extend("clike")
public class Prism_c {}

@Modify("markup")
public class Prism_css {}

@Modify accepts an array of language names.

After you are done (haha!) with a language definition, please make sure that you also move the test cases from prism-js into the project (for the newly added language, of course). Thankfully only a bit of work is required here, as the prism4j-languages module understands the native format of prism-js test cases (files ending with *.test). Please inspect the test folder of the prism4j-languages module for further info. In short: copy the test cases from the prism-js project (the whole folder for the specific language) into the prism4j-languages/src/test/resources/languages/ folder.

Then, if you run:

./gradlew :prism4j-languages:test

and all tests pass (including the newly added ones), then it's safe to open a pull request. Good job!

Important note about regex for contributors

As this project is meant to work on Android, the } symbol in your regex patterns must be escaped (\\}). Yes, an IDE will warn you that this escape is not needed, but do not believe it: the pattern just won't compile at runtime on Android. I wish this could be unit-tested, but unfortunately the same patterns compile just fine under Robolectric (no surprise, actually).
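
A small sketch of the escaped form, using only java.util.regex. It runs on a desktop JVM, where the escaped spelling compiles without complaint, so it only shows that the extra escape is harmless there; it cannot reproduce the Android failure. The BraceEscapeDemo class name and the sample input are made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BraceEscapeDemo {

    // Every literal brace is escaped (\\{ and \\}),
    // even though a desktop IDE may flag the escape as redundant.
    static final Pattern BLOCK = Pattern.compile("\\{[^\\{\\}]*\\}");

    // Returns the first brace-delimited block in the input, or null if there is none.
    static String firstBlock(String input) {
        final Matcher matcher = BLOCK.matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstBlock("fun main() { print(1) }"));
    }
}
```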

License

  Copyright 2019 Dimitry Ivanov (legal@noties.io)

  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

      http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.