Open joshgoebel opened 4 years ago
In the following example, the prefixed names schema:name and xsd:string get marked up the same way:
<PersonShape> {
schema:name xsd:string ;
}
The production for everything inside the '{}'s (tripleExpression) contains the production for prefixedName:
productions.tripleExpression = {
begin: common.iris_RE,
end: common.EndOfDocument,
returnBegin: true,
endsWithParent: true,
keywords: shapeExpression_keywords,
contains: [productions.IRIREF, productions.prefixedName].concat(shapeExprContentModel),
relevance: 0
}
so if it matches twice, e.g. on "schema:name" and "xsd:string", it gets the same annotation.
The highlighter could help the user by marking them differently, but I haven't figured out how to say "after matching IRIREF or prefixedName once, switch to production X" (which can match them again, but with different classNames). In the ace ShExC mode (see demo), an IRI in a tripleExpression dives into a nested production which (with some digging) annotates an IRI with datatype. The demo only bolds datatypes, but that just reflects my lack of CSS creativity.
Yeah, this isn't intuitive... this is the kind of thing callbacks would help with. But you can do it now; it's just messy.
You have to chain rules. You can do that either with parent/child relationships and playing with the rule terminations, or (probably simpler) with starts.
So to match say two terms "abc xyy" with a match that treats the 2nd one differently:
// rough pseudo-code
{ // rule "one"
begin: /\w{3}/,
className: "termOne",
// since there is no end, it will end immediately and starts will be triggered
// or you can eat extra spacing, etc. with end and excludeEnd
starts: {
contains: [{
// our second term
begin: /\w{3}/,
className: "termTwo"
// this rule also ends immediately; control returns to its parent (the starts mode), which immediately ends
// then control returns to the parent of rule "one"
// actually you might need `endsParent` here to prevent matching more than one termTwo
}]
}
}
Hopefully that gives you the idea. If you had to do this all the time you'd write a helper for it.
I.e., the only way to track grammar context now is the mode tree (and only very simple context at that).
Edit: updated code below to make it work, look for end: /\B\b/.
i tried this doc in extra:
<html>
<head>
<title>context-sensitive grammar in highlightjs</title>
<link rel="stylesheet" href="../../build/styles/default.min.css"/>
<style>
.hljs-termOne { color: red; }
.hljs-termTwo { color: blue; }
</style>
</head>
<body>
<pre><code class="toy">
one two one two one two
</code></pre>
<script src="../../src/highlight.js"></script>
<script>
hljs.registerLanguage("toy", function () {
return function (hljs, options = {}) {
return {
contains: [
{
className: "termOne",
begin: /\w{3}/,
starts: {
end: /\B\b/, // Added following @yyyc514's advice below
contains: [{
className: "termTwo",
begin: /\w{3}/,
endsParent: true
}]
}
}
]
}
}
}());
hljs.initHighlightingOnLoad();
</script>
</body>
</html>
but only got termOnes (red):
<code class="toy hljs">
<span class="hljs-termOne">one</span> <span class="hljs-termOne">two</span> <span class="hljs-termOne">one</span> <span class="hljs-termOne">two</span> <span class="hljs-termOne">one</span> <span class="hljs-termOne">two</span>
</code>
Any advice?
Edit: now works.
Child modes MUST match something or they will end (since you didn't specify end anywhere it'll always try to end FAST). Your starts mode isn't matching the space, so it ends because the second term doesn't immediately follow the first.
You need a rule to eat spaces. Or you could change the end rule of your starts block to be a non-match, i.e. a regex that can't possibly match anything. I looked that up the other day but can't recall it off the top of my head.
Of course then you need to be SURE a 2nd term is following or you'll get stuck. So spaces might be a better approach.
added end: /\B\b/ to make it work, tx!
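For anyone wondering why /\B\b/ serves as a never-matching end: \B requires that the current position is not a word boundary, while \b requires that the same position is one, so no position can satisfy both. A minimal check in plain JavaScript (the sample strings are made up for illustration):

```javascript
// \B asserts "not at a word boundary" and \b asserts "at a word boundary"
// for the same position, so the pattern cannot match anywhere.
const neverMatch = /\B\b/;

// Illustrative inputs only: ordinary text, empty string, ShExC-like text, whitespace.
const samples = ["one two", "", "schema:name xsd:string ;", "   "];
const anyMatch = samples.some((s) => neverMatch.test(s));

console.log(anyMatch); // false
```

As noted above, an end that never matches means the starts mode only closes when the second term (with endsParent) shows up, so this is only safe when a second term is guaranteed to follow.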
Will keep the issue open until i apply this tech to the ShExC and maybe SPARQL and Turtle grammars in the multi-lang branch.
Then if you really want to get fancy you can whip up a helper to build those objects for you (if you needed the pattern often):
requireSequence([
{ className: "termOne", begin: /\w{3}/ },
{ className: "termTwo", begin: /\w{3}/ }
]);
I guess you'd have to think of a way to encode the "allow spaces or not" type info...
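One possible shape for such a helper, as a rough sketch only: requireSequence is hypothetical (it is not part of highlight.js), and the `end` option is just one assumed way to encode the "allow spaces or not" choice. It simply builds the same nested starts/contains/endsParent objects as the toy grammar earlier in the thread. Only the two-term case mirrors something shown to work here; longer chains are untested.

```javascript
// Hypothetical helper (not part of highlight.js): chains modes through
// `starts` so each one is only tried after the previous one has matched.
function requireSequence(modes, options = {}) {
  // Assumption: /\B\b/ never matches, so the intermediate `starts` mode
  // stays open across whitespace until the next term ends it.
  const { end = /\B\b/ } = options;
  // Build from the last mode backwards, nesting each tail inside `starts`.
  return modes.reduceRight((next, mode) => {
    if (next === null) return { ...mode };
    return {
      ...mode,
      starts: {
        end,
        // endsParent closes the `starts` mode once the next term has matched.
        contains: [{ ...next, endsParent: true }],
      },
    };
  }, null);
}

const sequence = requireSequence([
  { className: "termOne", begin: /\w{3}/ },
  { className: "termTwo", begin: /\w{3}/ },
]);
```

The result would go into a grammar's contains array like any other mode, e.g. `contains: [sequence]`.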
I'd really love to allow naming of sub-match expressions (like you see in Textmate grammars and such) but JS has no way to pull location data from them...
i'm trying to make a production that's indirectly called by starts consume until it hits a closing delimiter, but it seems to return as soon as any match is made. Any tricks to get around that?
https://github.com/highlightjs/highlightjs-shexc/commit/595f7fdec64ca288418acc7c060ca79d9e9625ea
I'm not sure I follow what you're trying to do, and I'm not sure self works with starts; that seems strange to me. If [] is recursive then self belongs inside contains, not starts.
So once value opens it shouldn't close until ] is found... and if it's closing prematurely due to a SECOND set of brackets then typically you handle that with contains: [self].
You can add ADDITIONAL end matches inside contains with endsParent...
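A minimal sketch of those two pieces together, assuming a bracketed value that can nest. The className, the patterns, and the choice of ";" as an extra terminator are illustrative guesses, not taken from the ShExC grammar:

```javascript
// Sketch of a bracketed value that may nest, e.g. [ a [ b ] c ].
// All names and patterns here are illustrative assumptions.
const valueSet = {
  className: "valueSet",
  begin: /\[/,
  end: /\]/, // stays open until the matching close bracket
  contains: [
    // The string "self" tells highlight.js to allow this same mode inside
    // itself, so an inner [ ... ] does not close the outer one early.
    "self",
    // An ADDITIONAL terminator: a child marked endsParent also closes valueSet.
    { begin: /;/, endsParent: true },
  ],
};
```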
When trying to build up something complex the best thing to do is get it working with the simplest possible case, then commit that, then slowly add additional test cases one by one and expand the matches each time.
It might be easier to see a failing markup test of what you're trying to accomplish.
Could you provide a quick example of what you're talking about here? While it might be true that it's not (currently) easy to do, you should be able to define a sub-mode that derives from the parent mode but is different... so something nested inside a block could be highlighted differently than the same thing would be if it were outside the block.
But perhaps that's not what you're getting at here at all.
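To make the "derived sub-mode" idea concrete, here is a rough sketch with made-up names and patterns (none of this is from an actual grammar). It copies a mode with object spread and overrides only the className; highlight.js also provides hljs.inherit for the same purpose.

```javascript
// Illustrative names and patterns; not taken from the ShExC grammar.
const prefixedName = { className: "type", begin: /\w*:\w+/ };

// Same pattern, different class, intended for use only inside a block.
// (With highlight.js loaded, hljs.inherit(prefixedName, {...}) does the same.)
const nestedPrefixedName = { ...prefixedName, className: "built_in" };

// The block uses the derived mode; the top level uses the original one.
const block = { begin: /\{/, end: /\}/, contains: [nestedPrefixedName] };
const grammar = { contains: [block, prefixedName] };
```

Because block is listed first and only it contains nestedPrefixedName, a prefixed name inside braces would get the second class while one outside keeps the first.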