New Language Support Request: Ini, Conf, Property, etc. + New LineLexer utility function

GoogleCodeExporter commented 9 years ago

Love this extension. I use it for syntax highlighting on my company wiki, since 
the built-in code module supports only a couple of languages, and supports 
them poorly. I needed to document some Linux conf files (which basically use 
the INI syntax) and noticed that the format wasn't supported, so I set out to 
write my own handler.

I had a hard time using the "simple" lexer. Line breaks are significant in INI 
files, so I wrote my own custom function and packaged it in a reusable form. 
It doesn't work the same way as the simple lexer, but it suits my needs 
perfectly, and I could see it being useful for other similar languages, so I 
figured I'd share it. Feel free to butcher it, add things to it, make it 
conform more to the way you wrote the simple lexer, rename it, even, or just 
discard it if you don't care for it. I tried to write it following your coding 
style. Here ya go:

// The line lexer takes an array of patterns.
// Each pattern starts with a regular expression followed by a syntax color for
// each captured set in the expression.
// For instance, you might capture a key followed by an operator followed by a
// value. So after that regular expression, there would be the syntax colors for
// the key, operator, and value (in order).
// Anything that's not captured is then given the PR_PLAIN syntax color.
// Patterns must be designed to match whole lines and they should be listed in
// order of precedence, so once a pattern matches a line, that line no longer
// checks itself against other patterns.
function createLineLexer(patterns) {
  return function(job) {
    var lines = job.sourceCode.match(/[^\r\n]*[\r\n]+/g),
      pos = job.basePos,
      decorations = [ pos, PR['PR_PLAIN'] ],
      li, nLines = lines.length,
      line,
      pi, nPatterns = patterns.length,
      pattern,
      mi, nMatches, matches, match, mOffset, mIndex;
    // Iterate each line
    for ( li = 0; li < nLines; ++li ) {
      line = lines[li];
      // Iterate each pattern, seeing if the line matches the pattern
      for ( pi = 0; pi < nPatterns; ++pi ) {
        pattern = patterns[pi];
        matches = line.match(pattern[0]);
        if ( matches ) {
          nMatches = matches.length;
          mOffset = 0;
          // Iterate each captured set
          for ( mi = 1; mi < nMatches; ++mi ) {
            match = matches[mi];
            // Sets are captured in order, so we can find the position of the capture using indexOf with an offset
            mIndex = line.indexOf(match, mOffset);
            // This should never fail since the regex was passed, but just in case, ensure that we found the index
            if ( mIndex > -1 ) {
              mOffset = mIndex + match.length;
              // Add the decorator. Use the PR_PLAIN styling for the text following the capture
              decorations.push(pos + mIndex, pattern[mi]);
              decorations.push(pos + mIndex + match.length, PR['PR_PLAIN']);
            }
          }
          break;
        }
      }
      // Increment the pos, moving on to the next line
      pos += line.length;
    }
    // Send the decorations back with the job
    job.decorations = decorations;
  };
}
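
A minimal sketch of the capture-locating step on its own, outside Prettify, may make the loop above easier to follow. The helper name captureSpans, the sample line, and the pattern are all hypothetical and used only for illustration; the helper finds each captured set the same way the lexer does, with indexOf and a moving offset.

```javascript
// Hypothetical helper, not part of Prettify: returns [start, end] offsets
// for each captured set, located with indexOf and a moving offset.
function captureSpans(line, regex) {
  var matches = line.match(regex);
  var spans = [];
  var offset = 0;
  if (!matches) { return spans; }
  for (var i = 1; i < matches.length; ++i) {
    var text = matches[i];
    // A group that did not participate in the match is undefined; skip it.
    if (text === undefined) { continue; }
    var index = line.indexOf(text, offset);
    if (index > -1) {
      spans.push([index, index + text.length]);
      offset = index + text.length;
    }
  }
  return spans;
}

// Illustrative pattern: key, operator, value.
var spans = captureSpans('    path = /gsidev\n', /^\s*(\w+)\s*(=)\s*(.+)/);
console.log(JSON.stringify(spans)); // [[4,8],[9,10],[11,18]]
```

Worth keeping in mind: indexOf returns the first occurrence at or after the previous capture, so if a capture's text also appears in uncaptured text before the real capture, the computed position lands too early.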

Then here's the language handler that's using it:

PR['registerLangHandler'](
  createLineLexer([
    // If a line starts with a semicolon or hash as the first non-whitespace character then the whole line is a comment
    [ /^\s*([;#].+)/, PR['PR_COMMENT'] ],
    // If the first non-whitespace character is a left bracket and it ends with a right bracket then it's a section marker
    [ /^\s*(\[[^\]]*\])\s*$/, PR['PR_KEYWORD'] ],
    // Attributes start with a name followed by an equals sign or colon and then everything after is the value (trimmed)
    [ /^\s*(\w?.*\w)\s*([=:])\s*(.+)/, PR['PR_ATTRIB_NAME'], PR['PR_PUNCTUATION'], PR['PR_PLAIN'] ]
  ]),
  [ 'conf', 'ini', 'property' ]
);

Here's HTML to test it:

<pre class="prettyprint">
<code class="language-ini">
line1: blah
   ; hey

[gsidev]
    comment = All my dev stuff
    path = /gsidev # some comment
    force user = gsidevas
    guest ok = yes
    writeable = yes
    browseable = yes

[other]
;   path: test
    url: http://blah.com

; last modified 1 April 2001 by John Doe
[owner]
name=John Doe
organization=Acme Widgets Inc.

[database]
; use IP address in case network name resolution is not working
server=192.0.2.62     
port=143
file = "payroll.dat"

</code>
</pre>

I've also attached a file that includes all this source code along with the 
prettyprint lib, so just open the HTML file and you can see it in action.

Original issue reported on code.google.com by michaeld...@gmail.com on 20 Jun 2012 at 2:37

Attachments:

prettyPrint.html

GoogleCodeExporter commented 9 years ago

Of course, no sooner do I post this than I find a bug. If the last line doesn't 
end with a line break then it doesn't get colored, and apparently the url line 
in the test didn't work and I was blind to it.

The fix is simple. Change the var lines declaration to:
var lines = job.sourceCode.match(/[^\r\n]*([\r\n]+|$)/g),
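
For comparison, here is a minimal sketch (plain JavaScript, no Prettify needed) of how each expression splits a source string whose last line has no trailing line break. The sample string and variable names are made up for illustration.

```javascript
// Hypothetical input; the last line has no trailing line break.
var source = 'a=1\nb=2';

// Original expression: every line must end with a line break, so "b=2" is dropped.
var oldLines = source.match(/[^\r\n]*[\r\n]+/g);

// Updated expression: a line may also end at the end of the input.
// It also yields a trailing empty string; none of the INI patterns in this
// thread match an empty line, and its length is 0, so the loop passes over it.
var newLines = source.match(/[^\r\n]*([\r\n]+|$)/g);

console.log(JSON.stringify(oldLines)); // ["a=1\n"]
console.log(JSON.stringify(newLines)); // ["a=1\n","b=2",""]
```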

Change the attribute pattern to:
[ /^\s*([^:=]+)\s*([=:])\s*(.+)/, PR['PR_ATTRIB_NAME'], PR['PR_PUNCTUATION'], PR['PR_PLAIN'] ]
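
A minimal sketch of what changes for the url line from the test block, using the two attribute expressions from this thread. The variable names are hypothetical.

```javascript
// The url line from the sample HTML in this thread.
var line = 'url: http://blah.com';

// Original pattern: the greedy ".*" runs up to the LAST colon, so the
// name swallows part of the URL.
var before = line.match(/^\s*(\w?.*\w)\s*([=:])\s*(.+)/);

// Updated pattern: the name stops at the FIRST "=" or ":".
var after = line.match(/^\s*([^:=]+)\s*([=:])\s*(.+)/);

console.log(JSON.stringify(before.slice(1))); // ["url: http",":","//blah.com"]
console.log(JSON.stringify(after.slice(1)));  // ["url",":","http://blah.com"]
```

Note that with the updated pattern the captured name keeps any whitespace that precedes the operator (for example "path " in "path = x"), which only affects how far the name color extends.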

And change the end of the test block to:
url: http://blah.com</code>
</pre>

I've attached a file with the updates.

Original comment by michaeld...@gmail.com on 20 Jun 2012 at 3:11

Attachments:

prettyPrint.html

GoogleCodeExporter commented 9 years ago

I think the comment regex should be:

    [ /^\s*([;#][^\n]*)/, PR['PR_COMMENT'] ],

Otherwise, lines that contain only a hash (or semicolon) character are not matched.
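
A minimal sketch of the difference, runnable without Prettify. The variable names are hypothetical, and the sample lines keep their trailing line break because the lexer leaves it on each line.

```javascript
var before = /^\s*([;#].+)/;     // original: needs at least one character after the marker
var after = /^\s*([;#][^\n]*)/;  // suggested: the marker alone is enough

// A line holding only "#" plus its line break.
var bare = '#\n';
console.log(before.test(bare)); // false
console.log(after.test(bare));  // true

// An ordinary comment matches either way.
var normal = '; last modified 1 April 2001 by John Doe\n';
console.log(before.test(normal)); // true
console.log(after.test(normal));  // true
```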

Original comment by bolinf...@gmail.com on 24 Sep 2014 at 9:50

h13i32maru / google-code-prettify

New Language Support Request: Ini, Conf, Property, etc. + New LineLexer utility function #223