jmzambon / libreoffice-code-highlighter

Code snippet highlighter for LibreOffice.
https://extensions.libreoffice.org/en/extensions/show/5814

Explain Document Change #5

Status: Closed (closed by flywire 2 years ago)

flywire commented 2 years ago

What changes does Code Highlighter 2 make to the Writer files? I assume it hard-formats the code color rather than applying styles. Please explain in README.md.

For comparison, the Code Colorizer Formatter documentation says:

I use specific paragraph styles for my code examples. The currently supported paragraph styles are "_OOoComputerCode", "_OOoComputerCodeInTable", "_OOoComputerCodeLastLine", "_code", "_code_first_line", "_code_last_line", and "_code_one_line".

flywire commented 2 years ago

The changes are shown below, but it doesn't seem they can be edited through the normal LibreOffice style interface.

Sub HelloMacro
  Print "Hello"
End Sub

Is formatted as:

[image: screenshot of the highlighted snippet]

Extracting a content.xml snippet and reformatting for readability:

<text:p text:style-name="P2">
   <text:span text:style-name="T4">Sub</text:span>
   <text:span text:style-name="T7"> </text:span>
   <text:span text:style-name="T10">HelloMacro</text:span>
</text:p>
<text:p text:style-name="P2">
   <text:span text:style-name="T7">
      <text:s text:c="2"/>
   </text:span>
   <text:span text:style-name="T13">Print</text:span>
   <text:span text:style-name="T7"> </text:span>
   <text:span text:style-name="T16">&quot;Hello&quot;</text:span>
</text:p>
<text:p text:style-name="P2">
   <text:span text:style-name="T4">End</text:span>
   <text:span text:style-name="T7"> </text:span>
   <text:span text:style-name="T4">Sub</text:span>
</text:p>

T4 - green
T7 - invisible??
T10 - blue
T13 - black
T16 - red
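
To check which formatting each automatic style (T4, T7, ...) actually carries, the office:automatic-styles section of content.xml can be read directly. The following is a minimal sketch in Python, assuming the standard ODF namespaces. The function names and the sample XML (including its color values) are made up for illustration and are not taken from the extension or from the document above.

```python
import xml.etree.ElementTree as ET
import zipfile

# Standard ODF namespace URIs.
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "style": "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
    "fo": "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0",
}


def read_content_xml(path):
    """Return the raw content.xml bytes from an .odt file (a zip archive)."""
    with zipfile.ZipFile(path) as odt:
        return odt.read("content.xml")


def automatic_text_colors(content_xml):
    """Map each automatic style name to its fo:color, or None if it sets none."""
    root = ET.fromstring(content_xml)
    colors = {}
    for style in root.iterfind("office:automatic-styles/style:style", NS):
        name = style.get(f"{{{NS['style']}}}name")
        props = style.find("style:text-properties", NS)
        color = None if props is None else props.get(f"{{{NS['fo']}}}color")
        colors[name] = color
    return colors


# Hypothetical sample, NOT the real file from this thread.
SAMPLE = """<office:document-content
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
    xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0">
  <office:automatic-styles>
    <style:style style:name="T4" style:family="text">
      <style:text-properties fo:color="#008000"/>
    </style:style>
    <style:style style:name="T7" style:family="text">
      <style:text-properties/>
    </style:style>
  </office:automatic-styles>
</office:document-content>"""

if __name__ == "__main__":
    print(automatic_text_colors(SAMPLE))  # {'T4': '#008000', 'T7': None}
```

For a real document, `automatic_text_colors(read_content_xml("example.odt"))` would list the colors, where `example.odt` is a placeholder path. A span that wraps only whitespace, like T7 above, may look "invisible" simply because there is nothing to paint.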

jmzambon commented 2 years ago

Hi,

Indeed, colors are hard-formatted. Code Highlighter (both the original version and this fork) uses Pygments lexers to parse the code snippet. Each token is colored separately, based on the chosen style, and the available styles are those shipped with Pygments. IMHO, creating a LibreOffice character style for each reserved word of each language used in the document would be overkill and would cancel the benefits of relying on Pygments. If you find that a style is missing, the best option is to create a new one, as documented on the Pygments website. That way every user will benefit from it.
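
The per-token coloring described above can be sketched with the public Pygments API. This is an illustration only, not the extension's actual code: the function name `token_colors`, the choice of the `vbnet` lexer for the Basic snippet, and the `default` style are all assumptions.

```python
from pygments.lexers import get_lexer_by_name
from pygments.styles import get_style_by_name


def token_colors(code, language="vbnet", style_name="default"):
    """Return (text, token type, hex color or None) for each token in code."""
    lexer = get_lexer_by_name(language)
    style = get_style_by_name(style_name)
    result = []
    for ttype, value in lexer.get_tokens(code):
        # Fall back to the parent token type if the style does not
        # define this exact type.
        styled = ttype
        while not style.styles_token(styled):
            styled = styled.parent
        # "color" is a hex string without the leading '#', or None.
        color = style.style_for_token(styled)["color"]
        result.append((value, str(ttype), color))
    return result


if __name__ == "__main__":
    snippet = 'Sub HelloMacro\n  Print "Hello"\nEnd Sub'
    for text, ttype, color in token_colors(snippet):
        print(repr(text), ttype, color)
```

Each tuple corresponds to one run of text that would receive its own direct formatting, which is why the resulting document contains many small automatic character styles rather than a handful of named ones.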

flywire commented 2 years ago

From https://pygments.org/docs/quickstart/#architecture

There are four types of components that work together highlighting a piece of code:

  • A lexer splits the source into tokens, fragments of the source that have a token type that determines what the text represents semantically (e.g., keyword, string, or comment). There is a lexer for every language or markup format that Pygments supports.
  • The token stream can be piped through filters, which usually modify the token types or text fragments, e.g. uppercasing all keywords.
  • A formatter then takes the token stream and writes it to an output file, in a format such as HTML, LaTeX or RTF.
  • While writing the output, a style determines how to highlight all the different token types. It maps them to attributes like “red and bold”.
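
The four components listed above can be wired together in a few lines. This is a minimal sketch, assuming the `vbnet` lexer, the built-in `KeywordCaseFilter`, and `HtmlFormatter`; the function name `highlight_to_html` is hypothetical, and HTML is used here only because it is easy to inspect, not because the extension necessarily emits it.

```python
from pygments import highlight
from pygments.filters import KeywordCaseFilter
from pygments.formatters import HtmlFormatter
from pygments.lexers import get_lexer_by_name


def highlight_to_html(code, language="vbnet", style_name="default"):
    """Run code through lexer -> filter -> formatter using the given style."""
    # 1. Lexer: splits the source into typed tokens.
    lexer = get_lexer_by_name(language)
    # 2. Filter: modifies the token stream, here uppercasing all keywords.
    lexer.add_filter(KeywordCaseFilter(case="upper"))
    # 3. Formatter: writes the token stream as HTML.
    # 4. Style: decides how each token type looks; noclasses=True inlines
    #    the style attributes instead of emitting CSS classes.
    formatter = HtmlFormatter(style=style_name, noclasses=True)
    return highlight(code, lexer, formatter)


if __name__ == "__main__":
    print(highlight_to_html('Sub HelloMacro\n  Print "Hello"\nEnd Sub'))
```

Swapping `HtmlFormatter` for another formatter changes the output format without touching the lexer or the style, which is the separation the quoted architecture describes.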

Output [html??] is embedded in the LibreOffice document as office:automatic-styles.