flywire closed this issue 2 years ago
I thought this would be easy to implement, but I'm facing some unexpected problems. I will come back to this issue soon.
Can you share your code for this?
I pushed a dedicated branch, with a first attempt.
1. Pygments could be rerun over a mixed coding language document without user selecting lexer
This is done by selecting "automatic" as language.
3. select whole snippet with a click
I don't see an easy way to do this... The best option remains to put the snippet in a text table.
4. current interface would allow user to assign new lexer tag
Could you elaborate? Isn't this already the case?
5. Pygments could be rerun over document with mixed coding languages without selecting lexer
So far, the last chosen language must be "automatic".
Can you briefly explain your approach? I'll look at code and xml but I might not pick up subtleties.
Use case - Rerun: snippet code is changed and needs to be highlighted.
`skip` lexer tag will be needed
Automatic language lacks integrity and is fairly random. It misses too much (about half), almost all lexers were wrong, and the same code in different files gave different results: HelloWorld.zip HelloWorld.txt
I would be more interested in tagging snippets with a language as the document is developed, then selecting the whole document and updating the style based on the snippet language tags. Say, forget about language automatic and use `<tagged>`.
Sorry, I've edited your message instead of replying. And I see no way to revert to your original post...
- Despite CH2 being run with Use character styles in Writer, tags are in `<office:automatic-styles>`
- I'd expect an `<office:body>` lexer tag at the start of each code block (currently coded as paragraphs, i.e. lines of code)
This is how the OpenDocument standard works. See the specifications if you need more information.
LibreOffice is responsible for outputting the document in strict compliance with that standard. Code Highlighter 2 is intended to be a LibreOffice extension, not a tool to create documents from scratch.
- a method is needed to verify the lexer for each code block from within the LO document
Please give concrete hints on how you think the user will use it and where he'd find it.
- sequential paragraphs of the same style are the same code block
Of course. Please try to be less cryptic, I'm tired of guessing what you mean...
Can you briefly explain your approach? I'll look at code and xml but I might not pick up subtleties.
No subtleties here: if "automatic" is chosen as lexer, CH2 will first search for a lexer name associated with the snippet and apply it, otherwise it will ask Pygments to guess one.
In other words, the first time you highlight a snippet, CH2 will store the lexer name with it (this is actually not a tag, but we can think it is). The next time, CH2 will apply the current lexer or, if this is "automatic", will apply the saved one. This way, you may rerun CH2 on mixed language snippets without worrying about erroneous guessing...
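The decision order described above can be sketched in a few lines. This is only an illustration of the rule as stated in the comment, not the extension's actual code: `resolve_lexer`, `AUTOMATIC` and the `guess` callback are hypothetical names, and `guess` stands in for whatever asks Pygments to guess (for example `pygments.lexers.guess_lexer`).

```python
from typing import Callable, Optional

AUTOMATIC = "automatic"  # hypothetical constant for the "automatic" menu entry


def resolve_lexer(chosen: str, saved: Optional[str], guess: Callable[[], str]) -> str:
    """Return the lexer name to apply to one snippet.

    chosen: language currently selected by the user
    saved:  lexer name stored with the snippet on a previous run, if any
    guess:  fallback that guesses a lexer name from the snippet text
    """
    if chosen != AUTOMATIC:
        return chosen   # an explicit choice wins (and would replace the saved name)
    if saved:
        return saved    # "automatic": reuse the stored name instead of guessing again
    return guess()      # nothing stored yet: fall back to guessing


# A snippet previously highlighted as Java keeps its lexer on an "automatic" rerun,
# even if guessing would have picked something else.
print(resolve_lexer("automatic", "Java", lambda: "BBC Basic"))  # Java
```

Under this rule a wrong guess is only a problem on the first run; once a snippet has been highlighted with an explicit language, later "automatic" runs never reach the guessing step.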
Use case - Rerun: snippet code is changed and needs to be highlighted.
1. rerun over a mixed coding language document - by selecting "automatic" - _want it to use any existing snippet language tag **not guess again**_
So this is done.
2. ... 3. select whole snippet with a click - no easy way - agree, it's a hotkey or context-sensitive menu select, app already contains selection code
I need more time to think about it. But why not use a text frame for your snippets? That is the best approach.
4. Yes, already the case, if user selects a language the lexer will use it and overwrite snippet language tag, maybe a `skip` lexer tag will be needed
This is already done by choosing "automatic" language.
6. user probably needs to be able to restrict language (lexers) available
Why?
Automatic language lacks integrity and is fairly random. It misses too much (about half), almost all lexers were wrong, and the same code in different files gave different results: HelloWorld.zip HelloWorld.txt
Please see my previous comment.
I would be more interested in tagging snippets with language as the document was developed then select the whole document and update the style based on the snippet language tags.
This would be a totally new feature. I can't keep multiplying shortcuts and buttons. But maybe I can think about a small API that you'll be able to use as you want. Please open a new issue for a feature "select whole document and update all previously highlighted snippets."
- Despite CH2 being run with Use character styles in Writer, tags are in `<office:automatic-styles>`
- I'd expect an `<office:body>` lexer tag at the start of each code block (currently coded as paragraphs, i.e. lines of code)
This is how the OpenDocument standard works. See the specifications if you need more information.
LibreOffice is responsible for outputting the document in strict compliance with that standard. Code Highlighter 2 is intended to be a LibreOffice extension, not a tool to create documents from scratch.
- a method is needed to verify the lexer for each code block from within the LO document
Please give concrete hints on how you think the user will use it and where he'd find it.
- sequential paragraphs of the same style are the same code block
Of course. Please try to be less cryptic, I'm tired of guessing what you mean...
The instructions weren't clear at https://github.com/jmzambon/libreoffice-code-highlighter/issues/7#issuecomment-1189273094, which I suppose is reasonable to test how intuitive an interface is.
The next time, CH2 will apply the current lexer
Got it. Tested automatic language with Python, LibreOffice Basic, and Java highlighted snippets - it fails and it is not clear why:
Edit: `a = 5` before running again and it highlighted the whole block correctly.
Test code:
Start>
def open_greeting(args=None):
    # Code lines with a maximum length of 80 characters will not wrap over lines
    print("Hello World" + 1 * "!")
<End
BASIC
PRINT "Hello, World!"
Java
public class Main {
  public static void main(String[] args) {
    System.out.println("Hello, World!");
  }
}
I'm wondering if a better approach would be to take the start and end of the snippet selected by the user, adjust it for start/end of paragraph and leading/trailing blank lines, and write a tag with the lexer. That would provide code blocks.
It seems the problem was the java code block was formatted with the wrong lexer (BBC Basic??). If so, why wouldn't it run again?
I tried your test code with no problem. What I did:
I'm wondering if a better approach would be to take the start and end of the snippet selected by the user, adjust it for start/end of paragraph and leading/trailing blank lines, and write a tag with the lexer. That would provide code blocks.
This is exactly what is already implemented.
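For readers following the thread, the boundary adjustment being discussed can be pictured roughly as below. This is a sketch under assumptions, not the extension's code: the selection is modelled as already snapped to whole paragraphs (one string per line of code), and `snap_selection` is a hypothetical helper that only trims leading and trailing blank lines.

```python
from typing import List, Optional, Tuple


def snap_selection(paragraphs: List[str], first: int, last: int) -> Optional[Tuple[int, int]]:
    """Shrink an inclusive paragraph range so it starts and ends on non-blank lines.

    Returns the adjusted (first, last) indices, or None if the range is all blank.
    """
    # Skip blank paragraphs at the start of the selection.
    while first <= last and not paragraphs[first].strip():
        first += 1
    # Skip blank paragraphs at the end of the selection.
    while last >= first and not paragraphs[last].strip():
        last -= 1
    return (first, last) if first <= last else None


doc = ["", "def open_greeting(args=None):", "    print('Hello World')", "", ""]
print(snap_selection(doc, 0, 4))  # (1, 2)
```

The lexer name would then be stored on every paragraph in the adjusted range, which is what makes the range recognisable as one code block later.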
There is still some miscommunication; let's sort it out here.
take the start and end of the snippet selected by the user
The snippet selected by a user can be many lines, a code block, which becomes a paragraph for each line in Writer.
This is exactly what is already implemented.
No, the xml shows the snippet (code block) can be comprised of many paragraphs, each wrapped in a paragraph style (ie lexer style). Previously you suggested the code blocks might be able to be placed in a frame, presumably to associate with the lexer instead of each paragraph.
def open_greeting(args=None):
    # Code lines with a maximum length of 80 characters will not wrap over lines
    print("Hello World" + 1 * "!")
Occurs as three P1 paragraphs having <style:paragraph-properties ch2_lexer="Python"/>
<office:automatic-styles>
<style:style style:name="P1" style:family="paragraph" style:parent-style-name="Standard">
<style:paragraph-properties ch2_lexer="Python"/>
<style:text-properties fo:language="zxx" fo:country="none"/>
</style:style>
<style:style style:name="P2" style:family="paragraph" style:parent-style-name="Standard">
<style:paragraph-properties ch2_lexer="BBC Basic"/>
<style:text-properties fo:language="zxx" fo:country="none"/>
</style:style>
<style:style style:name="P3" style:family="paragraph" style:parent-style-name="Standard">
<style:paragraph-properties ch2_lexer="VB.net"/>
<style:text-properties fo:font-size="12pt" fo:language="zxx" fo:country="none"/>
</style:style>
</office:automatic-styles>
<office:body>
<office:text>
<text:sequence-decls>
<text:sequence-decl text:display-outline-level="0" text:name="Illustration"/>
<text:sequence-decl text:display-outline-level="0" text:name="Table"/>
<text:sequence-decl text:display-outline-level="0" text:name="Text"/>
<text:sequence-decl text:display-outline-level="0" text:name="Drawing"/>
<text:sequence-decl text:display-outline-level="0" text:name="Figure"/>
</text:sequence-decls>
<text:p text:style-name="Standard">Start></text:p>
<text:p text:style-name="P1">
<text:span text:style-name="Code.Keyword">def</text:span>
<text:span text:style-name="Code.Text"></text:span>
<text:span text:style-name="Code.Name.Function">open_greeting</text:span>
<text:span text:style-name="Code.Punctuation">(</text:span>
<text:span text:style-name="Code.Name">args</text:span>
<text:span text:style-name="Code.Operator">=</text:span>
<text:span text:style-name="Code.Keyword.Constant">None</text:span>
<text:span text:style-name="Code.Punctuation">):</text:span>
</text:p>
<text:p text:style-name="P1">
<text:span text:style-name="Code.Text">
<text:s text:c="4"/>
</text:span>
<text:span text:style-name="Code.Comment.Single"># Code lines with a maximum length of 80 characters will not wrap over lines</text:span>
</text:p>
<text:p text:style-name="P1">
<text:span text:style-name="Code.Text">
<text:s text:c="4"/>
</text:span>
<text:span text:style-name="Code.Name.Builtin">print</text:span>
<text:span text:style-name="Code.Punctuation">(</text:span>
<text:span text:style-name="Code.Literal.String.Double">"Hello World"</text:span>
<text:span text:style-name="Code.Text"></text:span>
<text:span text:style-name="Code.Operator">+</text:span>
<text:span text:style-name="Code.Text"></text:span>
<text:span text:style-name="Code.Literal.Number.Integer">1</text:span>
<text:span text:style-name="Code.Text"></text:span>
<text:span text:style-name="Code.Operator">*</text:span>
<text:span text:style-name="Code.Text"></text:span>
<text:span text:style-name="Code.Literal.String.Double">"!"</text:span>
<text:span text:style-name="Code.Punctuation">)</text:span>
</text:p>
<text:p text:style-name="Standard"><End</text:p>
<text:p text:style-name="Standard"/>
<text:p text:style-name="Standard">BASIC</text:p>
<text:p text:style-name="Standard"/>
<text:p text:style-name="P3">
<text:span text:style-name="Code.Name">PRINT</text:span>
<text:span text:style-name="Code.Text.Whitespace"></text:span>
<text:span text:style-name="Code.Literal.String">"Hello, World!"</text:span>
</text:p>
<text:p text:style-name="Standard"/>
<text:p text:style-name="Standard"/>
<text:p text:style-name="Standard">Java</text:p>
<text:p text:style-name="Standard"/>
<text:p text:style-name="P2">
<text:span text:style-name="Code.Name.Variable">public</text:span>
<text:span text:style-name="Code.Text.Whitespace"></text:span>
<text:span text:style-name="Code.Name.Variable">class</text:span>
<text:span text:style-name="Code.Text.Whitespace"></text:span>
<text:span text:style-name="Code.Name.Variable">Main</text:span>
<text:span text:style-name="Code.Text.Whitespace"></text:span>
<text:span text:style-name="Code.Error">{</text:span>
</text:p>
<text:p text:style-name="P2">
<text:span text:style-name="Code.Text.Whitespace">
<text:s text:c="2"/>
</text:span>
<text:span text:style-name="Code.Name.Variable">public</text:span>
<text:span text:style-name="Code.Text.Whitespace"></text:span>
<text:span text:style-name="Code.Name.Variable">static</text:span>
<text:span text:style-name="Code.Text.Whitespace"></text:span>
<text:span text:style-name="Code.Name.Variable">void</text:span>
<text:span text:style-name="Code.Text.Whitespace"></text:span>
<text:span text:style-name="Code.Name.Variable">main</text:span>
<text:span text:style-name="Code.Operator">(</text:span>
<text:span text:style-name="Code.Name.Variable">String</text:span>
<text:span text:style-name="Code.Error">[]</text:span>
<text:span text:style-name="Code.Text.Whitespace"></text:span>
<text:span text:style-name="Code.Name.Variable">args</text:span>
<text:span text:style-name="Code.Operator">)</text:span>
<text:span text:style-name="Code.Text.Whitespace"></text:span>
<text:span text:style-name="Code.Error">{</text:span>
</text:p>
<text:p text:style-name="P2">
<text:span text:style-name="Code.Text.Whitespace">
<text:s text:c="4"/>
</text:span>
<text:span text:style-name="Code.Name.Variable">System</text:span>
<text:span text:style-name="Code.Error">.</text:span>
<text:span text:style-name="Code.Name.Variable">out</text:span>
<text:span text:style-name="Code.Error">.</text:span>
<text:span text:style-name="Code.Name.Variable">println</text:span>
<text:span text:style-name="Code.Operator">(</text:span>
<text:span text:style-name="Code.Literal.String.Double">"Hello, World!"</text:span>
<text:span text:style-name="Code.Operator">);</text:span>
</text:p>
<text:p text:style-name="P2">
<text:span text:style-name="Code.Text.Whitespace">
<text:s text:c="2"/>
</text:span>
<text:span text:style-name="Code.Error">}</text:span>
</text:p>
<text:p text:style-name="P2">
<text:span text:style-name="Code.Error">}</text:span>
</text:p>
</office:text>
</office:body>
</office:document-content>
The lexer tag (actually a user-defined paragraph property) is applied by the extension to the whole code block at once. LibreOffice translates this into the XML content in the manner you observed. There is nothing we can change about this, and it doesn't matter because, when the code block is selected again, we can retrieve the tag transparently.
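To make the point concrete, here is a rough sketch of how the per-paragraph property can be read back as code blocks from outside LibreOffice. It is an illustration only: it assumes the property appears in `content.xml` as a plain `ch2_lexer` attribute exactly as in the dump above, uses a shortened hand-written sample rather than a real file, and `lexers_by_style` and `code_blocks` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET
from itertools import groupby

# Standard OpenDocument namespace URIs.
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "style": "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}

# Shortened sample modelled on the dump in this thread (spans omitted).
SAMPLE = """<office:document-content
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
  <office:automatic-styles>
    <style:style style:name="P1" style:family="paragraph">
      <style:paragraph-properties ch2_lexer="Python"/>
    </style:style>
    <style:style style:name="P2" style:family="paragraph">
      <style:paragraph-properties ch2_lexer="BBC Basic"/>
    </style:style>
  </office:automatic-styles>
  <office:body><office:text>
    <text:p text:style-name="Standard">Start&gt;</text:p>
    <text:p text:style-name="P1">def open_greeting(args=None):</text:p>
    <text:p text:style-name="P1">print("Hello World")</text:p>
    <text:p text:style-name="Standard">Java</text:p>
    <text:p text:style-name="P2">public class Main {</text:p>
    <text:p text:style-name="P2">}</text:p>
  </office:text></office:body>
</office:document-content>"""


def q(prefix, name):
    """Build the '{namespace}name' form ElementTree uses for attributes."""
    return "{%s}%s" % (NS[prefix], name)


def lexers_by_style(root):
    """Map automatic paragraph style names (P1, P2, ...) to their ch2_lexer value."""
    result = {}
    for style in root.iterfind(".//style:style", NS):
        props = style.find("style:paragraph-properties", NS)
        if props is not None and props.get("ch2_lexer"):
            result[style.get(q("style", "name"))] = props.get("ch2_lexer")
    return result


def code_blocks(root):
    """Treat runs of consecutive paragraphs sharing a tagged style as one code block."""
    lexers = lexers_by_style(root)
    paragraphs = root.findall(".//office:text/text:p", NS)
    blocks = []
    for style_name, run in groupby(paragraphs, key=lambda p: p.get(q("text", "style-name"))):
        if style_name in lexers:
            blocks.append((lexers[style_name], len(list(run))))
    return blocks


root = ET.fromstring(SAMPLE)
print(code_blocks(root))  # [('Python', 2), ('BBC Basic', 2)]
```

Grouping by consecutive style name is a simplification: two different snippets that sit directly next to each other and share an automatic style would be reported as a single block.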
I don't think it's possible to add an object that could be translated as an XML tag spanning multiple paragraphs.
<draw:frame draw:style-name="fr1" draw:name="Frame3" text:anchor-type="paragraph" draw:z-index="2">
<draw:text-box fo:min-height="0.499cm" fo:min-width="17cm">
<text:p text:style-name="P1">
<text:span text:style-name="Code.Keyword">def</text:span>
<text:span text:style-name="Code.Text"></text:span>
<text:span text:style-name="Code.Name.Function">open_greeting</text:span>
<text:span text:style-name="Code.Punctuation">(</text:span>
<text:span text:style-name="Code.Name">args</text:span>
<text:span text:style-name="Code.Operator">=</text:span>
<text:span text:style-name="Code.Keyword.Constant">None</text:span>
<text:span text:style-name="Code.Punctuation">):</text:span>
</text:p>
<text:p text:style-name="P1">
<text:span text:style-name="Code.Text">
<text:s text:c="4"/>
</text:span>
<text:span text:style-name="Code.Comment.Single"># Code lines with a maximum length of 80 characters will not wrap over lines</text:span>
</text:p>
...
</draw:text-box>
</draw:frame>
Frame snippet test. Frame can be formatted to look identical to other paragraphs in Writer and pdf. Using XpdfReader to select/copy/paste pdf to np++ gives a different result with leading spaces in code between a frame and paragraphs.
This is a problem related to how text is internally stored in a PDF file. It has nothing to do with LibreOffice or Code Highlighter 2.
By the way, with Xreader and sublime-text, I see no difference: leading spaces are all removed.
The user selects the lexer [language] manually, or Pygments can guess, but the lexer is not saved. It would be useful to save the lexer with the snippet and allow Code Highlighter 2 to read a tag associated with a snippet (eg similar to markdown tagging code blocks with language) :