sw-dbrown opened 1 year ago
Digging a little bit into the source, it appears that there is currently no special handling for the `lstlisting` environment as there is for `verbatim`. Should `VerbatimArgsParser` be adapted to handle more environments in addition to `verbatim`?
Edit: A quick change which causes `lstlisting` to be handled in a similar way to the `verbatim` environment:
From 94780a144e1f79ec17bfede62e0f3b10af3e65c9 Mon Sep 17 00:00:00 2001
From: David Brown <dbrown@schutzwerk.com>
Date: Fri, 14 Oct 2022 09:54:54 +0200
Subject: [PATCH] Handle lstlisting as verbatim environment
---
.../pylatexenc/latexwalker/_defaultspecs.py | 17 ++++++++++++++
.../pylatexenc/macrospec/_argparsers.py | 23 +++++++++++++++++++
2 files changed, 40 insertions(+)
diff --git a/venv/lib/python3.10/site-packages/pylatexenc/latexwalker/_defaultspecs.py b/venv/lib/python3.10/site-packages/pylatexenc/latexwalker/_defaultspecs.py
index 3306a89..a8f6e24 100644
--- a/venv/lib/python3.10/site-packages/pylatexenc/latexwalker/_defaultspecs.py
+++ b/venv/lib/python3.10/site-packages/pylatexenc/latexwalker/_defaultspecs.py
@@ -308,6 +308,23 @@ specs = [
# for verbatim |\like \this|...
]}),
+
+ #
+ # CATEGORY: listings
+ #
+ ('listings', {
+ 'macros': [
+ # TODO: Maybe handle lstinline as well?
+ ],
+ 'environments': [
+ EnvironmentSpec('lstlisting',
+ args_parser=VerbatimArgsParser(verbatim_arg_type='lstlisting-environment')),
+ ],
+ 'specials': [
+ # optionally users could include the specials "|" like in latex-doc
+ # for verbatim |\like \this|...
+ ]}),
+
#
# CATEGORY: theorems
#
diff --git a/venv/lib/python3.10/site-packages/pylatexenc/macrospec/_argparsers.py b/venv/lib/python3.10/site-packages/pylatexenc/macrospec/_argparsers.py
index 5d06724..39b311a 100644
--- a/venv/lib/python3.10/site-packages/pylatexenc/macrospec/_argparsers.py
+++ b/venv/lib/python3.10/site-packages/pylatexenc/macrospec/_argparsers.py
@@ -451,6 +451,29 @@ class VerbatimArgsParser(MacroStandardArgsParser):
)
return (argd, pos, len_)
+ if self.verbatim_arg_type == 'lstlisting-environment':
+ # simply scan the string until we find '\end{lstlisting}'. That's
+ # exactly how LaTeX processes it.
+ endverbpos = w.s.find(r'\end{lstlisting}', pos)
+ if endverbpos == -1:
+ raise latexwalker.LatexWalkerParseError(
+ s=w.s,
+ pos=pos,
+ msg=r"Cannot find matching \end{lstlisting}"
+ )
+ # do NOT include the "\end{lstlisting}", latexwalker will expect to
+ # see it:
+ len_ = endverbpos-pos
+
+ argd = ParsedVerbatimArgs(
+ verbatim_chars_node=w.make_node(latexwalker.LatexCharsNode,
+ parsing_state=parsing_state,
+ chars=w.s[pos:pos+len_],
+ pos=pos,
+ len=len_)
+ )
+ return (argd, pos, len_)
+
if self.verbatim_arg_type == 'verb-macro':
# read the next nonwhitespace char. This is the delimiter of the
# argument
--
2.37.2
I am not sure if this is the correct way to implement this. It does, however, now lead to the correct behavior and has the nice side effect that `nodelist_to_latex()` returns the expected content :)
Note: This does not handle other macros or environments provided by the `listings` package (e.g. `lstinline`).
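For illustration, the scan performed by the patch (searching for the literal `\end{lstlisting}` instead of tokenizing the body) can be sketched on its own and keyed on a set of environment names, which would be one way to let a single code path cover `verbatim`, `lstlisting` and others. This is a minimal standalone sketch, not pylatexenc code: `VERBATIM_ENVIRONMENTS`, `read_verbatim_body` and `VerbatimParseError` are hypothetical names.

```python
from typing import Tuple

# Hypothetical registry of environments whose bodies are read verbatim
# (illustrative only; this is not a pylatexenc setting).
VERBATIM_ENVIRONMENTS = {"verbatim", "lstlisting"}


class VerbatimParseError(ValueError):
    """Hypothetical error raised when the matching end marker is missing."""


def read_verbatim_body(s: str, pos: int, envname: str) -> Tuple[str, int]:
    """Return (body, length) of a verbatim environment body starting at pos.

    As in the patch, the text is scanned for the literal end marker and the
    marker itself is NOT included, so the caller can still consume it.
    """
    if envname not in VERBATIM_ENVIRONMENTS:
        raise ValueError(f"{envname!r} is not registered as a verbatim environment")
    end_marker = "\\end{" + envname + "}"
    endpos = s.find(end_marker, pos)
    if endpos == -1:
        raise VerbatimParseError(f"Cannot find matching {end_marker} after position {pos}")
    return s[pos:endpos], endpos - pos


# Usage: the body starts right after \begin{lstlisting}.
begin = r"\begin{lstlisting}"
src = begin + "\necho $HOME \\\"x\\\"\n" + r"\end{lstlisting}"
start = len(begin)
body, n = read_verbatim_body(src, start, "lstlisting")
# body still contains the literal '$' and '\"' characters, untouched
print(repr(body))
```

With this scan-to-end-marker approach, an optional argument such as `[language=Python]` placed directly after `\begin{lstlisting}` would end up inside the captured body; a fuller version would parse that argument first.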
Thanks for the careful analysis! I'll have a closer look at some point soon.
An update here: the `lstlisting` environment is now recognized correctly in pylatexenc version 3.0alpha. I am still likely to update the way the contents are reported, though. I'm aiming to have verbatim environments report their contents through an environment body nodelist containing a single `LatexCharsNode` with the verbatim content (which might include characters that would otherwise have a special meaning in LaTeX). This would take advantage of the new parsing features of pylatexenc 3.
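To make the planned representation concrete, a toy model might look like the following. The classes `CharsNode` and `EnvironmentNode` and the helper `make_verbatim_env` are hypothetical stand-ins, not pylatexenc's `LatexCharsNode`/`LatexEnvironmentNode` API; the point is only that a body made of one chars node serializes back to exactly the input, whatever special characters it contains.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class CharsNode:
    """Plain characters, reproduced exactly when serialized."""
    chars: str

    def to_latex(self) -> str:
        return self.chars


@dataclass
class EnvironmentNode:
    """An environment whose body is a list of child nodes."""
    envname: str
    nodelist: List["Node"] = field(default_factory=list)

    def to_latex(self) -> str:
        body = "".join(n.to_latex() for n in self.nodelist)
        return f"\\begin{{{self.envname}}}{body}\\end{{{self.envname}}}"


Node = Union[CharsNode, EnvironmentNode]


def make_verbatim_env(envname: str, raw_body: str) -> EnvironmentNode:
    # Verbatim content is never tokenized: the whole body becomes one
    # CharsNode, so '$', '\"', '{' etc. carry no special meaning here.
    return EnvironmentNode(envname, [CharsNode(raw_body)])


# Usage: a body with an unbalanced '$' and a '\"' sequence round-trips unchanged.
raw = '\nprintf("100$ \\"ok\\"");\n'
node = make_verbatim_env("lstlisting", raw)
print(node.to_latex())
```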
Hello, I have thought a little bit more about the issue I described in https://github.com/phfaist/pylatexenc/issues/89. While I definitely understand and respect your reasoning, I have come to the conclusion that the issue is not related to `nodelist_to_latex()` at all. Consider the following example:

It is important to note the presence of `tolerant_parsing=False`. Executing this code leads to the following error message:

If `tolerant_parsing` is set to `True`, the script will successfully complete execution and return the following output:

Here, it can be seen that a `LatexMathNode` is created inside of the `lstlisting` environment. In my opinion, this is not correct behavior, since content inside of the `lstlisting` environment should be treated as verbatim (one big `LatexCharsNode` (?)). The construction of the unbalanced `LatexMathNode` is what eventually causes `nodelist_to_latex()` to produce invalid LaTeX code, but it is, in my opinion, not the root of the problem. Similarly, in the second example in https://github.com/phfaist/pylatexenc/issues/89 the sequence `\"` gets turned into a `LatexMacroNode`.

Replacing the `lstlisting` environment with a `verbatim` environment instead leads to correct results.

Again, thank you very much for your time!
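The argument can be illustrated with a deliberately simplified toy tokenizer. The functions `tokenize` and `verbatim` and the class `ToyParseError` are hypothetical, and this is not how pylatexenc is implemented: once an environment body is tokenized as LaTeX, `$` opens math mode and `\"` is read as a macro, so strict parsing fails on an unbalanced `$` and tolerant parsing produces a math token that runs to the end of the body. Reading the body verbatim yields a single chars token instead.

```python
from typing import List, Tuple

# (kind, text) pairs; kinds are "chars", "macro" and "math" in this toy.
Token = Tuple[str, str]


class ToyParseError(ValueError):
    """Raised by the toy tokenizer in strict mode on an unbalanced '$'."""


def tokenize(body: str, tolerant: bool) -> List[Token]:
    """Toy LaTeX-like tokenizer (NOT pylatexenc): treats the body as LaTeX."""
    tokens: List[Token] = []
    buf: List[str] = []

    def flush() -> None:
        if buf:
            tokens.append(("chars", "".join(buf)))
            buf.clear()

    i = 0
    while i < len(body):
        c = body[i]
        if c == "\\" and i + 1 < len(body):
            flush()
            j = i + 1
            if body[j].isalpha():
                # control word such as \end: backslash followed by letters
                while j < len(body) and body[j].isalpha():
                    j += 1
            else:
                # control symbol such as \" : backslash plus one character
                j += 1
            tokens.append(("macro", body[i:j]))
            i = j
        elif c == "$":
            flush()
            end = body.find("$", i + 1)
            if end == -1:
                if not tolerant:
                    raise ToyParseError(f"unbalanced '$' at position {i}")
                end = len(body) - 1  # tolerant: math runs to the end of the body
            tokens.append(("math", body[i:end + 1]))
            i = end + 1
        else:
            buf.append(c)
            i += 1
    flush()
    return tokens


def verbatim(body: str) -> List[Token]:
    """Verbatim reading: the whole body is one chars token."""
    return [("chars", body)]


# Usage: the kind of content that typically appears in a listing.
shell = "echo $HOME\n"
code = 'print("a\\"b")'
print(tokenize(shell, tolerant=True))
print(tokenize(code, tolerant=True))
print(verbatim(shell))
```

In this toy, the verbatim reading avoids both the unclosed math token and the spurious macro, which is the behavior the post expects from `lstlisting`.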