Closed pedrozudo closed 1 year ago
The problem came from the _ character in the env name prolog_special. Using prolog-special works.

With input

\begin{prolog_special}
vacation:- time, money.
holidays:- time.
\end{prolog_special}

the env name prolog_special is recorded when \begin{prolog_special} is read. At that point _ has category code 8 (subscript). Inside the environment, verbatim rules apply and special characters such as _ are set to catcode 12 (other). minted (actually the dependent package fancyvrb) scans the env content line by line, and checks for the end of the env by comparing the recorded env name with <envname> if the just scanned line contains \end{<envname>}. This is an \ifx comparison, hence token by token and based on both character and category codes. When \end{prolog_special} is scanned, _ has catcode 12, different from the one in the recorded env name. Therefore the scan never ends, and an error is raised when it meets/exceeds the end of the file.

In conclusion, without special treatment, only characters having the same catcode before and after \begin{minted} can be used in the <envname> of \newminted[<envname>]{<language>}{<options>}. For example, a-zA-Z always have catcode 11 (letter) and - always has catcode 12 (other).
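As a rough Python analogy (this only models the mechanism, not actual TeX internals), the \ifx check behaves like comparing lists of (character, catcode) pairs:

def tokens(name, underscore_catcode):
    # letters keep catcode 11; only _ differs between the two scans
    return [(c, underscore_catcode if c == "_" else 11) for c in name]

recorded = tokens("prolog_special", 8)   # _ has catcode 8 when \begin{...} is read
scanned = tokens("prolog_special", 12)   # _ has catcode 12 during the verbatim scan
print(recorded == scanned)               # False, so \end{prolog_special} never matches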
Ah great, thanks a lot. It now runs without errors! I was looking in completely the wrong place to fix the bug.
I have a follow-up question you might be able to answer. Let's assume I want to compute the special words in a just-in-time fashion. That is, I want the lexer to analyse the entire text and extract the special words from the text itself. I tried something like the following:
import re

from pygments.filter import apply_filters
from pygments.lexer import RegexLexer, bygroups, _encoding_map
from pygments.token import (Comment, Keyword, Name, Number, Operator,
                            Punctuation, String, Text)
from pygments.util import guess_decode

# special_atom_callback and special_function_callback are the custom
# callbacks mentioned above; they are defined elsewhere in the file.


class SpecialPrologLexer(RegexLexer):
    """
    Lexer for Prolog files.
    """

    special_words = []

    name = "Prolog"
    aliases = ["prolog"]
    filenames = ["*.ecl", "*.prolog", "*.pro", "*.pl"]
    mimetypes = ["text/x-prolog"]

    flags = re.UNICODE | re.MULTILINE

    tokens = {
        "root": [
            (r"/\*", Comment.Multiline, "nested-comment"),
            (r"%.*", Comment.Single),
            # character literal
            (r"0\'.", String.Char),
            (r"0b[01]+", Number.Bin),
            (r"0o[0-7]+", Number.Oct),
            (r"0x[0-9a-fA-F]+", Number.Hex),
            # literal with prepended base
            (r"\d\d?\'[a-zA-Z0-9]+", Number.Integer),
            (r"(\d+\.\d*|\d*\.\d+)([eE][+-]?[0-9]+)?", Number.Float),
            (r"\d+", Number.Integer),
            (r"[\[\](){}|.,;!]", Punctuation),
            (r":-|-->", Punctuation),
            (
                r'"(?:\\x[0-9a-fA-F]+\\|\\u[0-9a-fA-F]{4}|\\U[0-9a-fA-F]{8}|'
                r'\\[0-7]+\\|\\["\nabcefnrstv]|[^\\"])*"',
                String.Double,
            ),
            (r"'(?:''|[^'])*'", String.Atom),  # quoted atom
            # Needs to not be followed by an atom.
            # (r'=(?=\s|[a-zA-Z\[])', Operator),
            (r"is\b", Operator),
            (r"(<|>|=<|>=|==|=:=|=|/|//|\*|\+|-)(?=\s|[a-zA-Z0-9\[])", Operator),
            (r"(mod|div|not)\b", Operator),
            (r"_", Keyword),  # The don't-care variable
            (r"([a-z]+)(:)", bygroups(Name.Namespace, Punctuation)),
            (
                r"([a-z\u00c0-\u1fff\u3040-\ud7ff\ue000-\uffef]"
                r"[\w$\u00c0-\u1fff\u3040-\ud7ff\ue000-\uffef]*)"
                r"(\s*)(:-|-->)",
                # bygroups(Name.Function, Text, Operator),
                special_function_callback,
            ),  # function defn
            (
                r"([a-z\u00c0-\u1fff\u3040-\ud7ff\ue000-\uffef]"
                r"[\w$\u00c0-\u1fff\u3040-\ud7ff\ue000-\uffef]*)"
                r"(\s*)(\()",
                # bygroups(Name.Function, Text, Punctuation),
                special_function_callback,
            ),
            (
                r"[a-z\u00c0-\u1fff\u3040-\ud7ff\ue000-\uffef]"
                r"[\w$\u00c0-\u1fff\u3040-\ud7ff\ue000-\uffef]*",
                # String.Atom,
                special_atom_callback,
            ),  # atom, characters
            # This one includes !
            (
                r"[#&*+\-./:<=>?@\\^~\u00a1-\u00bf\u2010-\u303f]+",
                # String.Atom,
                special_atom_callback,
            ),  # atom, graphics
            (r"[A-Z_]\w*", Name.Variable),
            (r"\s+|[\u2000-\u200f\ufff0-\ufffe\uffef]", Text),
        ],
        "nested-comment": [
            (r"\*/", Comment.Multiline, "#pop"),
            (r"/\*", Comment.Multiline, "#push"),
            (r"[^*/]+", Comment.Multiline),
            (r"[*/]", Comment.Multiline),
        ],
    }

    def analyse_text(text):
        return ":-" in text

    def get_special_words(self, text):
        # do something smarter than the dummy code below
        return ["vacation", "holidays"]

    def get_tokens(self, text, unfiltered=False):
        """
        Return an iterable of (tokentype, value) pairs generated from
        `text`. If `unfiltered` is set to `True`, the filtering mechanism
        is bypassed even if filters are defined.

        Also preprocess the text, i.e. expand tabs and strip it if
        wanted and applies registered filters.
        """
        # compute the special words just in time, before tokenizing
        self.special_words = self.get_special_words(text)
        if not isinstance(text, str):
            if self.encoding == "guess":
                text, _ = guess_decode(text)
            elif self.encoding == "chardet":
                try:
                    import chardet
                except ImportError as e:
                    raise ImportError(
                        "To enable chardet encoding guessing, "
                        "please install the chardet library "
                        "from http://chardet.feedparser.org/"
                    ) from e
                # check for BOM first
                decoded = None
                for bom, encoding in _encoding_map:
                    if text.startswith(bom):
                        decoded = text[len(bom):].decode(encoding, "replace")
                        break
                # no BOM found, so use chardet
                if decoded is None:
                    enc = chardet.detect(text[:1024])  # Guess using first 1KB
                    decoded = text.decode(enc.get("encoding") or "utf-8", "replace")
                text = decoded
            else:
                text = text.decode(self.encoding)
                if text.startswith("\ufeff"):
                    text = text[len("\ufeff"):]
        else:
            if text.startswith("\ufeff"):
                text = text[len("\ufeff"):]

        # text now *is* a unicode string
        text = text.replace("\r\n", "\n")
        text = text.replace("\r", "\n")
        if self.stripall:
            text = text.strip()
        elif self.stripnl:
            text = text.strip("\n")
        if self.tabsize > 0:
            text = text.expandtabs(self.tabsize)
        if self.ensurenl and not text.endswith("\n"):
            text += "\n"

        def streamer():
            for _, t, v in self.get_tokens_unprocessed(text):
                yield t, v

        stream = streamer()
        if not unfiltered:
            stream = apply_filters(stream, self.filters, self)
        return stream
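Used directly with pygments, for example like this (a minimal sketch run in the same file; the sample input is made up):

from pygments import highlight
from pygments.formatters import LatexFormatter

code = "vacation :- time, money.\nholidays :- time.\n"
print(highlight(code, SpecialPrologLexer(), LatexFormatter()))

highlight() calls the overridden get_tokens, which fills special_words from the text before the callbacks run.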
What is happening is that the special_words are initially not given. Only after running get_tokens does the lexer know what the special words are. The weird thing is that this works fine when using pygments on its own, but it no longer works with minted. By "does not work" I mean that with minted the tokens in the special_words list do not get highlighted correctly.
My suspicion is that minted somehow/somewhere creates a new instantiation of the lexer that does not use the new get_tokens function but a different one. Any idea what might be going on?
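One way to narrow this down: minted shells out to the pygmentize command line tool, so every compilation loads the lexer fresh from the file rather than reusing a Python object. A minimal diagnostic sketch that mimics that loading path outside LaTeX (assuming the custom lexer is loaded the way pygmentize -x loads lexers from a file; load_lexer_from_file is available in Pygments 2.2+):

# load the lexer the way `pygmentize -x -l prolog_special_lexer.py:SpecialPrologLexer`
# would, which is roughly what minted ends up invoking
from pygments import highlight
from pygments.formatters import LatexFormatter
from pygments.lexers import load_lexer_from_file

lexer = load_lexer_from_file("prolog_special_lexer.py", "SpecialPrologLexer")
out = highlight("vacation :- time, money.\n", lexer, LatexFormatter())
print(out)  # if vacation is not tagged as a builtin here, the problem is
            # in how the lexer is loaded/called, not in minted itself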
Hi,
So, I was playing around with the prolog lexer of pygments and tried to extend it with callbacks in order to highlight special tokens differently. The weird thing is that it does not work within a custom minted environment, but just using pygments without minted gives me the expected output.
Here are the details:
I have the following lexer file, called prolog_special_lexer.py (it is more or less copy-pasted from the Pygments prolog lexer, the only difference being that I have introduced callbacks to catch special tokens). The names of the callback functions are special_atom_callback and special_function_callback.
So this is my lexer; all good so far. I can test it with the following python script.
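Roughly, the script does the following (a minimal sketch; file names and the output names are placeholders):

from pygments import highlight
from pygments.formatters import LatexFormatter
from prolog_special_lexer import SpecialPrologLexer

code = open("vacation.pl").read()  # the Prolog source
formatter = LatexFormatter()

# one file with the tex-code that renders the Prolog code ...
with open("prolog_code.tex", "w") as f:
    f.write(highlight(code, SpecialPrologLexer(), formatter))

# ... and one file with the tex-code for the style
with open("prolog_style.tex", "w") as f:
    f.write(formatter.get_style_defs())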
This python script runs and outputs two files: one containing the tex-code to render the prolog code and the other containing the tex-code for the style. I can include these in a .tex file and run it. Most of the stuff in the preamble is the style, and the prolog code sits in the Verbatim environment in the body of the .tex file. Compiling the .tex file works just fine and the prolog code is highlighted as expected (holidays and vacation are the two special words that are highlighted as builtins). The pic below is a partial screenshot of the pdf that is produced by the .tex code.
So far so good. Now I want to have this within minted. I basically declare a new minted environment using the prolog lexer with the callbacks, and I should be done. Well, not really: it does not work. Here is my .tex file using minted:
Compiling this .tex file now does not work. It gives me the following error. I guess this is the relevant part of the log:
So yeah, anyone got an idea what's going on here?
Ah, just running the prolog lexer (the original one) with minted works fine. So it has got something to do with the callbacks.