Closed gamboz closed 2 years ago
Your approach is correct! Your code also captures the macro \mu, which is not explicitly declared to the latexwalker. The default implementation relies on the behavior for default macros, which is to keep them as a macro node with no arguments. (The macro is separately declared for latex2text as representing the Unicode "μ" symbol.)
I realize it's a bit of a weakness of the API for now that the parse_args() method is not given information about the macro/environment that is currently being parsed. This is usually not a problem in typical settings where you assign a MacroSpec or EnvironmentSpec to specific macros, since in such cases the parser is usually tailored to a specific macro/environment. A possible approach to display the unknown macro name is to hook directly into the LatexContextDb object. I also realize that these objects don't expose a simple way of doing this, but the following code achieves the desired behavior:
from pylatexenc import latexwalker, macrospec, latex2text
class UnknownMacroArgsParser(macrospec.MacroStandardArgsParser):
    # Args parser that remembers which macro it was created for.
    def __init__(self, macroname):
        super().__init__()
        self.macroname = macroname

    def parse_args(self, w, pos, parsing_state=None):
        # Report the unknown macro, then parse it as a macro with no arguments.
        print("Unknown macro `\\{}' at {}".format(self.macroname, pos))
        return super().parse_args(w, pos, parsing_state=parsing_state)

class CustomLatexContextDb(macrospec.LatexContextDb):
    def __init__(self, db):
        super().__init__()
        # Copy every category of the given db into this one.
        for cat in db.categories():
            self.add_context_category(
                cat,
                macros=db.iter_macro_specs([cat]),
                environments=db.iter_environment_specs([cat]),
                specials=db.iter_specials_specs([cat]),
            )

    def get_macro_spec(self, macroname):
        mspec = super().get_macro_spec(macroname)
        if mspec is not None:
            return mspec
        # Unknown macro: build a spec on the fly that knows the macro name.
        return macrospec.MacroSpec(macroname, args_parser=UnknownMacroArgsParser(macroname))

walker_context = CustomLatexContextDb(latexwalker.get_default_latex_context_db())
# second example
output = latex2text.LatexNodes2Text().latex_to_text(
r"""start
$\mu $
\foo
\foobar
""", latex_context=walker_context)
print(output)
# prints:
#
# Unknown macro `\mu' at 11
# Unknown macro `\foo' at 18
# Unknown macro `\foobar' at 26
# start
# μ
It's not a particularly elegant solution, and I'll look into how to make this easier in future versions of pylatexenc.
Regarding macros that are considered as unknown to latexwalker but are known to latex2text, you could consider emitting a warning only after performing a search in the latex2text context db object (call l2tcontext.get_macro_spec(macroname) and check if it is None, where l2tcontext is the context-db object used by latex2text). I hope this helps.
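A minimal sketch of that two-step check, written against plain Python objects rather than pylatexenc itself so that it runs on its own. The names unknown_in_all, warn_if_unknown and StubContextDb are hypothetical and not part of pylatexenc; the only assumption about a context db is that it has a get_macro_spec(macroname) method returning None for a macro it does not know. If the db you use has a fallback spec configured for unknown macros, get_macro_spec may return that fallback instead of None, and the check would need adjusting.

```python
import warnings


def unknown_in_all(macroname, *context_dbs):
    """Return True if none of the given context dbs has a spec for the macro.

    Each db only needs a get_macro_spec(name) method that returns None for
    an unknown macro (an assumption, see the note above).
    """
    return all(db.get_macro_spec(macroname) is None for db in context_dbs)


def warn_if_unknown(macroname, walker_db, l2t_db):
    """Warn only if both the walker db and the latex2text db lack the macro."""
    if unknown_in_all(macroname, walker_db, l2t_db):
        warnings.warn("Unknown macro \\{}".format(macroname))
        return True
    return False


class StubContextDb:
    """Hypothetical stand-in for a context db, used only for this demo."""

    def __init__(self, known):
        self._known = set(known)

    def get_macro_spec(self, macroname):
        # Any non-None object plays the role of a spec here.
        return macroname if macroname in self._known else None


walker_db = StubContextDb(["textbf"])       # \mu is not declared here
l2t_db = StubContextDb(["textbf", "mu"])    # but it is declared here
```

With these stubs, warn_if_unknown("mu", walker_db, l2t_db) stays silent because the second db knows \mu, while warn_if_unknown("foo", walker_db, l2t_db) emits a warning. Since the helpers only rely on get_macro_spec, real context-db objects could be passed in place of the stubs, under the assumption stated above.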
I'm going to change the issue title to reflect that the desired improvement to pylatexenc is that unknown macro/environment/specials handlers be given more information about what macro/environment/specials was encountered.
Actually, I realize that issue #32 already asked a very similar question. If you care about converting to text, not necessarily about obtaining the argument structure, you can plug into latex2text's context db to issue warnings for unknown macros. See my comment in issue #32.
Thank you for the clarifications. Yes, #32 is better for my use case (sorry I didn't spot it by myself).
I'm trying to have pylatexenc emit a warning when it finds an unknown macro.
I define an arguments parser that does nothing and emits a warning. Then I define a MacroSpec that uses this parser, and finally I register it with the walker's context using set_unknown_macro_spec(). See the code below. May I ask if this is the correct way to go? I suspect that I'm missing something, because I get a warning at the end of the math mode (at the second "$" in the second example in the code below).
I would also like to emit the name of the unknown macro, but this is for later :)