hplgit / doconce

Lightweight markup language - document once, include anywhere
http://hplgit.github.io/doconce/doc/web/index.html

newcommands parsing is broken #137

Open utsekaj42 opened 6 years ago

utsekaj42 commented 6 years ago

newcommands parsing is broken: any newcommand replacement applied to text containing brackets will stop at the first closing bracket, e.g.

\newcommand{\normg}[1]{\lvert\lvert {#1} \rvert\rvert}
\normg{x = \sum_{i}^{P} f(d_i)}

will result in

\normg{\lvert\lvert x = {\sum_{i} \rvert\rvert f(d_i)}^{P}

I believe this is due to the limitations of the regex, but I'm not very competent with regex. I believe what is required is to somehow match the brackets; further, for nested newcommands it is unclear how many levels should be evaluated. In html, latex, and ipynb (as of hplgit/doconce/pull/136) it is fine to leave the newcommands in, so it is a question of how to handle this for other formats.
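To make the failure mode concrete, here is a small sketch. The two patterns are hypothetical and are not taken from doconce's source; they only stand in for the usual "capture everything up to a closing bracket" regex. The non-greedy one stops at the first closing bracket, and the greedy one runs on to the last closing bracket on the line.

```python
import re

text = r'\normg{x = \sum_{i}^{P} f(d_i)}'

# Hypothetical pattern (an assumption, not doconce's actual regex):
# the argument is "everything up to the next closing bracket".
non_greedy = re.compile(r'\\normg\{(.*?)\}')
captured = non_greedy.search(text).group(1)
print(captured)  # x = \sum_{i

# The greedy variant fails the other way: with two uses on one line it
# swallows everything up to the last closing bracket.
two_uses = r'\normg{a} + \normg{b}'
greedy = re.compile(r'\\normg\{(.*)\}')
over_captured = greedy.search(two_uses).group(1)
print(over_captured)  # a} + \normg{b
```

Neither pattern can tell which closing bracket belongs to the opening one, because that requires counting nesting depth.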

I had started on a quick fix for non-nested newcommands with only 1 argument, which is

def recursive_bracket_parser(s, i):
    """Return the index just past the '}' closing the group that contains s[i].

    Inspired by <https://stackoverflow.com/a/14952529/4000607>
    """
    while i < len(s):
        # a bracket preceded by a backslash is an escaped literal, not a delimiter
        if s[i] == '{' and (i < 1 or s[i-1] != '\\'):
            i = recursive_bracket_parser(s, i+1)
        elif s[i] == '}' and (i < 1 or s[i-1] != '\\'):
            return i+1
        else:
            # process whatever is at s[i]
            i += 1
    return i

and an example mirror substitute in expand_newcommands.py

import re

# sample input taken from the example above
text = r'\normg{x = \sum_{i}^{P} f(d_i)}'

# (regex pattern, literal replacement text, number of arguments)
newcommands_test = [(r'\\normg', r'\lvert\lvert {NEWCOMMANDARG} \rvert\rvert', 1),
                    (r'\\normf', r'\normg{NEWCOMMANDARG}_{NEWCOMMANDARG}', 2)]

for pattern, replacement, nargs in newcommands_test:
    # 1 Split on the command name. pieces[0] is the text before the first
    #   match (empty if the command starts the string) and is kept as-is.
    pieces = re.split(pattern, text)
    result = pieces[0]
    # 2 Process the text following each match
    for match in pieces[1:]:
        args = []
        for idx in range(nargs):
            # assumes match[0] is the opening bracket of the next argument
            end_arg = recursive_bracket_parser(match, 1)
            args.append(match[0:end_arg])
            match = match[end_arg:]

        tmp = replacement
        for arg in args:
            # str.replace, since re.subn would treat backslashes in arg as escapes
            tmp = tmp.replace('{NEWCOMMANDARG}', arg, 1)
        result += tmp + match
    text = result
print(text)

I can continue with this line of work, but I think just including the newcommands in ipython, latex, and html is fine for my uses of doconce. Further I'm not sure if this is the right path to go down, but perhaps it could help with issues others have.

KGHustad commented 6 years ago

Regular expressions are only able to recognise regular languages. This is a big limitation when working with any language which may contain recursive structures.

We will implement proper parsing in the next major version, so I'm hesitant to spend time fixing this now.

It's probably best for you to continue with just including the newcommands for now.
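For readers who need a stopgap in the meantime, the bracket matching discussed above can be sketched without regexes doing the matching: count nesting depth to find each argument, and repeat the expansion until nothing changes to cover nested newcommands. This is only a sketch under stated assumptions. The names `read_group`, `expand_once`, and `expand_all` are hypothetical and are not part of doconce, and nothing here reflects the parser planned for the next major version. Optional arguments and whitespace between a command and its arguments are not handled.

```python
import re


def read_group(s, start):
    """Return (content, end) for the group opening at s[start] == '{'.

    `end` is the index just past the matching '}'.
    """
    if start >= len(s) or s[start] != '{':
        raise ValueError("expected '{' at position %d" % start)
    depth = 0
    i = start
    while i < len(s):
        c = s[i]
        if c == '\\':
            i += 2  # skip the escaped character, e.g. \{ or \}
            continue
        if c == '{':
            depth += 1
        elif c == '}':
            depth -= 1
            if depth == 0:
                return s[start + 1:i], i + 1
        i += 1
    raise ValueError("unbalanced brackets")


def expand_once(text, name, nargs, body):
    """Expand every use of \\name in text; body uses #1, #2, ... placeholders."""
    # the lookahead keeps \normg from matching a longer name such as \normgx
    pattern = re.compile(re.escape('\\' + name) + r'(?![A-Za-z])')
    out = []
    pos = 0
    while True:
        m = pattern.search(text, pos)
        if m is None:
            out.append(text[pos:])
            break
        out.append(text[pos:m.start()])
        i = m.end()
        args = []
        for _ in range(nargs):
            content, i = read_group(text, i)
            args.append(content)
        # a function replacement inserts each argument literally, so
        # backslashes in the argument are not read as regex escapes
        out.append(re.sub(r'#(\d)', lambda p: args[int(p.group(1)) - 1], body))
        pos = i
    return ''.join(out)


def expand_all(text, commands, max_passes=10):
    """Apply all commands repeatedly until the text stops changing."""
    for _ in range(max_passes):
        new = text
        for name, nargs, body in commands:
            new = expand_once(new, name, nargs, body)
        if new == text:
            return text
        text = new
    raise RuntimeError("still expanding after max_passes; recursive definition?")


commands = [('normg', 1, r'\lvert\lvert {#1} \rvert\rvert'),
            ('normf', 2, r'\normg{#1}_{#2}')]

print(expand_all(r'\normg{x = \sum_{i}^{P} f(d_i)}', commands))
# \lvert\lvert {x = \sum_{i}^{P} f(d_i)} \rvert\rvert
print(expand_all(r'\normf{a}{b}', commands))
# \lvert\lvert {a} \rvert\rvert_{b}
```

The `max_passes` limit is one possible answer to the question of how many nesting levels to evaluate: expansion simply runs to a fixed point, and the limit only guards against a newcommand that is defined in terms of itself.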