r1chardj0n3s / parse

Parse strings using a specification based on the Python format() syntax.
http://pypi.python.org/pypi/parse
MIT License

Where is the split between two fields placed when more than one split is possible? #161

Closed mcoulont closed 1 year ago

mcoulont commented 1 year ago

Hello.

In the following code, the string "AB=2," can be split in 3 different ways:

from parse import parse, with_pattern

regex1 = "[0-9A-Z]+"
regex2 = "[A-Z]+=[0-9]+"
regex = "(" + regex1 + ")|(" + regex2 + ")"

# Custom field type whose pattern is the alternation of the two regexes
@with_pattern(regex)
def parse_AlphanumericMaybeInEquality(text):
    return text

print(parse(
    "{beginning:AlphanumericMaybeInEquality}{end}",
    "AB=2,",
    dict(AlphanumericMaybeInEquality=parse_AlphanumericMaybeInEquality)
))

If the shortest possible first field were chosen, we would get beginning='A' and end='B=2,'.

If the longest possible first field were chosen, we would get beginning='AB=2' and end=',', since:

import re
print(re.match('^' + regex2, "AB=2,"))

returns

<re.Match object; span=(0, 4), match='AB=2'>

Yet we get neither:

<Result () {'beginning': 'AB', 'end': '=2,'}>

What's supposed to happen?

Thanks for your work

jenisys commented 1 year ago

Basically, you as provider of the pattern/type-definition/type-converter are:

Therefore, if you provide:

regex = "(" + regex2 + ")|(" + regex1 + ")"
# OR BETTER: regex = f"({regex2})|({regex1})"

you get the expected result:

<Result () {'beginning': 'AB=2', 'end': ','}>
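The difference can be reproduced with the re module alone, since regex alternation tries its alternatives from left to right and keeps the first one that matches. The sketch below is only an approximation: the split helper and the full pattern it builds are assumptions standing in for whatever regex parse generates internally (a named group per field, with the untyped {end} field matched lazily).

```python
import re

regex1 = "[0-9A-Z]+"
regex2 = "[A-Z]+=[0-9]+"

def split(field_pattern, text):
    # Hypothetical stand-in for the regex built from "{beginning:...}{end}":
    # one named group per field, the untyped field matched lazily,
    # and the whole string anchored at both ends.
    full = rf"\A(?P<beginning>{field_pattern})(?P<end>.+?)\Z"
    return re.match(full, text).groupdict()

# regex1 comes first: it matches "AB", so regex2 is never tried.
original_order = split(f"({regex1})|({regex2})", "AB=2,")
print(original_order)  # {'beginning': 'AB', 'end': '=2,'}

# regex2 comes first: it matches "AB=2" before regex1 gets a chance.
swapped_order = split(f"({regex2})|({regex1})", "AB=2,")
print(swapped_order)  # {'beginning': 'AB=2', 'end': ','}
```

In other words, the result is neither "shortest" nor "longest": it is the first alternative that lets the overall match succeed, so the more specific pattern should be listed first.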

NOTES:

OTHERWISE:

mcoulont commented 1 year ago

OK, thanks, I've learnt something today.

Sorry to have bothered you.