rust-bakery / nom

Rust parser combinator framework

Help needed with writing ambiguous parser #1666

Open gyzerok opened 1 year ago

gyzerok commented 1 year ago

Hello everyone!

Unfortunately, I couldn't find help elsewhere, so I hope it's OK to ask in the issues. If not, my apologies, and I will close this.

It feels like there is something in nom that I can't grasp when it comes to writing ambiguous parsers. I am running into similar problems in several places, but to keep it simple, let's look at one particular example. That said, any general guidance on how to approach such problems would be greatly appreciated.

In the code example below, I am trying to parse a URL-like string such as reddit.com or api.reddit.com. However, this code fails the following test:

assert_eq!(regname("reddit.com."), Ok((".", "reddit.com")));

As I understand it, since the domain function expects each label to be terminated with a dot, many1(domain) consumes both "reddit." and "com." as two domains. Nothing is left for tld to match (instead of one domain plus a tld), so the whole regname parser fails.
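To make that reasoning concrete, here is a minimal std-only sketch of greedy repetition without backtracking. It is a model, not nom itself: `domain_model` and `many1_domain_model` are hypothetical names that only imitate what `domain` and `many1(domain)` accept for ASCII input.

```rust
/// Consumes one "label." unit (ASCII letters, digits, or '-', then a dot),
/// imitating what `domain` accepts. Returns the remaining input on success.
fn domain_model(i: &str) -> Option<&str> {
    let end = i
        .find(|c: char| !(c.is_ascii_alphanumeric() || c == '-'))
        .unwrap_or(i.len());
    if end == 0 {
        return None; // at least one label character is required
    }
    // The label must be followed by a dot, which is consumed as well.
    i[end..].strip_prefix('.')
}

/// Greedy repetition without backtracking, imitating `many1(domain)`.
/// Returns whatever input is left after consuming as many units as possible.
fn many1_domain_model(mut i: &str) -> Option<&str> {
    i = domain_model(i)?;
    while let Some(rest) = domain_model(i) {
        i = rest;
    }
    Some(i)
}

fn main() {
    // "reddit." and "com." are both consumed, so nothing is left for a tld.
    assert_eq!(many1_domain_model("reddit.com."), Some(""));
    // Without the trailing dot, "com" survives and a tld parser can match it.
    assert_eq!(many1_domain_model("reddit.com"), Some("com"));
}
```

The repetition never gives back the "com." it already consumed, which matches the behaviour described above.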

How can I make it work properly?

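// Imports for the snippet below, assuming nom 7.x.
use nom::{
    branch::alt,
    bytes::complete::tag,
    character::complete::{alpha1, alphanumeric1},
    combinator::{recognize, verify},
    error::context,
    multi::many1,
    sequence::{pair, terminated},
    IResult,
};

fn regname(i: &str) -> IResult<&str, &str> {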
    context("regname", recognize(pair(many1(domain), tld)))(i)
}

fn tld(i: &str) -> IResult<&str, &str> {
    context(
        "tld",
        verify(
            recognize(many1(
                alpha1,
            )),
            // This predicate does accept "com"; other tests
            // without a trailing dot do pass
            is_known_tld,
        ),
    )(i)
}

fn domain(i: &str) -> IResult<&str, &str> {
    context(
        "domain",
        recognize(terminated(
            many1(alt((
                alphanumeric1,
                tag("-"),
            ))),
            tag("."),
        )),
    )(i)
}
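One possible direction, sketched here without nom so that it runs with only the standard library, is to treat the dot as a separator between labels rather than as a terminator of each domain, and then check that the last label is a known TLD. The names `label_len` and `regname_model` are hypothetical, and the `is_known_tld` below is only a stand-in, since the real predicate is not shown. In nom, a similar shape might be built on `separated_list1`, which leaves a trailing separator unconsumed when no element follows it.

```rust
/// Hypothetical stand-in for the `is_known_tld` predicate (not shown in the post).
fn is_known_tld(s: &str) -> bool {
    matches!(s, "com" | "org" | "net")
}

/// Length of the leading label: ASCII letters, digits, or '-'.
fn label_len(i: &str) -> usize {
    i.find(|c: char| !(c.is_ascii_alphanumeric() || c == '-'))
        .unwrap_or(i.len())
}

/// Returns (rest, matched), in the same (remaining, output) order as IResult.
fn regname_model(i: &str) -> Option<(&str, &str)> {
    let mut end = label_len(i);
    if end == 0 {
        return None;
    }
    let mut last = &i[..end];
    let mut labels = 1;
    // Consume ".label" pairs; a dot with no label after it is left untouched.
    while let Some(after_dot) = i[end..].strip_prefix('.') {
        let n = label_len(after_dot);
        if n == 0 {
            break;
        }
        last = &after_dot[..n];
        end += 1 + n;
        labels += 1;
    }
    // Require at least one domain label plus a letters-only, known TLD.
    let tld_ok = last.chars().all(|c| c.is_ascii_alphabetic()) && is_known_tld(last);
    if labels >= 2 && tld_ok {
        Some((&i[end..], &i[..end]))
    } else {
        None
    }
}

fn main() {
    assert_eq!(regname_model("reddit.com."), Some((".", "reddit.com")));
    assert_eq!(regname_model("api.reddit.com"), Some(("", "api.reddit.com")));
    assert_eq!(regname_model("reddit"), None);
}
```

This sketch still does not backtrack, so an input such as reddit.com.foo is rejected rather than matched as reddit.com.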

Thank you!