moodymudskipper / unglue

Extract matched substrings using a pattern, similar to what package glue does in reverse
GNU General Public License v3.0

How can unglue deal with repeated patterns ? #29

Open moodymudskipper opened 1 year ago

moodymudskipper commented 1 year ago

What should we do with input like this:

Valve AA has flow rate=0; tunnels lead to valves DD, II, BB
Valve BB has flow rate=13; tunnels lead to valves CC, AA

We can of course match the repeated part as a single string and then split it with other tools, but can we do better? What would the syntax be? Note that in this case, as in most (I think), the items are delimited by a separator, with one fewer separator than items by definition.
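For reference, the two-step workaround mentioned above can be sketched outside of unglue. This is a minimal illustration in Python of the regex mechanics only, not unglue's API: the names `TEMPLATE` and `parse_two_steps` are made up for the example, and the `\w+` / `\d+` sub-patterns are assumptions about the input.

```python
import re

lines = [
    "Valve AA has flow rate=0; tunnels lead to valves DD, II, BB",
    "Valve BB has flow rate=13; tunnels lead to valves CC, AA",
]

# Regex counterpart of a template with three placeholders. The last
# placeholder swallows the whole repeated part as one string.
TEMPLATE = re.compile(
    r"Valve (?P<valve>\w+) has flow rate=(?P<rate>\d+); "
    r"tunnels lead to valves (?P<to>.+)"
)

def parse_two_steps(line):
    """Step 1: match the template. Step 2: split the repeated part."""
    m = TEMPLATE.fullmatch(line)
    if m is None:
        return None
    row = m.groupdict()
    row["rate"] = int(row["rate"])      # type conversion
    row["to"] = row["to"].split(", ")   # the "other tools" step
    return row

rows = [parse_two_steps(line) for line in lines]
```

Each row ends up with a nested list under `to`, which is the shape a built-in syntax would presumably aim for in one step.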

Maybe something like: "Valve {valves} has flow rate={rate}; tunnels lead to valves {to+=pattern}"

The + here means we match the same strings as "Valve {valves} has flow rate={rate}; tunnels lead to valves {to=pattern+}", but the output would be treated differently. We can generalise it to *, {n} and ?.

We'd need a lookahead in the pattern if we're to account for separators, but maybe that's OK if we have a good example. Results would be nested, and converted where relevant.
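To make the lookahead idea concrete, here is a rough Python sketch of one way a repeated unit could account for separators: an item, followed by a separator only when another item comes next. `ITEM`, `SEP`, `UNIT` and `extract_items` are hypothetical names, and the two-uppercase-letter item pattern is an assumption based on the example input.

```python
import re

ITEM = r"[A-Z]{2}"   # assumed shape of one valve name
SEP = r", "

# One repetition: an item, then a separator only if another item follows
# (checked with a lookahead, so a trailing separator is rejected).
UNIT = rf"{ITEM}(?:{SEP}(?={ITEM}))?"
REPEATED = re.compile(rf"tunnels lead to valves (?P<to>(?:{UNIT})+)$")

def extract_items(line):
    """Return the repeated items as a list, or None if the line does not match."""
    m = REPEATED.search(line)
    if m is None:
        return None
    # The repeated part matched as a whole; pull out the individual items.
    return re.findall(ITEM, m.group("to"))
```

With this, "DD, II, BB" matches as three units, while a dangling "DD, " at the end of the line does not match at all.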

Then there is the technical question: the above might rewrite the pattern to "(pattern)(pattern)?(pattern)?..." with a tweakable default max length. Ugly, but it might work?
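A sketch of that expansion, again in Python and only to show the mechanics. In Python's `re` (as in most regex engines) a quantified capture group such as `(pattern)+` keeps only its last repetition, which is why the expansion gives each repetition its own group. The helper names `expand_repeated` and `items`, and the way the separator is folded into each optional group, are assumptions for illustration.

```python
import re

def expand_repeated(item, sep, max_len=5):
    """Build '(item)(?:sep(item))?(?:sep(item))?...' with max_len capture groups."""
    first = f"({item})"
    rest = [f"(?:{sep}({item}))?" for _ in range(max_len - 1)]
    return first + "".join(rest)

# Assumed item pattern: two uppercase letters, separated by ", ".
PATTERN = re.compile(
    r"tunnels lead to valves " + expand_repeated(r"[A-Z]{2}", ", ", max_len=4) + "$"
)

def items(line):
    """Collect the groups that took part in the match, in order."""
    m = PATTERN.search(line)
    if m is None:
        return None
    return [g for g in m.groups() if g is not None]
```

The obvious cost is the hard cap: a line with more than `max_len` items fails to match rather than being truncated, so the default would need to be generous or adjustable.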

The other option is to have an unglue_repeated() family of functions that we could apply to the output of a regular unglue call; these would have an optional sep arg.
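As a rough idea of what such a post-processing helper could look like, here is a Python sketch that takes already-extracted rows and expands one field. Everything about it is hypothetical: the name `split_repeated`, its arguments, and the behaviour when `sep` is omitted (fall back to finding every item match) are assumptions, not a proposal for the actual signature.

```python
import re

def split_repeated(rows, field, item_pattern, sep=None):
    """Replace rows[i][field] (a string) with the list of items it contains.

    item_pattern should contain no capture groups. If sep is given, the
    string is split on it and every piece must match item_pattern in full;
    otherwise all non-overlapping matches of item_pattern are returned.
    """
    out = []
    for row in rows:
        raw = row[field]
        if sep is None:
            found = re.findall(item_pattern, raw)
        else:
            found = raw.split(sep)
            bad = [it for it in found if re.fullmatch(item_pattern, it) is None]
            if bad:
                raise ValueError(f"items not matching the pattern: {bad}")
        out.append({**row, field: found})
    return out

# Output of a regular first pass, with the repeated part still one string.
first_pass = [
    {"valve": "AA", "rate": "0", "to": "DD, II, BB"},
    {"valve": "BB", "rate": "13", "to": "CC, AA"},
]
nested = split_repeated(first_pass, "to", r"[A-Z]{2}", sep=", ")
```

Passing `sep` makes the helper stricter, since malformed items raise an error instead of being silently skipped.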