zevv / npeg

PEGs for Nim, another take
MIT License

A way to let every peg rule produce its own object... #32

Closed khchen closed 3 years ago

khchen commented 3 years ago

Here is some code to demonstrate my problem.

import npeg, strutils

const
  testData = "10:10;12:00;22:40" # last item has no ';'

proc test1() =
  var list: seq[string]
  let peg = peg "start":
    start <- +statement
    statement <- >time * ';':
      list.add $1
    time <- Digit[1..2] * ':' * Digit[1..2]

  discard peg.match(testData)
  echo list

Output: @["10:10", "12:00"]

The output is what we want. However, we would like objects instead of strings. So...

type
  Time = object
    hour: int
    min: int

proc test2() =
  var list: seq[Time]
  let peg = peg "start":
    start <- +statement
    statement <- time * ';'
    time <- >Digit[1..2] * ':' * >Digit[1..2]:
      let (hour, min) = (parseInt($1), parseInt($2))
      if hour in 0..23 and min in 0..59:
        list.add Time(hour: hour, min: min)

  discard peg.match(testData)
  echo list

Output: @[(hour: 10, min: 10), (hour: 12, min: 0), (hour: 22, min: 40)]

We get a seq[Time] as output. However, code block captures are always executed, even when the parser state is rolled back afterwards, so the last item (22:40, which has no trailing ';') is added as well. The result is wrong.

proc test3() =
  var list: seq[Time]
  let peg = peg "start":
    start <- +statement
    statement <- time * ';':
      for i in countup(1, capture.len-1, step=2):
        let (hour, min) = (capture[i].s.parseInt, capture[i+1].s.parseInt)
        if hour in 0..23 and min in 0..59:
          list.add Time(hour: hour, min: min)

    time <- >Digit[1..2] * ':' * >Digit[1..2]

  discard peg.match(testData)
  echo list

Output: @[(hour: 10, min: 10), (hour: 12, min: 0)]

Finally, we get what we want. However, I think this code is bad because we produce the Time object outside of the time rule. If the statement rule were statement <- (time | date | something) * ';', the code would get really ugly.

The way I can resolve the problem for now is:

import marshal

proc test4() =
  let peg = peg "start":
    start <- +statement:
      var list: seq[Time]
      for i in 1..<capture.len:
        list.add to[Time](capture[i].s)
      push($$list)

    statement <- time * ';'

    time <- >Digit[1..2] * ':' * >Digit[1..2]:
      let (hour, min) = (parseInt($1), parseInt($2))
      if hour in 0..23 and min in 0..59:
        push($$Time(hour: hour, min: min))

  var list = to[seq[Time]](peg.match(testData).captures[0])
  echo list

Output: @[(hour: 10, min: 10), (hour: 12, min: 0)]

OK, it works fine and the code is clear: each rule produces its own object. But it will be very slow due to serialization/deserialization, and marshal does not work at compile time (https://github.com/treeform/jsony does).

In the end, is there a better/smarter way to do this? Sorry for my bad English.

zevv commented 3 years ago

I'm afraid I have no better or smarter way to do this. This is a limitation of the way code block captures currently work. There are some ideas floating around to change this behavior, but no concrete solutions yet. You can take a peek at the other open issues #14 and #24, as they basically revolve around the same issue as yours.

zevv commented 3 years ago

What about:

import npeg, strutils

type
  Time = object
    hour: int
    min: int

const
  testData = "10:10;12:00;22:40" # last item has no ';'

proc test2() =
  var list: seq[Time]
  let peg = peg "start":
    start <- +statement
    statement <- time * ';':
      let (hour, min) = (parseInt($1), parseInt($2))
      if hour in 0..23 and min in 0..59:
        list.add Time(hour: hour, min: min)

    time <- >Digit[1..2] * ':' * >Digit[1..2]

  discard peg.match(testData)
  echo list

test2()

zevv commented 3 years ago

Oh, right, I now see this is your test3() case. I should read properly first and only then write.

khchen commented 3 years ago

I used your library to rewrite my autolayout parser. The problem was resolved by a simple serialization/deserialization mechanism. Thank you very much.
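
The thread does not show what that mechanism looked like. As a rough sketch only, and not khchen's actual code, one lightweight variant of test4() could replace marshal with a hand-written string encoding. The helpers encodeTime and decodeTime and the name test5 are hypothetical and not part of npeg; the sketch relies only on the npeg features already used above (push, capture and code block captures). Since it uses nothing but plain string operations, it should also avoid the compile-time limitation of marshal, although that is an assumption that has not been verified here.

```nim
import npeg, strutils

type
  Time = object
    hour: int
    min: int

const
  testData = "10:10;12:00;22:40" # last item has no ';'

# Hypothetical helpers (not part of npeg): a Time travels through the
# capture stack as the plain string "hour,min".
proc encodeTime(t: Time): string =
  $t.hour & "," & $t.min

proc decodeTime(s: string): Time =
  let parts = s.split(',')
  Time(hour: parseInt(parts[0]), min: parseInt(parts[1]))

proc test5(): seq[Time] =
  var list: seq[Time]
  let parser = peg "start":
    start <- +statement:
      # capture[0] is the whole match; the rest are the strings pushed
      # by the time rule that survived backtracking.
      for i in 1..<capture.len:
        list.add decodeTime(capture[i].s)

    statement <- time * ';'

    time <- >Digit[1..2] * ':' * >Digit[1..2]:
      let (hour, min) = (parseInt($1), parseInt($2))
      if hour in 0..23 and min in 0..59:
        # The object is still produced inside its own rule; only its
        # encoded form is handed to the parent rule.
        push(encodeTime(Time(hour: hour, min: min)))

  discard parser.match(testData)
  list
```

Calling echo test5() should give the same two entries as test4(), because the string pushed for the unterminated 22:40 is rolled back together with the failed statement. The trade-off is that every object type needs its own encode/decode pair, which is the price for avoiding a general-purpose serializer.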