kschiess / parslet

A small PEG based parser library. See the Hacking page in the Wiki as well.
kschiess.github.com/parslet
MIT License

Parsing binary data #153

Closed romanlehnert closed 7 years ago

romanlehnert commented 8 years ago

Hey there,

I'm trying to figure out how to parse a string that may contain binary data. The binary data is prefixed with its byte length inside the string and may contain unescaped control characters.

Normally, the elements in the string are separated by a :. When a : appears inside an element, it is escaped with a ?.

Example:

elem?:ent_1:element_2:element_3

Should parse to

[ "elem?:ent_1", "element_2", "element_3"]

But such an element may also contain binary data. This binary data is prefixed with its byte length, written between two @ characters.

Within a row, it may look like this:

elem?:ent_1:@18@my_binary:string!!:element_3

Should parse to

[ "elem?:ent_1", "my_binary:string!!", "element_3"]

What is the best way to handle this with parslet? I'd be really thankful for any advice.

ghost commented 8 years ago
  1. Have you read the "Capturing input" section in this article? Something like this should work:

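    require 'parslet'
    include Parslet

    # Length-prefixed element: '@', a decimal size, '@', then exactly `size` characters.
    ele =
      str('@') >>
      match('\\d').repeat(1).capture(:size) >>  # at least one digit, remembered as :size
      str('@') >>
      dynamic { |_source, context|
        size = context.captures[:size].to_i
        any.repeat(size, size)  # min == max, so exactly `size` characters
      }

    ele.parse('@4@abcd')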
  2. If the string contains any binary data, try using the ASCII-8BIT encoding; it may work.
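
To illustrate why the encoding in point 2 can matter, here is a rough sketch in plain Ruby (no parslet involved). The payload is a made-up example, not taken from the question. The assumption is that the length prefix counts bytes, while a matcher that consumes one character at a time counts characters in the string's encoding, so the two can disagree for multi-byte input:

```ruby
# Hypothetical payload: "é" takes two bytes in UTF-8,
# so a byte-length prefix for this string would be 5, not 4.
payload = "caf\u00E9"

utf8_chars = payload.length    # number of characters in UTF-8
utf8_bytes = payload.bytesize  # number of bytes

# String#b returns a copy tagged as ASCII-8BIT (binary),
# where every byte counts as one character.
binary = payload.b
binary_chars = binary.length

puts "UTF-8:      #{utf8_chars} chars, #{utf8_bytes} bytes"
puts "ASCII-8BIT: #{binary_chars} chars"
```

In the binary copy the character count equals the byte count, so a byte-length prefix and a character-counting repeat would line up. `force_encoding(Encoding::ASCII_8BIT)` does the same retagging in place instead of returning a copy.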