qcam / saxy

Fast SAX parser and encoder for XML in Elixir
https://hexdocs.pm/saxy
MIT License

Have you considered implementing a declarative syntax for parsing? #99

Open thiagomajesk opened 2 years ago

thiagomajesk commented 2 years ago

Hi again! Since we last spoke, I've been using Saxy to build an RSS feed parser. After successfully implementing the RSS 2.0 spec, I'm now trying to make the handler code more generic so it can accept other types of configuration. Have you ever considered implementing something similar to https://github.com/pauldix/sax-machine with Saxy? If not, how hard do you think it would be to get it done?

I've been toying with the idea, but since I'm not used to parsing documents SAX-style, I'm having trouble finding a nice way to abstract the handler logic into something more reusable without sacrificing too much performance. I figure a configuration like this could be passed to a generic handler, with elements parsed accordingly:

[
  [element: :title],
  [element: :cloud],
  [element: :image, from: image_mappings],  
  [element: "textInput", from: text_input_mappings],
  [elements: "skipHours", from: skip_hours_mappings],
  [elements: :item, as: :entry, from: entry_mappings]
]
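For readers following along, here is a minimal sketch of how a configuration-driven handler of this kind might look. It is not part of Saxy: the `GenericHandler` module, its `init/1` function, and the reduced option set (`element:`, `elements:`, `as:`, without nested `from:` mappings) are assumptions made for illustration. Only the `handle_event/3` callback shape and the event payloads follow Saxy's handler behaviour. The events are fed in by hand so the sketch runs without any dependency.

```elixir
defmodule GenericHandler do
  # Hypothetical module. The callbacks mirror the shape of Saxy.Handler's
  # handle_event/3; in a real project you would add `@behaviour Saxy.Handler`.

  # Turns mappings such as [[element: :title], [elements: :category, as: :tags]]
  # into a lookup table keyed by element name.
  def init(mappings) do
    rules =
      Map.new(mappings, fn opts ->
        {kind, name} =
          case Keyword.fetch(opts, :element) do
            {:ok, name} -> {:one, name}
            :error -> {:many, Keyword.fetch!(opts, :elements)}
          end

        {to_string(name), {kind, Keyword.get(opts, :as, name)}}
      end)

    %{rules: rules, current: nil, buffer: [], result: %{}}
  end

  # Start buffering text only for elements that have a rule.
  def handle_event(:start_element, {name, _attributes}, state) do
    case Map.fetch(state.rules, name) do
      {:ok, {kind, key}} -> {:ok, %{state | current: {name, kind, key}, buffer: []}}
      :error -> {:ok, state}
    end
  end

  # Accumulate character data as iodata while inside a mapped element.
  def handle_event(:characters, chars, %{current: current} = state) when current != nil do
    {:ok, %{state | buffer: [state.buffer, chars]}}
  end

  # The repeated `name` only matches when the closing tag is the mapped element.
  def handle_event(:end_element, name, %{current: {name, kind, key}} = state) do
    text = IO.iodata_to_binary(state.buffer)

    result =
      case kind do
        :one -> Map.put(state.result, key, text)
        :many -> Map.update(state.result, key, [text], &(&1 ++ [text]))
      end

    {:ok, %{state | current: nil, buffer: [], result: result}}
  end

  # Every other event leaves the state untouched.
  def handle_event(_event, _data, state), do: {:ok, state}
end

# Hand-written events standing in for what a SAX parser would emit.
events = [
  {:start_element, {"channel", []}},
  {:start_element, {"title", []}},
  {:characters, "Feed"},
  {:end_element, "title"},
  {:start_element, {"category", []}},
  {:characters, "a"},
  {:end_element, "category"},
  {:end_element, "channel"}
]

initial = GenericHandler.init([[element: :title], [elements: :category, as: :tags]])

final =
  Enum.reduce(events, initial, fn {event, data}, state ->
    {:ok, state} = GenericHandler.handle_event(event, data, state)
    state
  end)

IO.inspect(final.result)
```

With Saxy installed, the same module could presumably be passed to `Saxy.parse_string/3` with `GenericHandler.init(mappings)` as the initial state. Looking rules up in a map keeps the per-event cost small, which is one way to limit the performance hit of a generic handler.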

Update¹: I see a section in the docs about Saxy.Builder (https://hexdocs.pm/saxy/Saxy.html#module-encoder) that presents a declarative API for encoding data into XML, but I couldn't find whether the other way around is also supported.

Update²: I'm also going to link the repo I'm working on; perhaps you could give me some pointers on how to better optimize or reuse the parsing code: https://github.com/thiagomajesk/gluttony.

qcam commented 2 years ago

I have to admit I have been thinking about this since the start but couldn't really find an appropriate API.

I previously made a StackParser which gives more info about the ancestors of the current node. With it, we could write something like this:

# parent2 > parent1 > foo
def handle_event(:start_element, {"foo", _, _}, [{"parent1", _, _}, {"parent2", _, _} | _] = _current_stack, state) do
  ...
end
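To make the stack idea concrete, here is a minimal sketch of a handler that tracks ancestors itself. It is not the StackParser mentioned above, whose API is not shown in this thread: the `StackHandler` module, the `{stack, user_state}` state layout, and the private `on_start/4` function are assumptions for illustration, and the stack holds plain element names rather than tuples. Events are again fed in by hand, using the payload shapes of Saxy's handler callbacks.

```elixir
defmodule StackHandler do
  # Hypothetical module. State is {stack, user_state}, where stack lists the
  # names of the open ancestor elements, nearest parent first.

  def handle_event(:start_element, {name, attributes}, {stack, user_state}) do
    user_state = on_start(name, attributes, stack, user_state)
    {:ok, {[name | stack], user_state}}
  end

  # Closing an element pops it off the stack.
  def handle_event(:end_element, _name, {[_current | ancestors], user_state}) do
    {:ok, {ancestors, user_state}}
  end

  def handle_event(_event, _data, state), do: {:ok, state}

  # parent2 > parent1 > foo: collect the attributes of matching elements.
  defp on_start("foo", attributes, ["parent1", "parent2" | _], found) do
    [attributes | found]
  end

  defp on_start(_name, _attributes, _stack, user_state), do: user_state
end

# Only the first "foo" sits under parent2 > parent1, so only it is collected.
events = [
  {:start_element, {"parent2", []}},
  {:start_element, {"parent1", []}},
  {:start_element, {"foo", [{"value", "1"}]}},
  {:end_element, "foo"},
  {:end_element, "parent1"},
  {:start_element, {"foo", [{"value", "2"}]}},
  {:end_element, "foo"},
  {:end_element, "parent2"}
]

{_stack, found} =
  Enum.reduce(events, {[], []}, fn {event, data}, state ->
    {:ok, state} = StackHandler.handle_event(event, data, state)
    state
  end)

IO.inspect(found)
```

Matching on the head of the stack with `| _` means the rule applies at any depth, as long as the two nearest ancestors are `parent1` and `parent2`.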

Something I have had in mind is that a DSL like this could be used to generate such a parser.

defmodule MyParser do
  dsl(
    foo: "parent1 -> parent2 -> foo.content",
    bar: "parent1 -> parent2 -> bar.attributes['value']"
  )
end

thiagomajesk commented 2 years ago

Cool @qcam... IMHO the syntax of the sax-machine lib would be the ideal version of that. Have you thought about how hard it would be to implement something similar?

ducharmemp commented 1 year ago

👋 hello! I had a similar thought (seems like the use-case is always parsing RSS feeds of some kind) so thought that I'd put some work into a declarative parser. Hope that light advertisement isn't frowned upon here since I think it can be generally useful for those looking at this issue for a solution, as I did. https://hexdocs.pm/saxaboom/api-reference.html (github: https://github.com/ducharmemp/saxaboom) implements a declarative parsing syntax effectively identical to sax-machine from Ruby. From my testing it's pretty fast and Saxy really shines here. Docs are sparse at the moment but I hope to work on them in a few days.