merongivian / capuli

Declarative XML parsing library backed by Floki
MIT License

Capuli


Declarative XML parsing library inspired by Ruby's Sax Machine, backed by Floki

Capuli is used by Feedraptor to parse feeds (RSS, Atom, etc.).

Installation

The package can be installed by adding capuli to your list of dependencies in mix.exs:

def deps do
  [
    {:capuli, "~> 0.3.0"}
  ]
end

Examples

Add use Capuli to any module and declare the elements to parse:

defmodule AtomEntry do
  use Capuli
  element :title
  # The :as argument makes this available through entry.author instead of entry.name
  element :name, as: :author
  # Element names are case-insensitive, so feedburner:origLink can be declared in lowercase
  element :"feedburner:origlink", as: :url
  element :published
end
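The :as and case-insensitive matching rules can be modeled in plain Elixir. This is a minimal sketch, not Capuli's implementation. ElementMatchSketch and extract/3 are hypothetical names, and the sketch assumes parsed nodes arrive in Floki's {tag, attributes, children} tuple shape. It finds the first node whose tag matches the declared name regardless of case and stores that node's text under the alias.

```elixir
defmodule ElementMatchSketch do
  # Hypothetical model of declarations such as
  #   element :"feedburner:origlink", as: :url
  # Nodes use Floki's {tag, attributes, children} tuple shape.
  # This is an illustration, not Capuli's implementation.

  @doc "Returns %{alias => text} for the first node whose tag matches `name`, ignoring case."
  def extract(nodes, name, opts \\ []) do
    wanted = name |> to_string() |> String.downcase()
    key = Keyword.get(opts, :as, name)

    match =
      Enum.find(nodes, fn
        {tag, _attrs, _children} -> String.downcase(tag) == wanted
        _text_node -> false
      end)

    case match do
      nil -> %{}
      {_tag, _attrs, children} -> %{key => text(children)}
    end
  end

  # Joins the direct text children of a node and trims surrounding whitespace.
  defp text(children) do
    children
    |> Enum.filter(&is_binary/1)
    |> Enum.join()
    |> String.trim()
  end
end

entry = [
  {"title", [], ["Hello"]},
  {"name", [], ["Jane"]},
  {"feedburner:origLink", [], ["https://example.com/hello"]}
]

ElementMatchSketch.extract(entry, :"feedburner:origlink", as: :url)
# => %{url: "https://example.com/hello"}
ElementMatchSketch.extract(entry, :name, as: :author)
# => %{author: "Jane"}
```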

# Named AtomFeed to avoid clashing with Elixir's built-in Atom module
defmodule AtomFeed do
  use Capuli
  element :title
  # The :with argument means that only a link tag
  # with the attribute type="text/html" is matched
  element :link, value: :href, as: :url, with: [
    type: "text/html"
  ]
  # The :value argument means that instead of the text inside the tag,
  # the value of its href attribute is stored
  element :link, value: :href, as: :feed_url, with: [
    type: "application/atom+xml"
  ]
  elements :entry, as: :entries, module: AtomEntry
end

Then parse any XML with your module:

feed = AtomFeed.parse(xml_text)

feed.title # Whatever the title of the blog is
feed.url # The main URL of the blog
feed.feed_url # The URL of the blog feed

List.first(feed.entries).title # Title of the first entry
List.first(feed.entries).author # The author of the first entry
List.first(feed.entries).url # Permalink on the blog for this entry
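The :with and :value options on the link declarations can be sketched the same way. This is again a hypothetical model and is not part of Capuli. The module LinkMatchSketch is an illustrative name, and the sketch assumes Floki-style tuples whose attributes are a list of {name, value} string pairs. :with keeps only nodes that carry every listed attribute, and :value reads the named attribute instead of the text content.

```elixir
defmodule LinkMatchSketch do
  # Hypothetical model of
  #   element :link, value: :href, as: :url, with: [type: "text/html"]
  # applied to Floki-style {tag, attributes, children} tuples.
  # Not Capuli's implementation; names here are illustrative.

  def extract(nodes, name, opts) do
    tag_name = to_string(name)
    key = Keyword.get(opts, :as, name)
    # Convert the :with keyword list to string pairs, as Floki stores attributes.
    required = for {k, v} <- Keyword.get(opts, :with, []), do: {to_string(k), v}

    match =
      Enum.find(nodes, fn
        {^tag_name, attrs, _children} -> Enum.all?(required, &(&1 in attrs))
        _other -> false
      end)

    case {match, Keyword.get(opts, :value)} do
      {nil, _} ->
        %{}

      # :value given: read that attribute instead of the text content.
      {{_tag, attrs, _children}, attr} when not is_nil(attr) ->
        %{key => attr_value(attrs, to_string(attr))}

      {{_tag, _attrs, children}, nil} ->
        %{key => children |> Enum.filter(&is_binary/1) |> Enum.join()}
    end
  end

  defp attr_value(attrs, name) do
    case List.keyfind(attrs, name, 0) do
      {^name, value} -> value
      nil -> nil
    end
  end
end

head = [
  {"title", [], ["My Blog"]},
  {"link", [{"rel", "alternate"}, {"type", "text/html"}, {"href", "https://example.com/"}], []},
  {"link", [{"rel", "self"}, {"type", "application/atom+xml"}, {"href", "https://example.com/feed.xml"}], []}
]

LinkMatchSketch.extract(head, :link, value: :href, as: :url, with: [type: "text/html"])
# => %{url: "https://example.com/"}
LinkMatchSketch.extract(head, :link, value: :href, as: :feed_url, with: [type: "application/atom+xml"])
# => %{feed_url: "https://example.com/feed.xml"}
```

Because the two links share a tag name, the :with filter is what distinguishes them. Without :with, both declarations would pick the first link in the document.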

Multiple elements can be mapped to the same alias:

defmodule RSSEntry do
  use Capuli
  # ...
  element :pubdate, as: :published
  element :"dc:date", as: :published
  element :"dcterms:created", as: :published
end

If more than one of these elements appears in the source, the value of the first one is used. The order of the element declarations in the code does not matter. The value assigned to the alias is determined by the order in which the elements are encountered while parsing the document.

If an element is present in the source but empty (e.g., <pubDate></pubDate>), it is ignored and the first non-empty one is used instead.
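The alias rule above can be shown as a minimal sketch. It assumes Floki-style {tag, attributes, children} tuples and uses a hypothetical AliasResolutionSketch module, which is not Capuli's code. The sketch walks the nodes in document order and returns the first non-blank text among the declared tags, so the order of the declaration list has no effect.

```elixir
defmodule AliasResolutionSketch do
  # Hypothetical model of several elements sharing one alias
  # (e.g. :pubdate, :"dc:date" and :"dcterms:created" -> :published).
  # Nodes use Floki's {tag, attributes, children} tuple shape.

  # Returns the text of the first declared element, in document order,
  # whose text is not blank; nil if there is none.
  def resolve(nodes, declared_names) do
    wanted = MapSet.new(declared_names, &String.downcase(to_string(&1)))

    Enum.find_value(nodes, fn
      {tag, _attrs, children} ->
        text = children |> Enum.filter(&is_binary/1) |> Enum.join() |> String.trim()
        # Undeclared tags and blank elements yield nil, so the search continues.
        if MapSet.member?(wanted, String.downcase(tag)) and text != "", do: text

      _text_node ->
        nil
    end)
  end
end

item = [
  {"title", [], ["Post"]},
  {"pubDate", [], []},
  {"dc:date", [], ["2020-01-02"]},
  {"dcterms:created", [], ["2019-12-31"]}
]

# The declaration order differs from the document order; the result does not change.
AliasResolutionSketch.resolve(item, [:"dcterms:created", :"dc:date", :pubdate])
# => "2020-01-02"
```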


License

Capuli is released under the MIT License. See the LICENSE file for details.