Declarative XML parsing library inspired by Ruby's Sax Machine, backed by Floki
It is used by Feedraptor to parse feeds (RSS, Atom, etc.).
The package can be installed by adding `capuli` to your list of dependencies in `mix.exs`:

```elixir
def deps do
  [
    {:capuli, "~> 0.3.0"}
  ]
end
```
Add `use Capuli` to any module and declare the elements you want to parse:

```elixir
defmodule AtomEntry do
  use Capuli

  element :title
  # The :as argument makes this value available as entry.author instead of entry.name
  element :name, as: :author
  # Element names are case-insensitive, so there is no need to spell this as feedburner:origLink
  element :"feedburner:origlink", as: :url
  element :published
end
```
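The `:as` renaming and case-insensitive name matching described in the comments above can be pictured with the plain-Elixir model below. It is a hypothetical sketch, not Capuli's implementation: `ElementMappingSketch`, `map_children/1`, and the representation of parsed child elements as `{tag, text}` tuples are assumptions made only for illustration.

```elixir
defmodule ElementMappingSketch do
  # Hypothetical model of the AtomEntry declarations above:
  # {lowercased tag name, field name} pairs, where the field is the :as alias.
  @declarations [
    {"title", :title},
    {"name", :author},
    {"feedburner:origlink", :url},
    {"published", :published}
  ]

  # `children` is a list of {tag, text} tuples in document order. Tag names
  # are lowercased before lookup, so "feedburner:origLink" matches
  # "feedburner:origlink". Undeclared tags are ignored, and the first
  # occurrence of each field is kept.
  def map_children(children) do
    Enum.reduce(children, %{}, fn {tag, text}, acc ->
      case List.keyfind(@declarations, String.downcase(tag), 0) do
        {_tag, field} -> Map.put_new(acc, field, text)
        nil -> acc
      end
    end)
  end
end
```

Lowercasing only the incoming tag works here because every declared name in the model is already lowercase, just as `:"feedburner:origlink"` is written in the declaration above.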
```elixir
# Named AtomFeed rather than Atom, because Atom is already a module in Elixir's standard library
defmodule AtomFeed do
  use Capuli

  element :title
  # The :with argument matches only a link tag that has the attribute type="text/html".
  # The :value argument sets the field to the tag's href attribute
  # instead of the text between the opening and closing tags.
  element :link, value: :href, as: :url, with: [type: "text/html"]
  element :link, value: :href, as: :feed_url, with: [type: "application/atom+xml"]

  elements :entry, as: :entries, module: AtomEntry
end
```

Then parse any XML document with your module:

```elixir
feed = AtomFeed.parse(xml_text)
feed.title                       # The title of the blog
feed.url                         # The main URL of the blog
feed.feed_url                    # The URL of the blog's feed
List.first(feed.entries).title   # Title of the first entry
List.first(feed.entries).author  # Author of the first entry
List.first(feed.entries).url     # Permalink to the first entry on the blog
```
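The `:with` and `:value` options on the two `link` declarations can be modeled in a few lines of plain Elixir. This is a hypothetical sketch, not how Capuli works internally: `LinkSelectionSketch.select/3` and the representation of each `<link>` tag as a map of its attributes are assumptions.

```elixir
defmodule LinkSelectionSketch do
  # Hypothetical model: each <link> tag is a map of its attributes,
  # e.g. %{type: "text/html", href: "https://example.com/"}.

  # Mirrors `element :link, value: value_attr, with: required_attrs`:
  # finds the first link whose attributes include every pair in
  # `required_attrs` and returns its `value_attr` attribute. Links that
  # match but lack `value_attr` are skipped; returns nil if nothing matches.
  def select(links, value_attr, required_attrs) do
    Enum.find_value(links, fn attrs ->
      all_match =
        Enum.all?(required_attrs, fn {key, expected} ->
          Map.get(attrs, key) == expected
        end)

      if all_match, do: Map.get(attrs, value_attr)
    end)
  end
end
```

Because two declarations target the same `link` tag with different `with` filters, each alias (`:url`, `:feed_url`) ends up with the `href` of a different link.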
Multiple elements can be mapped to the same alias:
```elixir
defmodule RSSEntry do
  use Capuli

  # ...
  element :pubdate, as: :published
  element :"dc:date", as: :published
  element :"dcterms:created", as: :published
end
```
If more than one of these elements exists in the source, the value from the first one is used. The order of the `element` declarations in the code is unimportant; the order in which the elements are encountered while parsing the document determines the value assigned to the alias.

If an element is present in the source but blank (e.g., `<pubDate></pubDate>`), it is ignored and the first non-empty one is used instead.
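The resolution rule for shared aliases can be sketched as a single pass over the document's elements. This is a hypothetical model, not Capuli's code: `AliasResolutionSketch`, `resolve/2`, the `{tag, text}` tuple representation, and treating whitespace-only text as blank are all assumptions.

```elixir
defmodule AliasResolutionSketch do
  # Hypothetical model: several tag names share one alias, mirroring the
  # :pubdate, :"dc:date" and :"dcterms:created" declarations above.
  @aliases %{
    "pubdate" => :published,
    "dc:date" => :published,
    "dcterms:created" => :published
  }

  # `children` is a list of {tag, text} tuples in document order.
  # Returns the text of the first element mapped to `field` whose text is
  # not blank (whitespace-only counts as blank here), or nil.
  def resolve(children, field) do
    Enum.find_value(children, fn {tag, text} ->
      if Map.get(@aliases, String.downcase(tag)) == field and String.trim(text) != "" do
        text
      end
    end)
  end
end
```

Since the walk follows document order and `@aliases` is only used as a lookup table, the order in which the aliases are declared has no effect on the result.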
:default option

Capuli is released under the MIT license. See the LICENSE file for more details.