soulcutter / saxerator

A SAX-based XML parser for parsing large files into manageable chunks
MIT License

json output #29

Open MortalCatalyst opened 9 years ago

MortalCatalyst commented 9 years ago

Just wondering why saxerator doesn't feature a :json output option?

soulcutter commented 9 years ago

Because nobody has contributed that feature yet!

But there are also issues with translating XML to JSON, because JSON has no direct equivalent of element attributes. Saxerator gets around this by using extended String and Hash classes that carry an attributes collection, but it's less clear to me how to represent that in JSON.

I'm open to proposals!

Tails commented 6 years ago

Similarly, I'm looking for a flat :string output so I can parse the result in my own classes!

v3rmin commented 6 years ago

Proposal: use the @ character for attributes

XML:

<products>
  <product>
    <name>iPhone</name>
    <price currency="USD">1337</price>
  </product>
</products>

OUTPUT:

parser.for_tag('product').each do |item|
  p item # => { 'name' => 'iPhone', 'price' => 1337, 'price@currency' => 'USD' }
end
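
A minimal sketch of how such flat keys could be built, assuming each element's text and attributes are already available as plain Ruby values. `Node` and `flatten_with_at` are hypothetical names used only for illustration; they are not part of Saxerator.

```ruby
# Hypothetical stand-in for a parsed element: its text plus its attributes.
Node = Struct.new(:value, :attributes)

# Flatten { name => Node } into the proposed "name@attr" key layout.
def flatten_with_at(item)
  item.each_with_object({}) do |(name, node), out|
    out[name] = node.value
    node.attributes.each { |attr, val| out["#{name}@#{attr}"] = val }
  end
end

item = {
  'name'  => Node.new('iPhone', {}),
  'price' => Node.new('1337', { 'currency' => 'USD' })
}
p flatten_with_at(item)
```

Since `@` is not allowed in XML element names, a key like `price@currency` cannot clash with a real child element.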

fanantoxa commented 6 years ago

@v3rmin With this approach, consumers would have to parse the attribute names, and it'd be hard to tell that both keys actually relate to the same node. What if we represent our custom strings and hashes as an object?

parser.for_tag('product').each do |item|
  p item
  # => {
  #      'name' => 'iPhone',
  #      'price' => {
  #        'value' => 1337,
  #        'attributes' => { 'currency' => 'USD' }
  #      }
  #    }
end

Also, since we're talking about JSON, the output should be a String, not a Hash. @soulcutter, what do you think of this?
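
A rough sketch of that idea, assuming elements arrive as strings that carry an attributes hash. `AttrString` and `to_plain` are hypothetical stand-ins written for this example, not Saxerator's own classes or API; only the `json` standard library is used.

```ruby
require 'json'

# Hypothetical stand-in for an attribute-carrying string; not Saxerator's class.
class AttrString < String
  attr_reader :attributes

  def initialize(text, attributes = {})
    super(text)
    @attributes = attributes
  end
end

# Wrap a value in { value, attributes } only when it actually carries attributes.
def to_plain(node)
  case node
  when Hash
    node.transform_values { |child| to_plain(child) }
  when AttrString
    if node.attributes.empty?
      node.to_s
    else
      { 'value' => node.to_s, 'attributes' => node.attributes }
    end
  else
    node
  end
end

item = {
  'name'  => AttrString.new('iPhone'),
  'price' => AttrString.new('1337', { 'currency' => 'USD' })
}

# The result is a JSON String, not a Hash.
json = JSON.generate(to_plain(item))
puts json
```

Elements without attributes stay plain strings here, so only nodes like `price` pay the cost of the extra nesting.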

v3rmin commented 6 years ago

I think it must be a markup that doesn't collide with valid XML. What if the XML looked like this:

xml = <<-XML
<product>
  <name>iPhone</name>
  <price currency="USD">
    <value>1337</value>
    <attributes>
      <currency>EUR</currency>
    </attributes>
  </price>
</product>
XML

require 'saxerator'

parser = Saxerator.parser(xml)
parser.for_tag('product').each do |item|
  p item # => {"name"=>"iPhone", "price"=>{"value"=>"1337", "attributes"=>{"currency"=>"EUR"}}}
end