soulcutter / saxerator

A SAX-based XML parser for parsing large files into manageable chunks
MIT License

it won't parse attributes when there's content inside tag #19

Closed deemytch closed 10 years ago

deemytch commented 10 years ago

Hi. When parsing this piece:

<offers>
<offer available="true" original_id="893169" type="vendor.model" id="7602443">
<delivery>true</delivery>
[...skip...]
<vendorCode>10131034  500</vendorCode>
<param name="Color">blue</param>
<param name="Collection">Winter 2014/2015</param>
<param name="Season">Winter</param>
<param name="Country">China</param>
<param name="Unit=INT">46/48</param>
<param name="Sex">Female</param>
<param name="Age">Adult</param>
<categoryId>1908</categoryId>
[...skip...]
</offer>

with that code:

parser = Saxerator.parser(xml) do |config|
  config.output_type = :hash
  config.symbolize_keys!
  config.put_attributes_in_hash!
end
items = parser.for_tag(:offer)

I get

puts items.first
{:delivery=>"true",
  [...skip...],
  :manufacturer_warranty=>"true", 
  :param=>["blue", "winter 2014/2015", "Winter", "China", "46/48", "Female", "Adult"], 
  :categoryId=>"1908",
  [...skip...]
}

instead of, for example:

{ :param => [ { name: "Color", "": "blue" },
  { name: "Collection", "": "Winter 2014/2015" },
  { name: "Season", "": "Winter" },
  { name: "Sex", "": "Female" } ...], ...

etc. Also, the element <category id="1783" parentId="1781">Jeans</category> is parsed simply as "Jeans", ignoring :id and :parentId.

I don't know exactly how this case should be handled, but information is clearly being lost, which is undesirable.

soulcutter commented 10 years ago

Do the param strings respond to attributes? It does sound like something's fishy with put_attributes_in_hash! in that scenario.

deemytch commented 10 years ago

Sorry, I don't understand. Could you explain?

soulcutter commented 10 years ago

items.first[:param].first.attributes

Does this work and give you a hash containing { :name => 'Color' }?

deemytch commented 10 years ago

items.first[:param].first.attributes => {"name"=>"Color"}

Yes, got it! Thank you.

deemytch commented 10 years ago

categories.first.attributes => {"id"=>"1784", "parentId"=>"1781"}
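
For readers landing on this thread: the exchange above shows that elements with text content come back as strings that also respond to attributes. The sketch below illustrates that idea with plain Ruby and no gems. AttributedString and element_to_hash are hypothetical names invented for this example, not part of Saxerator, and the :text key is an arbitrary choice. It is only one possible way to fold the attributes and the text into a single hash like the one asked for in the original report.

```ruby
# Hypothetical stand-in for a string-like parsed element that also
# carries its XML attributes. This is NOT Saxerator's own class.
class AttributedString < String
  attr_reader :attributes

  def initialize(text, attributes = {})
    super(text)
    @attributes = attributes
  end
end

# Hypothetical helper: combine an element's attributes (with symbolized
# keys) and its text content into one hash. The text is stored under
# text_key, which defaults to :text purely for this sketch.
def element_to_hash(element, text_key: :text)
  attrs = element.attributes.each_with_object({}) do |(key, value), memo|
    memo[key.to_sym] = value
  end
  attrs.merge(text_key => element.to_s)
end

# Sample data mirroring two <param> elements from the report.
params = [
  AttributedString.new("blue", "name" => "Color"),
  AttributedString.new("Winter", "name" => "Season")
]

# Each entry now holds both the attribute and the text content.
params_as_hashes = params.map { |param| element_to_hash(param) }
```

With real parser output, the same helper could in principle be applied to anything that responds to attributes and to_s, for example via items.first[:param].map { |param| element_to_hash(param) }, assuming every entry in that array carries attributes as shown in the comments above.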

deemytch commented 10 years ago

I think this should probably be in the README. What do you think?

soulcutter commented 10 years ago

Yeah, I'd like some better documentation.