soulcutter / saxerator

A SAX-based XML parser for parsing large files into manageable chunks
MIT License
128 stars 19 forks source link

Bug with put_attributes_in_hash and repeated elements #24

Closed doomspork closed 9 years ago

doomspork commented 9 years ago

Hey @soulcutter, I think I've stumbled onto a bug with put_attributes_in_hash!, check out the following XML:

<Listing>
    <Amenity AmenityID="49" AmenityName="Elevator in Building" />
    <Amenity AmenityID="42" AmenityName="Exercise Facility" />
    <Amenity AmenityID="39" AmenityName="Other">1 Sauna</Amenity>
    <Amenity AmenityID="38" AmenityName="Parking" />
    <Amenity AmenityID="41" AmenityName="Pool">2 Pools</Amenity>
    <Amenity AmenityID="37" AmenityName="Unfurnished" />
</Listing>

Which when run through Saxerator outputs this Hash:

 :Amenity=>
  [{:AmenityID=>"49", :AmenityName=>"Elevator in Building"},
   {:AmenityID=>"42", :AmenityName=>"Exercise Facility"},
   "1 Sauna",
   {:AmenityID=>"38", :AmenityName=>"Parking"},
   "2 Pools",
   {:AmenityID=>"37", :AmenityName=>"Unfurnished"}],

The attributes are missing from the elements that contain both attributes and a value, is this is intended behavior?

Thanks!

soulcutter commented 9 years ago

I believe you can access the attributed by calling .attributes on the strings.

Honestly there is some ambiguity there, what would you expect to get?

soulcutter commented 9 years ago

Also it is in my mental backlog to stop sub classing String and Hash. I do think there are improvements to be made in the artifacts generated by parsing

doomspork commented 9 years ago

@soulcutter you're right, I was able to get what I needed through .attributes. To be honest I'm not entirely sure what I expect to be returned. I can say for certain that the values should be consistent, the mix of Hash and String makes for some less-than-ideal code.

In my particular situation this issue comes up three separate times in the XML structure so I ended up reducing each element down to an array of essentially [*attributes, value]. Another alternative, which I suspect would break backwards compatibility, would be to return something like:

{ :AmenityID=>"42", :AmenityName=>"Exercise Facility", :value => "1 Sauna" }

Since I've found a suitable workaround for the time being feel free to close this issue.

Thanks!