Output encoding - Githubissues

soulcutter / saxerator

A SAX-based XML parser for parsing large files into manageable chunks

MIT License

128 stars, 19 forks

Ox can use the document encoding declared in the XML prolog, for example <?xml version="1.0" encoding="Windows-1251" ?>. I want to convert strings from that encoding to UTF-8, like this: string&.encode('UTF-8', invalid: :replace, undef: :replace, replace: '?'). But my parsed item is a Saxerator::Builder::HashElement, so I can't simply call item.transform_values! { |value| value&.encode('UTF-8', invalid: :replace, undef: :replace, replace: '?') }, because an item can contain elements of type Saxerator::Builder::ArrayElement and Saxerator::Builder::HashElement recursively. Moreover, I must also convert the encoding of attributes.

So there are two ways to resolve this problem. 1) The worse way: add a deep_encode method, similar to https://apidock.com/rails/Hash/deep_merge. 2) The better way: add an output_encoding option to the Saxerator configuration and convert during parsing.
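A minimal sketch of what option 1 could look like. The name deep_encode comes from the suggestion above, but the method itself is hypothetical and not part of Saxerator. It is written against plain Hash, Array, and String, and it assumes that Saxerator's element classes behave like those core classes and expose their XML attributes through an attributes reader returning a Hash:

```ruby
# Hypothetical helper, not part of Saxerator.
# Recursively transcodes every string in a nested Hash/Array structure.
ENCODE_OPTIONS = { invalid: :replace, undef: :replace, replace: '?' }.freeze

def deep_encode(value, encoding = 'UTF-8')
  # Assumption: decorated elements expose XML attributes as a Hash.
  if value.respond_to?(:attributes) && value.attributes.is_a?(Hash)
    value.attributes.transform_values! { |v| deep_encode(v, encoding) }
  end

  case value
  when Hash
    # Mutates the hash in place and returns it.
    value.transform_values! { |v| deep_encode(v, encoding) }
  when Array
    # Mutates the array in place and returns it.
    value.map! { |v| deep_encode(v, encoding) }
  when String
    # Returns a new string; the original is left untouched.
    value.encode(encoding, **ENCODE_OPTIONS)
  else
    # nil, numbers, and anything else pass through unchanged.
    value
  end
end
```

Because String#encode returns a new string, any name or attributes carried by a decorated string element may not survive the conversion. A real implementation would need to copy them over.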

P.S. I have another question: why are we using these types? Can we simplify to something like https://github.com/savonrb/gyoku?

The primary reason for HashElement et al. was to capture name and attributes. When this was written, it was useful to be able to treat values as though they were language primitives (Array, Hash, etc.) most of the time, so I attempted to lightly decorate those core classes. If I were to write it again, I think I would avoid the confusion and ambiguity of Hash-like and Array-like classes and instead rely on more traditional Node classes that could be converted via to_a and to_h to "pure" standard classes. These days I'm less inclined to pass around primitives and more inclined to write my own classes.
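To illustrate the Node idea, here is a rough sketch of such a class. Everything in it (the Node name, its constructor, and how to_h groups repeated children) is hypothetical and does not reflect any existing or planned Saxerator API:

```ruby
# Hypothetical Node class; not part of Saxerator.
class Node
  attr_reader :name, :attributes, :children

  def initialize(name, attributes = {}, children = [])
    @name = name
    @attributes = attributes
    @children = children # Node instances or plain Strings
  end

  # Plain value of this node: joined text if it has no child nodes,
  # otherwise a plain Hash of its child nodes.
  def value
    children.grep(Node).empty? ? children.join : to_h
  end

  # Convert child nodes to a plain Hash keyed by element name.
  # Repeated element names collapse into a plain Array.
  def to_h
    children.grep(Node).group_by(&:name).transform_values do |nodes|
      values = nodes.map(&:value)
      values.length == 1 ? values.first : values
    end
  end
end
```

With this shape, the name and attributes stay on the Node, and callers who only want standard classes call to_h explicitly.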

Gyoku's hash syntax is interesting for representing attributes; I could see that working. It would be a significant change to the generated structure, and a breaking one. This library is not very actively developed at the moment, so if that's something you're interested in carrying forward, a fork would be the way to go.

I'm happy you found this library useful enough to make constructive suggestions!

Output encoding #70