soulcutter / saxerator

A SAX-based XML parser for parsing large files into manageable chunks
MIT License

Ignore namespace config #9

Closed: quoideneuf closed this 10 years ago

quoideneuf commented 10 years ago

This is probably more of an RFC than a pull request.

I'm looking for a way to ignore namespaces while parsing the document. Please let me know if you think this behavior would be a good addition and, if so, if there are changes you'd like to see to this implementation.

soulcutter commented 10 years ago

Is this already covered by the strip_namespaces! configuration?

See https://github.com/soulcutter/saxerator/pull/8 and https://github.com/soulcutter/saxerator/blob/d7d250635aaa7bd8194a817fbe1b3cf4fe7db44f/spec/lib/saxerator_spec.rb#L69-L75

quoideneuf commented 10 years ago

I believe that strip_namespaces only affects the document fragments yielded by the enumerators (unless I've completely missed something). What I am looking for is a way to leave namespace prefixes off the arguments to for_tag. For example, if my document is in the 'marc' namespace:

I want to write: parser.for_tag('record') instead of: parser.for_tag('marc:record')

I want this for two reasons: 1) convenience and flexibility, and 2) more importantly, because under JRuby, Nokogiri will blow up if namespace prefixes are passed in to for_tag.
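
To make the matching I have in mind concrete, here is a rough sketch in plain Ruby. It is not Saxerator's code: local_name and tag_matches? are hypothetical helpers invented for illustration, and the ignore_namespaces keyword is an assumed name for the proposed option.

```ruby
# Hypothetical helper (not part of Saxerator): drop a leading "prefix:"
# from a qualified XML name, e.g. "marc:record" becomes "record".
def local_name(qualified_name)
  qualified_name.to_s.sub(/\A[^:]*:/, '')
end

# Hypothetical matcher: decide whether an element name seen by the SAX
# handler matches the tag requested via for_tag. The ignore_namespaces
# keyword is an assumed option name, used here only for illustration.
def tag_matches?(element_name, wanted, ignore_namespaces: false)
  if ignore_namespaces
    local_name(element_name) == local_name(wanted)
  else
    element_name.to_s == wanted.to_s
  end
end

# Usage: element names as they might arrive from a namespaced document.
seen = ['marc:collection', 'marc:record', 'marc:leader', 'marc:record']
records = seen.select { |name| tag_matches?(name, 'record', ignore_namespaces: true) }
puts records.length
```

With the option off, 'record' would not match 'marc:record', which is the current behaviour as I understand it.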

soulcutter commented 10 years ago

Thanks for bringing those problems up! In the short term I'm going to merge this and release a new version, but I'd definitely like for that sort of for_tag to work also.