Closed AnchorCat closed 1 year ago
While I didn't consider the non-trivial thin_dump case, and rexml/document
isn't the right solution here, I'm loath to throw regexes at this problem. I think that REXML::StreamListener
should work reasonably well for this. I'd definitely accept a modified patch using that class, or the REXML SAX2 API if you're feeling masochistic.
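For reference, a rough sketch of what the StreamListener approach could look like. This is not code from lvmsync or from the patch: the class name `MappingListener`, the sample XML, and the element and attribute names (`device`, `dev_id`, `range_mapping`, `single_mapping`, `origin_begin`, `origin_block`, `length`) are assumptions about what thin_dump emits and should be checked against real output.

```ruby
require 'rexml/document'        # only needed for REXML::Document.parse_stream
require 'rexml/streamlistener'
require 'stringio'

# Hypothetical listener: collects [origin_begin, length] pairs for one device.
# Element and attribute names are assumed, not taken from lvmsync.
class MappingListener
  include REXML::StreamListener

  attr_reader :ranges

  def initialize(dev_id)
    @dev_id = dev_id.to_s
    @in_device = false
    @ranges = []
  end

  # Called once per opening (or self-closing) tag; attrs is a Hash of strings.
  def tag_start(name, attrs)
    case name
    when 'device'
      @in_device = (attrs['dev_id'] == @dev_id)
    when 'range_mapping'
      @ranges << [attrs['origin_begin'].to_i, attrs['length'].to_i] if @in_device
    when 'single_mapping'
      @ranges << [attrs['origin_block'].to_i, 1] if @in_device
    end
  end

  def tag_end(name)
    @in_device = false if name == 'device'
  end
end

# Made-up sample in the assumed thin_dump layout.
SAMPLE_XML = <<~XML
  <superblock uuid="" time="1" transaction="2" data_block_size="128" nr_data_blocks="0">
    <device dev_id="1" mapped_blocks="5" transaction="0" creation_time="0" snap_time="1">
      <range_mapping origin_begin="0" data_begin="0" length="4" time="0"/>
      <single_mapping origin_block="10" data_block="4" time="0"/>
    </device>
    <device dev_id="2" mapped_blocks="8" transaction="0" creation_time="0" snap_time="1">
      <range_mapping origin_begin="0" data_begin="5" length="8" time="0"/>
    </device>
  </superblock>
XML

listener = MappingListener.new(1)
# parse_stream feeds events to the listener without building a Document tree.
REXML::Document.parse_stream(StringIO.new(SAMPLE_XML), listener)
p listener.ranges
```

In real use the `StringIO` would presumably be replaced by an IO reading thin_dump's output, so that only the listener's accumulated ranges are held in memory rather than a full document tree.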
For large volumes, the output of thin_dump can be tens of megabytes in size, resulting in enormous memory usage when attempting to parse the entire document as XML at once. This solution is less elegant, but it gets the job done much more efficiently by regex matching one line at a time.
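To illustrate the line-at-a-time idea, here is a minimal sketch rather than the code in this pull request. It assumes thin_dump writes one element per line and uses the attribute names shown; the helper names `line_attrs` and `mapped_ranges` and the sample data are hypothetical.

```ruby
require 'stringio'

# Pull every name="value" pair out of a single line of XML-ish text.
def line_attrs(line)
  line.scan(/(\w+)="([^"]*)"/).to_h
end

# Walk the dump one line at a time and collect [origin_begin, length]
# pairs for the requested device. Assumes one element per line.
def mapped_ranges(io, dev_id)
  in_device = false
  ranges = []
  io.each_line do |line|
    case line
    when /<device\b/
      in_device = (line_attrs(line)['dev_id'] == dev_id.to_s)
    when %r{</device>}
      in_device = false
    when /<range_mapping\b/
      a = line_attrs(line)
      ranges << [a['origin_begin'].to_i, a['length'].to_i] if in_device
    when /<single_mapping\b/
      ranges << [line_attrs(line)['origin_block'].to_i, 1] if in_device
    end
  end
  ranges
end

# Made-up sample in the assumed thin_dump layout.
SAMPLE_DUMP = <<~XML
  <superblock uuid="" time="1" transaction="2" data_block_size="128" nr_data_blocks="0">
    <device dev_id="1" mapped_blocks="5" transaction="0" creation_time="0" snap_time="1">
      <range_mapping origin_begin="0" data_begin="0" length="4" time="0"/>
      <single_mapping origin_block="10" data_block="4" time="0"/>
    </device>
    <device dev_id="2" mapped_blocks="8" transaction="0" creation_time="0" snap_time="1">
      <range_mapping origin_begin="0" data_begin="5" length="8" time="0"/>
    </device>
  </superblock>
XML

p mapped_ranges(StringIO.new(SAMPLE_DUMP), 1)
```

Because only the current line and the accumulated ranges are held at once, memory use stays roughly proportional to the number of mappings, not to the size of the dump. The trade-off is that the sketch breaks if an element is ever split across lines.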
This pull request contains a reimplementation of the code changed in [0], so I will be closing that pull request shortly.
[0] https://github.com/mpalmer/lvmsync/pull/18