mpalmer / lvmsync

Synchronise LVM LVs across a network by sending only snapshotted changes
http://theshed.hezmatt.org/lvmsync
GNU General Public License v3.0
380 stars, 60 forks

Parse the output of thin_dump line by line #19

Closed: AnchorCat closed this pull request 1 year ago

AnchorCat commented 10 years ago

For large volumes, the output of thin_dump can be tens of megabytes in size, resulting in enormous memory usage when attempting to parse the entire document as XML at once. This solution is less elegant, but it gets the job done much more efficiently by regex matching one line at a time.
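To illustrate the line-at-a-time idea, here is a minimal sketch rather than the code in this pull request. It assumes thin_dump's XML output puts each `single_mapping` or `range_mapping` element on its own line, with attributes in the order shown in the sample. The method name `each_mapping` and the `SAMPLE_DUMP` data are made up for the example.

```ruby
require 'stringio'

# Made-up sample resembling thin_dump XML output (an assumption, not real data).
SAMPLE_DUMP = <<~XML
  <superblock uuid="" time="0" transaction="1" data_block_size="128" nr_data_blocks="0">
    <device dev_id="1" mapped_blocks="11" transaction="0" creation_time="0" snap_time="0">
      <single_mapping origin_block="0" data_block="5" time="0"/>
      <range_mapping origin_begin="1" data_begin="6" length="10" time="0"/>
    </device>
  </superblock>
XML

# Hypothetical helper: reads the dump one line at a time and yields
# origin block, data block, and length for every mapping it recognises.
# Only the current line is held in memory, never the whole document.
def each_mapping(io)
  return enum_for(:each_mapping, io) unless block_given?

  io.each_line do |line|
    case line
    when /<single_mapping\s+origin_block="(\d+)"\s+data_block="(\d+)"/
      # A single mapping covers exactly one block.
      yield $1.to_i, $2.to_i, 1
    when /<range_mapping\s+origin_begin="(\d+)"\s+data_begin="(\d+)"\s+length="(\d+)"/
      yield $1.to_i, $2.to_i, $3.to_i
    end
  end
end

each_mapping(StringIO.new(SAMPLE_DUMP)) do |origin, data, length|
  puts "origin=#{origin} data=#{data} length=#{length}"
end
```

In real use the `StringIO` would be replaced by a pipe from the thin_dump process, so memory use stays flat regardless of volume size. The trade-off is that the regexes depend on the exact attribute order and line layout, which is the fragility that a proper XML parser avoids.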

This pull request contains a reimplementation of the code changed in [0], so I will be closing that pull request shortly.

[0] https://github.com/mpalmer/lvmsync/pull/18

mpalmer commented 9 years ago

While I didn't consider the non-trivial thin_dump case, and rexml/document isn't the right solution here, I'm loath to throw regexes at this problem. I think that REXML::StreamListener should work reasonably for this. I'd definitely accept a modified patch using that class, or the REXML SAX2 API if you're feeling masochistic.
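As a rough sketch of what the REXML::StreamListener approach could look like (not a patch for lvmsync itself): the class name `ThinMappingListener`, the callback shape, and the sample XML are assumptions for the example, as are the thin_dump element and attribute names. On recent Rubies rexml ships as a bundled gem, so it may need to be listed as a dependency.

```ruby
require 'rexml/parsers/streamparser'
require 'rexml/streamlistener'
require 'stringio'

# Hypothetical listener: invokes the given block once per mapping element
# as the parser streams through the input, without building a document tree.
class ThinMappingListener
  include REXML::StreamListener

  def initialize(&on_mapping)
    @on_mapping = on_mapping
  end

  # Called by the stream parser for every opening (or self-closing) tag;
  # attrs is a Hash of attribute name to string value.
  def tag_start(name, attrs)
    case name
    when 'single_mapping'
      # A single mapping covers exactly one block.
      @on_mapping.call(attrs['origin_block'].to_i, attrs['data_block'].to_i, 1)
    when 'range_mapping'
      @on_mapping.call(attrs['origin_begin'].to_i, attrs['data_begin'].to_i, attrs['length'].to_i)
    end
  end
end

# Made-up sample resembling thin_dump XML output (an assumption, not real data).
sample = <<~XML
  <superblock uuid="" time="0" transaction="1" data_block_size="128" nr_data_blocks="0">
    <device dev_id="1" mapped_blocks="11" transaction="0" creation_time="0" snap_time="0">
      <single_mapping origin_block="0" data_block="5" time="0"/>
      <range_mapping origin_begin="1" data_begin="6" length="10" time="0"/>
    </device>
  </superblock>
XML

mappings = []
listener = ThinMappingListener.new { |origin, data, length| mappings << [origin, data, length] }
REXML::Parsers::StreamParser.new(StringIO.new(sample), listener).parse

mappings.each { |origin, data, length| puts "origin=#{origin} data=#{data} length=#{length}" }
```

Unlike the regex version, this does not care about attribute order or how the elements are split across lines. The demo collects results into an array for display, but a real caller could process each mapping inside the block and discard it.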