Parsing of XML - Githubissues

Parsing HTML and XML is no joke, and I've been told that regexes do not hold up for this task. Ours clearly fail to separate content from URLs. Either we need to get a library to do this for us, or implement a parser ourselves (not likely). libxml++, the C++ wrapper for libxml2, seems like a likely candidate, but its documentation is not good.
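
To make the failure mode concrete, here is a minimal sketch using only the standard library. It is not our actual code: `naive_hrefs`, `check_naive_regex_pitfalls`, the pattern, and the example.org URLs are all made up for illustration. It shows roughly how a simple `href` regex both misses valid links and reports ones that are not really there.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not from our code base): return everything a
// naive pattern believes is a link target.
std::vector<std::string> naive_hrefs(const std::string& html) {
    // Only matches double-quoted href values and knows nothing about
    // tags, comments, or text nodes.
    static const std::regex href_re("href=\"([^\"]*)\"");
    std::vector<std::string> urls;
    for (std::sregex_iterator it(html.begin(), html.end(), href_re), end;
         it != end; ++it) {
        urls.push_back((*it)[1].str());
    }
    return urls;
}

// Illustrative checks showing where the pattern goes wrong.
void check_naive_regex_pitfalls() {
    // Double-quoted attribute: found as expected.
    assert(naive_hrefs("<a href=\"https://example.org/a\">A</a>").size() == 1);

    // Single-quoted attribute is valid markup, but the pattern misses it.
    assert(naive_hrefs("<a href='https://example.org/b'>B</a>").empty());

    // A link inside a comment is not part of the document, but the
    // pattern reports it anyway.
    assert(naive_hrefs("<!-- <a href=\"https://example.org/old\">old</a> -->").size() == 1);

    // Plain text that merely looks like an attribute is reported as well.
    assert(naive_hrefs("<p>write href=\"x\" in your markup</p>").size() == 1);
}
```

Each of these cases could be patched with a more elaborate pattern, but the list keeps growing (unquoted values, entities, CDATA, and so on), which is the argument for handing the job to a real parser that builds a tree and lets us walk elements and text nodes separately.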