This is a fix for https://github.com/thephpleague/html-to-markdown/issues/212: if you pass HTML that begins with a comment (like `<!-- uh oh --><p>hi</p>`) to `HtmlConverter->convert`, the resulting markdown looks like this: `<!-- uh oh --><html><body>hi\n`.
This is because `DOMDocument->loadHTML` puts that first comment at the root of the document, outside the `html` and `body` tags. The `sanitize` method only removes `html` and `body` tags if they're at position 0 of the markdown string. With the comment at the root of the document, the tags always sit at a position > 0, so they never get removed (and that first comment is never removed, either).
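
To illustrate why the leading comment defeats the cleanup, here is a simplified sketch of a "strip only at position 0" check. It is not the library's actual `sanitize` code; the function name `stripLeadingWrapperTags` and the reduced tag list are assumptions made for the example.

```php
<?php
// Simplified sketch of a "strip only at position 0" check.
// stripLeadingWrapperTags() is a hypothetical name, not the library's code.
function stripLeadingWrapperTags(string $markdown): string
{
    foreach (['<html>', '<body>'] as $tag) {
        // Only strip the tag when it is the very first thing in the string.
        if (strpos($markdown, $tag) === 0) {
            $markdown = substr($markdown, strlen($tag));
        }
    }

    return $markdown;
}

// Without a leading comment, each wrapper tag is at position 0 in turn.
var_dump(stripLeadingWrapperTags("<html><body>hi\n"));

// With a leading comment, neither tag is at position 0, so nothing is stripped.
var_dump(stripLeadingWrapperTags("<!-- uh oh --><html><body>hi\n"));
```

The second call returns its input unchanged, which matches the output reported in the issue.
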
So this adds a step to the `createDOMDocument` method: it finds any comments at the root of the `DOMDocument` and prepends them to the `<body>` tag.
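
A minimal sketch of that step, assuming it can be written as a standalone helper. The name `moveRootCommentsIntoBody` is hypothetical, and the code in this PR may be structured differently.

```php
<?php
// Hypothetical helper: moves comments that sit at the document root
// to the start of <body>, keeping their original order.
function moveRootCommentsIntoBody(DOMDocument $document): void
{
    $body = $document->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return; // Nothing to move the comments into.
    }

    // Collect first: moving nodes while iterating the live childNodes
    // list could skip entries.
    $comments = [];
    foreach ($document->childNodes as $child) {
        if ($child instanceof DOMComment) {
            $comments[] = $child;
        }
    }

    // Insert in reverse so the first comment ends up first in <body>.
    // insertBefore() moves a node that is already in the document.
    foreach (array_reverse($comments) as $comment) {
        $body->insertBefore($comment, $body->firstChild);
    }
}
```

Called right after `loadHTML`, this would leave the comment inside `<body>`, so the wrapper tags are back at the start of the serialized output where the position-0 check expects them.
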
Evidently DOMDocument has always behaved this way, so maybe this isn't the correct fix?