Closed 4meck closed 2 years ago
This seems to be related to how DOMDocument adds head and body tags to the HTML string being passed in. DOMDocument actually puts the first comment outside the html tag, and HtmlConverter gets mixed up.
Psy Shell v0.10.9 (PHP 8.0.12 — cli) by Justin Hileman
>>> $d = new DOMDocument()
>>> $d->loadHTML('<!-- opening --><p>hi</p><!-- closing -->')
=> true
>>> $d->saveHTML()
=> """
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">\n
<!-- opening --><html><body><p>hi</p><!-- closing --></body></html>\n
"""
Including html and body tags in the string passed to loadHTML works, though.
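A minimal sketch of that workaround, assuming a hypothetical helper named `loadFragment` (it is not part of HtmlConverter). It wraps the fragment in html and body tags before handing it to `loadHTML`, then lists the children of body so you can check where the comments ended up:

```php
<?php
// Hypothetical helper for illustration only: wrap the fragment so that
// DOMDocument does not have to add the html/body elements itself.
function loadFragment(string $fragment): DOMDocument
{
    $doc = new DOMDocument();
    $doc->loadHTML('<html><body>' . $fragment . '</body></html>');
    return $doc;
}

$doc  = loadFragment('<!-- opening --><p>hi</p><!-- closing -->');
$body = $doc->getElementsByTagName('body')->item(0);

// Print the node names of everything directly inside body.
foreach ($body->childNodes as $node) {
    echo $node->nodeName, PHP_EOL;
}
```

With the wrapper in place, both comments should show up as children of body rather than as siblings of the html element.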
Is that a bug in DOMDocument, or has it always done this?
Hi @bigsweater,
It looks like DOMDocument has always done that - see this example of your code running on multiple PHP versions: https://3v4l.org/7bC33
Nice, never used that service before. Thanks for looking.
Does it make sense, then, to have HtmlConverter deal with the unexpected structure or is there somewhere else this can be fixed?
Happy to take a swing at it myself!
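For discussion, here is one way the "deal with the unexpected structure" option could look. This is only a sketch: `moveLeadingCommentsIntoBody` is a made-up name, and nothing here reflects how HtmlConverter is actually implemented. It moves any comments that DOMDocument placed before the html element back to the start of body, preserving their order:

```php
<?php
// Hypothetical helper, not taken from HtmlConverter: relocate comments that
// ended up before the root element so they become the first children of body.
function moveLeadingCommentsIntoBody(DOMDocument $doc): void
{
    $root = $doc->documentElement;
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($root === null || $body === null) {
        return; // Nothing to normalise.
    }

    // Walk backwards from the root element, collecting comment siblings.
    // The list is nearest-first (i.e. reverse document order).
    $leading = [];
    for ($n = $root->previousSibling; $n !== null; $n = $n->previousSibling) {
        if ($n instanceof DOMComment) {
            $leading[] = $n;
        }
    }

    // Inserting each one before body's current first child restores
    // the original document order.
    foreach ($leading as $comment) {
        $body->insertBefore($comment, $body->firstChild);
    }
}

$doc = new DOMDocument();
$doc->loadHTML('<!-- opening --><p>hi</p><!-- closing -->');
moveLeadingCommentsIntoBody($doc);

$bodyHtml = $doc->saveHTML($doc->getElementsByTagName('body')->item(0));
echo $bodyHtml, PHP_EOL;
```

Wrapping the input before parsing is simpler; a post-parse fix like this one would only be worth it if the converter cannot alter the input string.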