Open alexpott opened 4 years ago
DomCrawler is simply using PHP's DOMNode: https://www.php.net/manual/en/class.domnode.php#domnode.props.textcontent which implements the W3C spec: https://www.w3.org/TR/2003/WD-DOM-Level-3-Core-20030226/DOM3-Core.html#core-ID-1312295772
@jonathanjfshaw yep, and it's returning what document.body.textContent in the browser console does. The point is that this is not what \Behat\Mink\Driver\Selenium2Driver::getText() returns, and it is returning stuff that is not visible.
I see no issue here.
The Selenium driver talks to a real browser and can ask it to return only the text visible to a user. BrowserKit is a headless driver that only looks at HTML tags and parses them as far as it understands them. Stripping all HTML tags therefore leaves their content in place, which results in the effect you're getting.
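To make the difference concrete, here is a minimal sketch using only PHP's built-in DOM extension (not Mink or DomCrawler themselves). The sample markup and variable names are made up for illustration; the point is only that textContent concatenates every descendant text node, whether or not a browser would display it.

```php
<?php
// Hypothetical sample page: a title in <head>, visible text, and JSON in a <script> inside <body>.
$html = '<html><head><title>Page title</title></head>'
      . '<body><p>Hello</p><script type="application/json">{"a":1}</script></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);

$body = $doc->getElementsByTagName('body')->item(0);

// textContent joins all descendant text nodes, including the script content.
$bodyText = $body->textContent;

// Reading from the root element additionally picks up the <title> text from <head>.
$documentText = $doc->documentElement->textContent;

echo $bodyText, PHP_EOL;
echo $documentText, PHP_EOL;
```

Under these assumptions the body text should still contain the JSON from the script tag, and the document text should also contain the page title, neither of which a user would see rendered on the page.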
@alexpott, I recommend using the getText method on the BODY NodeElement (a PHP class in Mink) of the document, not on the whole document. That way you won't get any extra stuff (at least I hope so).
The code below (untested) is how I would get the contents of a document:
$body_text = $session->getPage()->find('xpath', '//body')->getText();
@aik099 body can contain script tags. Adding script tags just before closing the body tag is often advocated for performance reasons.
\Behat\Mink\Driver\BrowserKitDriver::getText() will return text in the head section and also any JSON on the page that's contained in a script tag in the HTML body. \Behat\Mink\Driver\Selenium2Driver::getText(), for example, will not return text from the head section or script tags in the body section. Given the Mink documentation states:
I'm not sure if this is a Symfony\DomCrawler issue or not.
See https://www.drupal.org/project/drupal/issues/3175718 for a discussion of the effects of this.
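For anyone needing a stopgap in tests, one possible approach is to drop script and style elements before reading the text. The sketch below uses only PHP's built-in DOM extension; textWithoutScripts is a hypothetical helper, not part of Mink or DomCrawler, and it does nothing about elements hidden via CSS, so it only approximates what a real browser reports as visible.

```php
<?php
// Hypothetical helper, not part of Mink or DomCrawler.
function textWithoutScripts(string $html): string
{
    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Copy the matches into an array before removing them from the tree.
    foreach (iterator_to_array($xpath->query('//script | //style')) as $node) {
        $node->parentNode->removeChild($node);
    }

    // Read only the body, so text from <head> (such as <title>) is excluded.
    $body = $doc->getElementsByTagName('body')->item(0);
    return $body === null ? '' : trim($body->textContent);
}

$sample = '<html><head><title>T</title></head>'
        . '<body><p>Hello</p><script>var x = 1;</script></body></html>';
echo textWithoutScripts($sample), PHP_EOL;
```

This only illustrates the idea; whether such filtering belongs in BrowserKitDriver, in DomCrawler, or in the calling test code is exactly the open question in this issue.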