minkphp / Mink

PHP web browser emulator abstraction
https://mink.behat.org/
MIT License

getText() and whitespace #583

Open grom358 opened 10 years ago

grom358 commented 10 years ago

I have noticed that in the drivers (I checked the selenium2 and goutte drivers) the getText() method does not preserve whitespace. This means I am unable to properly test output in pre elements, for example.

$text = $session->getPage()->find('css', 'pre')->getText();
$this->assertEquals("Hello\nWorld!", $text);

aik099 commented 10 years ago

Please list the driver and Mink versions you're using. Try switching to the dev-master version and check whether the problem persists there as well.

Also, please post a link to the getText method implementation in the mentioned drivers (better yet, in all drivers).

aik099 commented 10 years ago

Ha, in MinkSelenium2Driver we do indeed have code that strips new lines: https://github.com/minkphp/MinkSelenium2Driver/blob/master/src/Selenium2Driver.php#L513

I think it was done back then to allow a more relaxed comparison of text from the Behat steps (or the WebAssert class) that are used in MinkExtension.

@stof it might just as well be misplaced code. If it happens in all drivers, then I think it's better to keep line endings as-is in the drivers and instead remove them from the text that is being compared in the WebAssert class.

aik099 commented 10 years ago

For other drivers:

In all drivers except Sahi we're stripping new lines, so maybe Sahi strips them itself and the other drivers strip them manually to stay consistent. If Sahi doesn't strip them, then I don't know why we replace new lines with spaces.

stof commented 10 years ago

@aik099 SahiClient is not stripping, but I think Sahi itself normalizes whitespace before returning the text (meaning it probably behaves well by not normalizing whitespace in a <pre> tag, btw).

And we replace them because in HTML, runs of whitespace are rendered as a single space in the output (except in <pre> tags), so the text the user sees does not have newlines or multiple spaces in it.
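
To make the normalization described above concrete, here is a minimal sketch of collapsing whitespace the way a browser renders it outside <pre>. The function name normalizeWhitespace is hypothetical and is not taken from any Mink driver; the drivers' actual code may differ.

```php
<?php
// Hypothetical helper, not copied from any Mink driver.
function normalizeWhitespace(string $text): string
{
    // Collapse every run of whitespace (spaces, tabs, newlines) into one space.
    $collapsed = preg_replace('/\s+/', ' ', $text);

    // Drop leading and trailing whitespace as well.
    return trim($collapsed);
}

// The newline from the <pre> content is gone after normalization,
// which is why the assertion in the original report cannot pass.
var_dump(normalizeWhitespace("Hello\nWorld!") === "Hello\nWorld!"); // bool(false)
var_dump(normalizeWhitespace("Hello\nWorld!") === 'Hello World!');  // bool(true)
```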

aik099 commented 10 years ago

Then we shouldn't normalize whitespace in PRE tags. However, that might be a complicated task, because I can:

  1. get a PRE node and read its text (easy, just don't touch the whitespace)
  2. get a node that has a PRE node inside (harder, since we need to normalize whitespace in all nodes but PRE)

stof commented 10 years ago

@aik099 and even worse for the third case: one can read the text of a node which is inside a <pre> tag

aik099 commented 10 years ago

It may be a bit hacky, but a $keepWhitespace parameter on the getText method might be a quick and dirty solution. Writing complicated logic to preserve whitespace might not be worth it, considering the percentage of cases where we work with PRE as opposed to other tags.

aik099 commented 10 years ago

Actually, the 1st and 3rd cases are easily detectable with an XPath search for a pre tag in this node or any of its parent nodes.

The 2nd case is trickier, because it requires manually parsing the returned text to determine the PRE location. But the getText method result doesn't even have HTML tags in it. It's the getHTML method that has them.
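
The ancestor check for the 1st and 3rd cases could be sketched roughly as follows. This uses PHP's DOM extension directly rather than a Mink driver, and the helper name isInsidePre is hypothetical; a real driver would have to run an equivalent query through its own backend.

```php
<?php
// Hypothetical helper: true when $node is a <pre> element or sits inside one.
function isInsidePre(DOMNode $node): bool
{
    $xpath = new DOMXPath($node->ownerDocument);

    // ancestor-or-self::pre matches the node itself (1st case)
    // or any of its parents (3rd case).
    return $xpath->query('ancestor-or-self::pre', $node)->length > 0;
}

$doc = new DOMDocument();
$doc->loadHTML('<div><pre><span id="inner">a  b</span></pre><p id="outer">c</p></div>');
$finder = new DOMXPath($doc);

$inner = $finder->query('//*[@id="inner"]')->item(0);
$outer = $finder->query('//*[@id="outer"]')->item(0);

var_dump(isInsidePre($inner)); // bool(true)
var_dump(isInsidePre($outer)); // bool(false)
```

This does nothing for the 2nd case: a node that merely contains a <pre> somewhere below it is not matched by the ancestor axis.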

grom358 commented 10 years ago

As per http://www.w3.org/TR/html5/grouping-content.html#the-pre-element

Note: In the HTML syntax, a leading newline character immediately following the pre element start tag is stripped.

grom358 commented 10 years ago

Personally, I would be happy with a $keepWhitespace parameter that defaulted to false so as not to break existing code. At least I would then be able to get text in its preformatted form.
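
A minimal sketch of what such an opt-in flag could look like, assuming a default of false so that existing callers keep the normalized result. The function getTextSketch and its $rawText argument are hypothetical stand-ins for a driver method and the text it reads from the browser; this is not the actual Mink API.

```php
<?php
// Hypothetical sketch: $rawText stands in for the text a driver reads from the browser.
function getTextSketch(string $rawText, bool $keepWhitespace = false): string
{
    if ($keepWhitespace) {
        // Opt-in path: hand the text back untouched, newlines included.
        return $rawText;
    }

    // Default path: collapse runs of whitespace so existing callers see no change.
    return trim(preg_replace('/\s+/', ' ', $rawText));
}

$raw = "Hello\nWorld!";
var_dump(getTextSketch($raw) === 'Hello World!');        // bool(true)
var_dump(getTextSketch($raw, true) === "Hello\nWorld!"); // bool(true)
```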

aik099 commented 10 years ago

> Personally, I would be happy with a $keepWhitespace parameter that defaulted to false so as not to break existing code. At least I would then be able to get text in its preformatted form.

Sadly, that would only cover the simple cases, where the PRE element isn't buried somewhere deep in the returned text.

aik099 commented 9 years ago

So we're not handling PRE correctly, but otherwise all is fine.