Closed Saeven closed 6 years ago
I can confirm this problem!
And also the existence in zend-view:
echo $this->htmlList($items, null, ['id' => "\xc3\x28"]);
Throws the same exception:
Zend\Escaper\Exception\RuntimeException
String to be escaped was not valid UTF-8 or could not be converted: �(
I had debated pitching a fix for Zend Escaper -- but it seems the intent there was to actually detect and report failure which could be valid in some cases. The sprintf on bad utf8 is a strange decision, but if we agree we should retain it (and should probably never use its error message!), then this type of band-aid I felt was the next best idea in any component that uses it.
@Saeven zend-escaper is supposed to sanitize strings for the purposes of display. If escapeHtmlAttr()
is not detecting malformed UTF-8, it needs to be; we use it internally in zend-form and zend-view precisely for this purposes.
As such, I think the fix belongs in zend-escaper, and then we can update zend-form to pin to the version where such a fix is introduced.
Would you be willing to create the pull request there?
Yep my pleasure, didn't know if we should protect the original behavior. I'll submit a lower-level patch in this case.
As per discussion with @Saeven on slack, this issue is not in the Escaper
. The Escaper
correctly detects malformed UTF-8 and crashes. What we need to do is to add an UTF-8 filter (or validator) in front of input fields. Filtering would probably be the sanest option, following the idea of "if you gave me incomprehensible data, I do not even care" ('a la JSON)
This prevents the influx of invalid information into the system in first place (common issue), while adding filtering to the Escaper
introduces an attack surface wherever suppression of invalid output would cause one.
Thanks for the Slack chat this morning @weierophinney and @Ocramius. Had to jump off to get internal work done, but I would like to help.
Testing Occam's Razor, this approach solves the case for my immediate problem (on Zend Form). Also tested with Acunetix's latest version; I could continue to write real tests for nested circumstances if the approach is deemed sound:
/**
* Set data to validate and/or populate elements
*
* Typically, also passes data on to the composed input filter.
*
* @param array|\ArrayAccess|Traversable $data
* @return self
* @throws Exception\InvalidArgumentException
*/
public function setData($data)
{
if ($data instanceof Traversable) {
$data = ArrayUtils::iteratorToArray($data);
}
if (! is_array($data)) {
throw new Exception\InvalidArgumentException(sprintf(
'%s expects an array or Traversable argument; received "%s"',
__METHOD__,
(is_object($data) ? get_class($data) : gettype($data))
));
}
$this->hasValidated = false;
$this->data = $data;
// This bit here ...
array_walk($data, function (&$item, $key) {
if (is_string($item) && !preg_match('//u', $item)) {
$item = null;
}
});
$this->populateValues($data);
return $this;
}
It removes malformed utf8 from any part of the form's payload and has a very negligible impact. Probably preempts any BCs? Should the solution be more pointed?
Cheers.
Would you be open to a PR that pushes filtered values into the element set to prevent this kind of issue? In other words, if a filter is installed, no unfiltered values should ever persist within the object (post-filter).
As a developer, the covenant becomes that filters, if configured, gain god-status.
Thanks, @saeven.
Before this PR, malformed UTF8 submitted to a Zend Form causes the form helper to throw a 500 post-submission. This PR and its test eliminates these errors, by silently removing attack input during the print cycle.
Test added to demonstrate how a simple invalid 2 octet sequence can cause an exception (which then causes your server to 500).
Note that under current behavior, the helper will still throw the error despite all Filter and Validator implementations. A full test case that demonstrates this is available here: https://github.com/Saeven/zendframework-form-failure
A change is necessary to prevent routine penetration tests, or user-land attacks from causing critical exceptions. Users shouldn't be able to craft malicious input that causes 500s.
Thank you for considering this PR.