The current behavior normalizes an isolated CR or LF to CRLF when it appears in a textarea. However, the chunked parser can turn a perfectly valid CRLF into two CRLFs when the CR and the LF fall on opposite sides of a chunk boundary. The sample below reproduces the issue by aligning the CRLF with the 1024-byte CHUNK limit.
>>> from StringIO import StringIO
>>> import mechanize
>>> # erroneous normalization of CRLF in textarea on CHUNK size boundary
>>> doc = "{:>1023}\r\nbar</textarea></form></html>".format("<html><form><textarea>foo")
>>> pf = mechanize.ParseFile(StringIO(doc), "http://localhost/")
>>> pf[0].controls[0].value
'foo\r\n\r\nbar'
>>> # standard and expected parsing
>>> doc = "{}\r\nbar</textarea></form></html>".format("<html><form><textarea>foo")
>>> pf = mechanize.ParseFile(StringIO(doc), "http://localhost/")
>>> pf[0].controls[0].value
'foo\r\nbar'
The issue can be fixed by normalizing once the end tag is reached, when the textarea content is complete, instead of normalizing each incomplete chunk of data.
@@ -533,7 +533,10 @@
raise ParseError("end of TEXTAREA before start")
controls = self._current_form[2]
name = self._textarea.get("name")
+ value = self._textarea.get("value")
+ if value:
+ self._textarea["value"] = normalize_line_endings(value)
controls.append(("textarea", name, self._textarea))
self._textarea = None
def start_label(self, attrs):
@@ -580,7 +583,6 @@
elif self._textarea is not None:
map = self._textarea
key = "value"
- data = normalize_line_endings(data)
# not if within option or textarea
elif self._current_label is not None:
map = self._current_label
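The difference between the two strategies can be shown without mechanize. This is a minimal sketch, not the library's code: `normalize_line_endings` here is a stand-in assumed to behave like the helper named in the patch (a lone CR or lone LF becomes CRLF), and `per_chunk` and `at_end` are hypothetical names used only for illustration.

```python
import re


def normalize_line_endings(text):
    # Stand-in for the helper used in the patch (assumed behavior):
    # replace a lone LF or a lone CR with CRLF, leave existing CRLF alone.
    return re.sub(r"(?:(?<!\r)\n)|(?:\r(?!\n))", "\r\n", text)


def per_chunk(chunks):
    # Old behavior: normalize each chunk as it arrives, then concatenate.
    return "".join(normalize_line_endings(chunk) for chunk in chunks)


def at_end(chunks):
    # Patched behavior: accumulate the raw data, normalize once at the end tag.
    return normalize_line_endings("".join(chunks))


# A valid CRLF split across a chunk boundary: CR ends one chunk, LF starts the next.
chunks = ["foo\r", "\nbar"]
print(repr(per_chunk(chunks)))  # 'foo\r\n\r\nbar'
print(repr(at_end(chunks)))     # 'foo\r\nbar'
```

Normalizing per chunk sees a trailing CR and a leading LF as two isolated characters, so each is expanded; normalizing the joined text sees a single CRLF and leaves it unchanged.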