I went with separate logic for preformatted (<pre>) tags in the
handle_data and get_lines methods of the HTMLtoLines class, for two
reasons:
Not cleaning whitespace characters with re.sub() in handle_data:
Unlike other tags such as <blockquote>, which aren't completely
whitespace-dependent, <pre> text requires that newlines and
indentation/tabs be preserved for... well, formatting.
Different logic for <pre> text in get_lines: textwrap defaults to
replace_whitespace=True for wrap(), and the textwrap documentation
suggests splitting text with str.splitlines() and handling the pieces
separately, rather than setting replace_whitespace=False, to prevent
inconsistencies with formatting. I think setting it to False may cause
issues with other indent tags, so rather than trying to balance parsing
non-preformat tags against preformatted text, it seemed more reasonable
to have the preformat text be parsed separately.
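
For reviewers, here is a rough sketch of the split described above. It is
not the code in this PR: PreAwareParser, its in_pre flag, the blocks list,
and html_to_lines are hypothetical names made up for illustration, and the
real HTMLtoLines class tracks more state than this. It only shows the two
branches: whitespace is collapsed with re.sub() for ordinary text but left
alone inside <pre>, and <pre> text is split with str.splitlines() instead
of being passed to textwrap.wrap().

```python
import re
import textwrap
from html.parser import HTMLParser


class PreAwareParser(HTMLParser):
    """Hypothetical stand-in for HTMLtoLines, for illustration only."""

    def __init__(self):
        super().__init__()
        self.in_pre = False   # True while inside a <pre> element
        self.blocks = []      # list of (is_pre, text) tuples

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self.in_pre = True

    def handle_endtag(self, tag):
        if tag == "pre":
            self.in_pre = False

    def handle_data(self, data):
        if self.in_pre:
            # Preformatted text: keep newlines and indentation untouched.
            self.blocks.append((True, data))
        else:
            # Ordinary text: collapse runs of whitespace into one space.
            text = re.sub(r"\s+", " ", data).strip()
            if text:
                self.blocks.append((False, text))

    def get_lines(self, width=80):
        lines = []
        for is_pre, text in self.blocks:
            if is_pre:
                # Split on the original newlines instead of wrapping,
                # so indentation inside each line survives.
                lines.extend(text.splitlines())
            else:
                # wrap() uses replace_whitespace=True by default.
                lines.extend(textwrap.wrap(text, width))
        return lines


def html_to_lines(html, width=80):
    """Parse an HTML string and return display lines (hypothetical helper)."""
    parser = PreAwareParser()
    parser.feed(html)
    parser.close()
    return parser.get_lines(width)
```

Feeding it something like
"<p>some   spaced\n text</p><pre>def f():\n    return 1\n</pre>" should
give one collapsed line for the paragraph, followed by the two <pre> lines
with their indentation intact.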
Aside from that, I tried to keep the variable names and logic consistent
with the current code.