Closed lcgeneralprojects closed 2 weeks ago
Reformatted html.py
using black
manually.
Now that the code has largely settled, let's look at the results.
html_blockquote_indent.pdf - I know this is a pre-existing test, but it would probably be good to add a blockquoted paragraph with several lines, just to show the effect, and also to prevent possible future regressions.
html_customize_ul.pdf - Small indents.
html_li_prefix_color.pdf - Small indents.
html_li_tag_indent.pdf - Small indents.
html_ln_outside_p.pdf - Small indents.
html_ol_ul_line_height.pdf - Small indents.
html_ul_type.pdf - Small indents.
html_tag_indent.pdf - Besides the width of the indent, this one would also benefit from a bit more (multi-line) context.
html_description.pdf - Has received a vertical margin it didn't have before. Is this now more "correct" (HTML specs / typical browser behaviour)?
html_features.pdf - Reflects changes already seen in other files.
html_long_list_entries - New. Demonstrates multiline list item indent.
html_long_ol_bullets - New. Demonstrates bullet left cut-off.
The indents for lists are now much smaller than before. The previous default indent depended on the font size (5 x width of NBSP). This now seems to have changed to 5 "document units". With the default of mm this is too small. With the document units set to eg. inches, it will be way too large. You will have to pick a reasonable size in mm (eg. 8 or 10), and then make sure that if the document units aren't mm, this value gets converted appropriately before being used. The documentation also must clearly state that tag_indents
values have to be given in document units.
Warning: I'm unable to test this right now, but this is my conclusion looking at the code. You'll have to test what actually happens when the document units aren't mm, and obviously those tests need to be added to our permanent collection.
Maybe the simplest solution will be to whenever we assign a value from DEFAULT_TAG_INDENTS
to self.tag_indents, to systematically convert it from mm to document units.
Note that top and bottom margins of Paragraph()
s are also in document units (which the documentation currently doesn't make sufficiently clear).
You'll have to take the same precautions with those as with the indents. In most cases they are defined in terms of font height, which is fine, but there are some hard-coded "magic numbers" present in the code as well. Most of those will have been there before you started (you can probably blame me for some of them :wink:), but this is a good opportunity to fix them.
The same applies to the last hardcoded self._ln(2)
call within the <li>
tag code. This also incorrectly assumes mm, and needs to be converted to document units.
The docstring for Paragraph()
looks good. However, the end user will access this functionality through ParagraphCollectorMixin.paragraph()
, so maybe it should rather go there?
Regarding tests with small indents, do you want me to pass tag_indents
values into pdf.write_html()
calls in tests where that argument is not used, in addition to adjusting default tag indentation handling?
Regarding the new margins between elements in html_description.pdf
- Firefox and Edge both produce the margin with that HTML code.
I might not have time to confidently deal with the 'magic numbers' tonight, so I will likely be pushing the changes tomorrow, and not today. Should I just do the conversion into appropriate units with them? If so, would you prefer for me to intentionally change them a little in order for them to look nicer, or would you prefer a more exact conversion?
EDIT: Going to note that, currently, due to the HTML2FPDF._ln()
call when handling <li>
start tags, there will be a gap of the relevant size, even if the list is the first visible content of the document, in case there will be a need to eliminate it in the future.
Actually, I see an issue with how the default values for li_tag_indent
and dd_tag_indent
are handled in my code. I am currently unable to dedicate time to fixing it, so that will have to wait until tomorrow.
Not sure how that happened, considering that I was accepting changes from master
when merging.
Not sure how that happened, considering that I was accepting changes from
master
when merging.
I think I've merged #1198 after your last rebase. Just do another one.
Wait, in 69c4107 you seem to have reverted all of your changes to test_html.py. You'll want to restore those, except for test_html_customize_ul()
.
When you do a rebase and there are conflicts, you usually need to resolve those manually.
The files will contain marked sections that show the differences, and you can chose the right parts for each section.
I am aware. I opted to go for blanket-accepting changes from master
because I thought that it will just resolve that particular conflict and not just replace the entire file.
Trying to figure out how to re-do the merge without destroying the history. (Although, yeah, I'm aware that I can just copy the relevant stuff and paste it manually. I would still rather try to find out how to deal with situations like these.)
Don't worry about the history. We flatten everything into a single commit anyway when merging a PR.
Merged.
Thanks for the useful contribution, @lcgeneralprojects ! 👍
@allcontributors please add @lcgeneralprojects for bug, code
@Lucas-C
I've put up a pull request to add @lcgeneralprojects! :tada:
@lcgeneralprojects
Congrats on successfully completing this pull request. This was no small feat! You changed some complex parts of the code and despite the numerous iterations you showed patience and dedication to push through.
Thank you for your hard work and perseverance. Looking forward to more collaborations in the future!
Fixes #1073 Implements paragraph indentation via adjustments of
pdf.x
instead of using a natural number of whitespaces. Breaks up list items into individual paragraphs.Checklist:
[x] The GitHub pipeline is OK (green), meaning that both
pylint
(static code analyzer) andblack
(code formatter) are happy with the changes of this PR.[x] A unit test is covering the code added / modified by this PR
[x] This PR is ready to be merged
[x] In case of a new feature, docstrings have been added, with also some documentation in the
docs/
folder[x] A mention of the change is present in
CHANGELOG.md
Not sure how to actually name the
HTML2FPDF.list_pseudo_margin
attribute. It is used for determining the height of the\n
line created when a<ul>
or<ol>
starting tag is handled.Should the re-implementation of paragraph indentation be reflected in a doctstring, even though the
tag_indents
parameter ofwrite_html()
has not been touched? If so, where should it be placed?By submitting this pull request, I confirm that my contribution is made under the terms of the GNU LGPL 3.0 license.