gutenbergtools / ebookmaker

The Project Gutenberg tool to generate EPUBs and other ebook formats.
GNU General Public License v3.0

Errors in automatically generated HTML files #228

Closed okrick closed 3 weeks ago

okrick commented 4 weeks ago

I added the images and HTML omitted from #13913 by the previous DP PPer, using the generated HTML file as a template. I was surprised that the W3C HTML markup validator (http://validator.w3.org) reported several issues with the PG CSS boilerplate. Improperly formatted comments in the CSS were the primary issue.

Further, the \ tag did not meet PG's standard of being on a distinct line.

I regard this as low priority as this only affects someone who wants to use a generated HTML file as a template--unless the buggy CSS adversely impacts the eBooks.

eshellman commented 4 weeks ago

generated HTML is not intended to be used as a source file; you'll get duplicate metatags and duplicated boilerplate. Nor does it pay attention to any DP guidelines.

But... I can't reproduce your issue and I'm puzzled by "Improperly formatted comments in the CSS" as there are no comments in the boilerplate CSS. https://github.com/gutenbergtools/ebookmaker/blob/c2e4704f1af59716058d366caf16177254f9b86c/src/ebookmaker/writers/TemplateStrings.py#L16-L83

okrick commented 4 weeks ago

The \ tag is a PG guideline that PG's Errata Workbench enforces.

The above is not the CSS from the generated file. [pg13913-images.zip](https://github.com/user-attachments/files/15567095/pg13913-images.zip) I got this HTML file by right-clicking on the "Read online (web)" link [https://www.gutenberg.org/ebooks/13913.html.images].

Are you certain the above template is used to create HTML from text files? Nearly every line of the CSS I saw was commented and the multiple-line comments were not correctly formatted. There was no HTML among the files I downloaded through the Errata Workbench, so I downloaded and used the generated file.

eshellman commented 4 weeks ago

let's make sure we're talking about the same thing. a css comment looks like this: /* This is a comment */
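To check a downloaded file for this, a short script can list the comments in a stylesheet and flag any `/*` that is never closed. This is only a minimal sketch: the function names are hypothetical, it is not part of ebookmaker, and it assumes the CSS has already been copied out of the `<style>` element into a string. It also ignores the corner case of `/*` appearing inside a quoted CSS string.

```python
import re
from typing import List

# Matches one complete CSS comment: "/*", any text (including newlines,
# matched non-greedily), then the first following "*/".
CSS_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)


def find_css_comments(css: str) -> List[str]:
    """Return every well-formed /* ... */ comment found in a CSS string."""
    return CSS_COMMENT.findall(css)


def has_unterminated_comment(css: str) -> bool:
    """Return True if a "/*" remains after all complete comments are removed.

    Each complete comment is replaced with a space so that the text on
    either side of it cannot join up into a spurious "/*".
    """
    remainder = CSS_COMMENT.sub(" ", css)
    return "/*" in remainder
```

A multi-line comment is fine as long as it opens with `/*` and closes with `*/`; only a missing `*/` would be reported by `has_unterminated_comment`.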

eshellman commented 4 weeks ago

I'm unfamiliar with the Errata Workbench - perhaps it's mucking with the CSS?

asylumcs commented 4 weeks ago

The Errata Workbench does not modify the CSS.

okrick commented 4 weeks ago

Oh, my error. This HTML's CSS section is completely different from the one I got the first time. Something may have changed when I uploaded it to GitHub. I'll try to find a pristine generated HTML file.

okrick commented 4 weeks ago

I agree. The Errata Workbench doesn't modify anything in the uploaded files. It does move them from the old system to GitHub. So changing the source of the text file from the old system to the new one may have altered the source of the template used to generate the HTML from the text file.

I was surprised that my HTML was not used for the web version. I assumed that the old Generated HTML was migrated also.

eshellman commented 3 weeks ago

Source HTML is parsed, normalized, and updated to HTML5; accessibility markup is added; the header and footer are replaced; and metatags and boilerplate CSS are added to make the web version.

@okrick If you were surprised, then probably there is some documentation that is missing or needs improvement. Suggestions?

okrick commented 3 weeks ago

Here is an example of a generated HTML file with multiple-line CSS comments. pg212-images.zip

Every one of the multi-line CSS comments had to be removed for the file to pass the W3C HTML markup validator on the file I was adding images to. I have no idea why this file passes the W3C HTML markup validator. I wish I had saved the earlier file.

Please note that the tag in this file is not on a line of its own, in violation of PG's policy. PG's Errata Workbench will show errors should one try to upload this.

eshellman commented 3 weeks ago

the multiline css comments in the file you've posted are correctly formatted. If the errata workbench is flagging them as errors it is a bug that should be fixed.

okrick commented 3 weeks ago

It is not PG's Errata Workbench flagging those as errors. The W3C HTML markup validator reported them as errors. Oddly, the W3C CSS validator reported the CSS was valid.

eshellman commented 3 weeks ago

@okrick can you provide details on how to reproduce the html validator errors? When I try it I get no errors.
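One way to make a report like this reproducible is to run the exact file through the checker from a script and keep the output next to the file. The following is a rough sketch, assuming the validator in question is the public W3C Nu HTML Checker and that its JSON output (`out=json`) is acceptable. The function names and the `User-Agent` value are made up for illustration and are not part of ebookmaker.

```python
import json
import urllib.request
from typing import List

# Public W3C Nu HTML Checker endpoint; "out=json" requests JSON messages.
NU_CHECKER_URL = "https://validator.w3.org/nu/?out=json"


def build_check_request(html_bytes: bytes) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the HTML document."""
    return urllib.request.Request(
        NU_CHECKER_URL,
        data=html_bytes,
        headers={
            "Content-Type": "text/html; charset=utf-8",
            # Hypothetical identifier, chosen only for this sketch.
            "User-Agent": "html-check-sketch/0.1",
        },
        method="POST",
    )


def error_messages(checker_json: str) -> List[str]:
    """Return the text of every message of type "error" in the checker's JSON."""
    report = json.loads(checker_json)
    return [
        m.get("message", "")
        for m in report.get("messages", [])
        if m.get("type") == "error"
    ]
```

To use it, read the file as bytes, pass the request to `urllib.request.urlopen`, decode the response body as UTF-8, and hand it to `error_messages`. Saving both the input file and the raw JSON response gives everyone the same thing to look at.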

okrick commented 3 weeks ago

Sorry, I should have kept the specific file that generated the error. It never existed in the files directory in the old system and thus was never archived nor brought forward to GitHub.

I suggest closing this for the time being. I never created an HTML in PG where none existed before so this is likely a rare occurrence.

eshellman commented 3 weeks ago

thanks!