Closed gbnewby closed 1 year ago
Other than running time, I don't see any downside to running the validator on the input file before running ebm. In this case the error gets fixed by BeautifulSoup before Ebookmaker starts working on it, so no warnings are available.
Would that mean that the validation errors would get reported to the user when they upload and/or in the output.txt file? Since we now insist that the output.txt file doesn't have errors, that would be a good way of ensuring that files with validation errors don't get uploaded.
In my naivety I thought ebookmaker was running directly on the submitted file, and that the current upload form was saying it was okay to submit the file because no problems in the submitted file were reported. If we can't have it running on the submitted file, the upload process needs to flag validator errors again, which I think would, at this stage, be a step backwards.
If it would help, I think I could add a 'prevalidate' option to ebookmaker.
No need - it's now been implemented for the online ebookmaker and is undergoing testing before going to production. It's easy enough to add a validation check on the uploaded HTML.
Thanks!
On Tue, Jul 18, 2023 at 5:19 AM Eric Hellman @.***> wrote:
if it would help, I think I could add a 'prevalidate' option to ebookmaker.
PG has had several recent uploads via https://upload.pglaf.org where the HTML had validity errors. The errors in the uploaded text were not reported in the output.txt of https://ebookmaker.pglaf.org
Upon investigation & discussion, it appears this is because ebm runs the validator (vnu.jar) against the generated HTML5, but not against the uploaded HTML.
Since the uploaded HTML is posted to the 1/2/3 filesystem, it needs to be validated.
Is this something to add to ebm? Otherwise, we could add a call to the validator before calling ebm in https://ebookmaker.pglaf.org
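For illustration, here is a rough sketch of what a pre-validation step in front of ebm might look like. It is not how ebookmaker or the online service actually does it. It assumes `java` is on the PATH and that a copy of `vnu.jar` is reachable at `jar_path`; the function names (`build_vnu_command`, `extract_errors`, `prevalidate`) are made up for this example. It relies on the checker's `--format json` and `--stdout` options and on the `messages` list in its JSON report.

```python
import json
import subprocess


def build_vnu_command(html_path, jar_path="vnu.jar"):
    """Build the command line for the Nu Html Checker (jar location is an assumption)."""
    return ["java", "-jar", str(jar_path),
            "--format", "json", "--stdout", str(html_path)]


def extract_errors(vnu_json):
    """Return one 'line N: message' string per error in a vnu JSON report.

    Messages of type "info" (which includes warnings) are skipped.
    """
    report = json.loads(vnu_json)
    errors = []
    for msg in report.get("messages", []):
        if msg.get("type") in ("error", "non-document-error"):
            line = msg.get("lastLine", "?")
            errors.append(f"line {line}: {msg.get('message', '')}")
    return errors


def prevalidate(html_path, jar_path="vnu.jar"):
    """Validate the uploaded HTML as submitted, before ebm touches it."""
    result = subprocess.run(build_vnu_command(html_path, jar_path),
                            capture_output=True, text=True, check=False)
    if not result.stdout.strip():
        # No report at all usually means the checker itself failed to run.
        raise RuntimeError(f"validator produced no output: {result.stderr}")
    return extract_errors(result.stdout)
```

A caller could run `prevalidate()` on the uploaded file, write any returned errors into output.txt, and only hand the file to ebm if the list is empty.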
Here is a simple example with a one-line HTML file that has a validation error. Online ebm reports no errors (see https://ebookmaker.pglaf.org/cache/20230715224619/output.txt [will be automatically purged after 3 days]).
Running validator.w3.org directly spots the error, of course:
Here's the simple file: test0715.zip