Closed: josteinaj closed this issue 1 year ago
@kalaspuffar I added two items under "Finalize and release an official version (without Pipeline 2)". We have touched on this by e-mail earlier, but it wasn't described here. We won't have finalized and released an official version until we have documentation and a tagged version on Docker Hub. I hope this is ok.
Hi @josteinaj
Well, it's fine if MTM and NLB want to document and verify the process before we see this issue resolved. Just tell me if we at Textalk are expected to do anything more with this issue.
Best regards Daniel
The API is the same as for https://github.com/mtmse/talking-book-validator/, right? Is there any documentation for that one?
@martinpub @karladamt @oscarlcarlsson what do you think about documentation? Also, could you test the latest version, and see that it works for you as a replacement for your talking-book-validator? It's available as the docker image nlbdev/nordic-epub3-dtbook-migrator:latest.
Hi @josteinaj
The documentation we use today is https://github.com/nlbdev/nordic-epub3-dtbook-migrator/blob/master/web/docs/api_v1.yaml
This file describes the API in OpenAPI format. You can generate a live document/viewer from it, which can also be used for testing and verification, but it can also be read as a plain file.
For more information about the format you could read up on https://www.openapis.org/
Best regards Daniel
Yes, I imported the yaml earlier into Postman to test the API and it works great :+1:. We should probably link to that file in the description on gh-pages. And I think we also need a short example of how to use the validator. Something along the lines of:
docker run --rm -it nlbdev/nordic-epub3-dtbook-migrator (…)
curl -X POST (…)
Hi @josteinaj
That might be the way to validate that the docker image works. Something along the lines of:
Update your env.list in the web directory using env.list.template as a reference.
cd web
./start.sh
To validate the docker image usage run:
curl -F "file=@{path_to_file}" localhost:8080/v1/Validation -o report.json
Then look at the JSON file for a response.
On the other hand, if you want to run the validation locally, then it would be much easier just to download a prebuilt jar file from us and run:
java -jar NordicValidator.jar [epubfile] --output-html html-report.html
or something similar.
The only thing you need to install yourself then would be ACE, which is documented already on the daisy homepage.
Best regards Daniel
Hi @josteinaj
Forgot to mention. ACE is not a requirement for the application, just for 2020-1 rules. So if you run it without, you'll get a prompt to install ACE for correct 2020-1 validation. But the application will still give you a report without that section.
Best regards Daniel
Hi.
gh-pages branch on how to run this is needed
@oscarlcarlsson have you had time for more testing? Is the current version good enough for use in production at MTM?
@josteinaj I got access to the web interface of Webarch's validator last week. Large files do not seem to be an issue with them. We have one file that does not validate due to an EPUBCheck bug that should be sorted out in the current beta version.
Sounds great! Does that mean that it is only the documentation that is missing for phase one?
Not quite. I have found some old issues that are back in the current version due to using the NLB version as the master file. I am running some files in the validator at the moment and reporting them in Trello.
I am looking for the errors on github to link them here as well.
The current errors I am experiencing are: [nordic26c] Each note must have one <a role="doc-backlink" ...>. (
and: [xhtml] (Line: 00030 Column: 00024) element "hr" not allowed here; expected the element end-tag, text or element "a", "abbr", "address", "aside", "bdo", "blockquote", "br", "code", "dfn", "div", "dl", "em", "figure", "h1", "h2", "h3", "h4", "h5", "h6", "img", "kbd", "ns:annotation", "ns:list", "ns:math", "ol", "p", "pre", "q", "samp", "section", "span", "strong", "sub", "sup", "table" or "ul" (with xmlns:ns="http://www.w3.org/1998/Math/MathML")
Neither error is flagged in the MTM-fork of the validator.
When it comes to the backlink issue, that is something we resolved "recently". We have had an open PR for months that was approved a couple of weeks ago.
https://github.com/nlbdev/nordic-epub3-dtbook-migrator/issues/478
Created a PR for the HR in sidebar issue. https://github.com/nlbdev/nordic-epub3-dtbook-migrator/issues/521
Hi @oscarlcarlsson
PR #521 is now deployed on the Webarch Validator service.
Great! I have done a test-run on a title and it validated this time.
Do you need any specific input for #478 @kalaspuffar ?
No, not really. I just pointed out when we added the restriction that you now get a warning for.
I've run into the issue reported in #532 on some files that I've run through the validator today. At this point, I think that #532 and #478 are the last two things that are not validating. There might be more at a later stage, but those are the ones that have been recurring during the testing today.
Hi @oscarlcarlsson
When it comes to https://github.com/nlbdev/nordic-epub3-dtbook-migrator/issues/478, the validator is correct; what you actually want from your EPUB is a separate, more general question. Take a paragraph with three references to the same note:
<p>
... Lorem ipsum dolor sit amet, consectetur adipiscing elit."
<a epub:type="noteref" href="V006287-025-endnotes.xhtml#c07082" id="c07082_1" role="doc-noteref">82</a>
Duis ut nisi in sem accumsan lobortis. Sed eget odio euismod, vehicula ipsum eu, porttitor eros. Aliquam dapibus congue tortor in finibus."
<a epub:type="noteref" href="V006287-025-endnotes.xhtml#c07082" id="c07082_2" role="doc-noteref">82</a>
Ut tempus id sem eu feugiat. Cras nec velit volutpat, gravida ligula id, efficitur turpis. Praesent tincidunt euismod diam ac hendrerit."
<a epub:type="noteref" href="V006287-025-endnotes.xhtml#c07082" id="c07082_3" role="doc-noteref">82</a>
</p>
Then you have the note that they refer to:
<li epub:type="endnote" id="c07082" role="doc-endnote">
<p>82. Lorem ipsum dolor sit amet.</p>
<p>
<a href="V006287-019-chapter.xhtml#c07082_1" role="doc-backlink">Gå tillbaka till notreferensen.</a>
</p>
</li>
Since you need a backlink for every reference to the note, you are missing two links. In this case, the references are in the same paragraph, so a single backlink takes you to roughly the same spot. But that is not guaranteed; in most cases the references will be further apart, and a single backlink will confuse the reader.
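To make the rule concrete, here is a minimal Python sketch that lists which note references have no matching backlink in the example above. It is only an illustration, not how the validator implements [nordic26c]: the helper names are hypothetical, the fragments are shortened versions of the markup above, and it compares fragment identifiers only, ignoring file names and the XHTML namespace a real content document would have.

```python
import xml.etree.ElementTree as ET

EPUB_NS = "http://www.idpf.org/2007/ops"


def parse_fragment(fragment):
    # Wrap the fragment in a root element that declares the epub: prefix.
    return ET.fromstring(f'<root xmlns:epub="{EPUB_NS}">{fragment}</root>')


def fragment_id(href):
    # "file.xhtml#c07082_1" -> "c07082_1"
    return href.partition("#")[2]


def missing_backlinks(chapter, notes):
    """Map each endnote id to the noteref ids that no doc-backlink points to."""
    refs_by_note = {}
    for a in parse_fragment(chapter).iter("a"):
        if a.get("role") == "doc-noteref":
            note_id = fragment_id(a.get("href", ""))
            refs_by_note.setdefault(note_id, []).append(a.get("id", ""))

    missing = {}
    for li in parse_fragment(notes).iter("li"):
        if li.get("role") != "doc-endnote":
            continue
        linked = {
            fragment_id(a.get("href", ""))
            for a in li.iter("a")
            if a.get("role") == "doc-backlink"
        }
        absent = [r for r in refs_by_note.get(li.get("id", ""), []) if r not in linked]
        if absent:
            missing[li.get("id", "")] = absent
    return missing


# Shortened versions of the two fragments quoted in this comment.
CHAPTER = """<p>
Lorem <a epub:type="noteref" href="V006287-025-endnotes.xhtml#c07082" id="c07082_1" role="doc-noteref">82</a>
ipsum <a epub:type="noteref" href="V006287-025-endnotes.xhtml#c07082" id="c07082_2" role="doc-noteref">82</a>
dolor <a epub:type="noteref" href="V006287-025-endnotes.xhtml#c07082" id="c07082_3" role="doc-noteref">82</a>
</p>"""

NOTES = """<li epub:type="endnote" id="c07082" role="doc-endnote">
<p>82. Lorem ipsum dolor sit amet.</p>
<p><a href="V006287-019-chapter.xhtml#c07082_1" role="doc-backlink">Gå tillbaka till notreferensen.</a></p>
</li>"""

print(missing_backlinks(CHAPTER, NOTES))
```

For the example, the result names c07082_2 and c07082_3 as the two references without a backlink.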
Fredrik and I have talked about this, and maybe a meeting of the specification council would be suitable for the beginning of 2023.
Having only one backlink seems unreasonable for some material, and having multiple is confusing. This seems like a solution for a reading system issue, not a specification issue.
Best regards Daniel
Hi @oscarlcarlsson
Regarding #532, the error you've seen is unrelated to this fix. Having no headings is a separate case that has not been handled before, as it has not come up in any discussion. I've created a PR (https://github.com/nlbdev/nordic-epub3-dtbook-migrator/pull/539) trying to solve this issue.
Best regards Daniel
@karladamt @oscarlcarlsson @kalaspuffar what is the status here? Are you ready to make a release?
Hi @josteinaj
On my end, I don't have any work left. Someone wanted to change the documentation, and MTM needs to verify some of the fixes. But on my end, everything mentioned in the plan seems to be done.
Best regards Daniel
All good on our end as well!
I see that #395 is still not marked as done. What remains there?
I just noticed that @oscarlcarlsson had some issues with 2015-1 EPUBs here: https://github.com/nlbdev/nordic-epub3-dtbook-migrator/issues/515, so we should verify that 2015-1 still works before making the release.
From the meeting between Textalk, MTM and NLB on 22 June:
Hi @kalaspuffar! I just wanted to check where we are on this one. Is it being worked on?
bump @kalaspuffar
Hi @josteinaj
No, I understood from our last meeting that I should prepare the docker image for release. That PR is merged.
I was also to send you an e-mail with documentation information, and that is done as well.
I hope that I've not missed or misunderstood any of my responsibilities. Currently, I'm not doing anything more for this release.
Best regards Daniel
Hi Daniel!
I can't remember having received the documentation, could you resend it to me so that I can have another look?
It should be in a form that fits into this page: http://nlbdev.github.io/nordic-epub3-dtbook-migrator/
And since it's a new API, we can't just point to the Pipeline 2 API documentation. We need to provide API documentation on that page (or a separate page) as well, along with examples of how to run jobs.
Hi @josteinaj
I am posting the same information I sent in the last e-mail here, as it might not have arrived, and this way we have a record of what has been discussed.
When it comes to documentation, the Swagger/OpenAPI documentation can be viewed either by downloading the editor or by going to https://editor.swagger.io/
You can either download the file and upload it to the editor or import the URL directly: https://raw.githubusercontent.com/nlbdev/nordic-epub3-dtbook-migrator/master/web/docs/api_v1.yaml
Building the docker image should not be more complicated than building any other image:
docker build -t nordic-epub-validator .
And running it only requires exposing the web server port to access the API.
docker run --publish 8080:80 nordic-epub-validator
If there is anything else you need for the documentation then don't hesitate to reach out.
Best regards Daniel
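For anyone scripting this: after the docker run command above, a client has to wait until the published port accepts connections before it can post a file. A minimal Python sketch, assuming the 8080:80 mapping above. wait_for_port is a hypothetical helper, not part of the validator, and an open port only shows that something is listening, not that the API itself has finished starting.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError while nothing is listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

With the container started as above, wait_for_port("localhost", 8080) would be called before the first request is sent.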
Hi Daniel!
Right, I found the e-mail now. It was right before summer vacation and I see I've forgotten to reply to it. Sorry.
We need some documentation of the usage in addition to the yaml. Could you write the commands with some comments on how to use it? Say I have an EPUB, how do I post it to the API (with curl or wget example), how do I check the status of the job (if it's asynchronous), and how can I get the results? For basic usage, I don't think we should require users to open swagger or similar.
Hi @josteinaj
The Swagger documentation follows the standard and can be used to produce pretty much what you want.
Open it in the editor and export it as a zip file with HTML documents, or print it as a PDF, depending on what you want to present. But the best representation is the live view, where you can try the API out.
Best regards Daniel
Hi @josteinaj
Seems they had removed the print utility, so I've found another site that could generate PDF output for those that don't want the interactive GUI.
Best regards Daniel
Hi Daniel.
Thank you.
Could you also write a step-by-step example of how to validate an EPUB from the command line? Using either curl or wget.
This is so that somewhat technical users who are not developers can use the validator without too much trouble.
Regards Jostein
Hi @josteinaj
Well, Docker images with a REST-ish API aren't meant for the command line, but I guess the easiest is to run:
./createSchemas.sh
mvn package
java -jar target/NordicValidator-[version]-jar-with-dependencies.jar input.epub
If you know how to run curl, you can probably build a jar package.
If you require something for non-developers, we need a web UI in the image for uploading. But that is not in the current scope of the project.
Otherwise, I could record a video on how to run it from Postman. That is also not in the current scope of the project.
Best regards Daniel
POST /v1/Validation/
uploadFilePath string Path to output html report on OneDrive
downloadFilePath string Path to epub file stored on OneDrive
I remember we discussed an option that didn't require OneDrive? The default should not be OneDrive. OneDrive is an MTM/Webarch-specific feature.
Would it also be possible to POST an EPUB directly to the API? That would make the API easier to use in many cases. It was possible when we used the Pipeline 2 API.
Hi @josteinaj
The PDF is a bit harder to read, but there are two options for the same API endpoint. So the /v1/Validation can have either a JSON body or a form-data post.
FORM DATA PARAMETERS
NAME TYPE DESCRIPTION
config object Validation configuration
file string(binary) File to upload as a multipart upload
As I said earlier, I could create a video for Postman, a webpage for uploads, or a small client API. I have never done a multipart upload via curl, but if it works, I guess it would look something like this:
curl -F config='{"noEPUBCheck":false,"noACE":false,"schema":"2020-1"}' -F file=@filename.epub http://localhost:8080/v1/Validation
Best regards Daniel
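The fiddly part of that curl call is quoting the JSON config for the shell. A small Python sketch that builds the same arguments and lets shlex do the quoting; build_curl_args is a hypothetical helper, and the config keys and the URL are simply the ones used in the command above.

```python
import json
import shlex


def build_curl_args(epub_path, schema="2020-1", no_epubcheck=False, no_ace=False,
                    url="http://localhost:8080/v1/Validation"):
    """Build the argument list for a multipart POST of an EPUB with curl."""
    # Compact JSON, matching the config string used in the thread.
    config = json.dumps(
        {"noEPUBCheck": no_epubcheck, "noACE": no_ace, "schema": schema},
        separators=(",", ":"),
    )
    return ["curl", "-s", "-F", f"config={config}", "-F", f"file=@{epub_path}", url]


args = build_curl_args("X60352A.epub")
# shlex.join adds the shell quoting that the JSON value needs.
print(shlex.join(args))
```

Passing the list straight to subprocess.run(args) would avoid shell quoting altogether.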
Thanks! It seems to work to validate like that :+1:.
So first I start the container in one terminal like this:
docker run --publish 8080:80 nlbdev/nordic-epub3-dtbook-migrator
And then in another terminal, I navigate to the sample EPUB in src/test/resources/2020-1 and run:
curl -s -F config='{"noEPUBCheck":false,"noACE":false,"schema":"2020-1"}' -F file=@X60352A.epub http://localhost:8080/v1/Validation
The response I get is this:
{
  "uploadFilePath": "X60352A.epub",
  "datetime": "2023-09-08 12:50:02",
  "book": "Om det nord-tschudiska språket",
  "schema": "2020-1",
  "report": {
    "issue-count": 0,
    "filename": "X60352A.epub",
    "schema-info": {
      "opf_and_html": {
        "filename": "nordic2020-1.opf-and-html.xsl",
        "description": "Cross-document references and metadata",
        "document-type": "Nordic EPUB3 OPF+HTML"
      },
      "ace": {
        "filename": "",
        "description": "Validating with ACE 1.2.7",
        "document-type": "DAISY Accessibility Checker for EPUB"
      },
      "opf": {
        "filename": "nordic2020-1.opf.xsl",
        "description": "",
        "document-type": "Nordic EPUB3 Package Document"
      },
      "content_files_schema": {
        "filename": "nordic2020-1.xsl",
        "description": "",
        "document-type": "Nordic HTML (EPUB3 Content Document)"
      },
      "epub": {
        "filename": "",
        "description": "General EPUB requirements",
        "document-type": "Nordic EPUB3"
      },
      "nav_ncx": {
        "filename": "nordic2020-1.nav-ncx.xsl",
        "description": "",
        "document-type": "Nordic EPUB3 NCX and Navigation Document"
      },
      "nav_references": {
        "filename": "nordic2020-1.nav-references.xsl",
        "description": "References from the navigation document to the content documents",
        "document-type": "Nordic EPUB3 Navigation Document References"
      },
      "epubcheck": {
        "filename": "",
        "description": "Validating with EPUBCheck 5.0.0",
        "document-type": "EPUBCheck EPUB3"
      },
      "xhtml": {
        "filename": "nordic-html5.rng",
        "description": "",
        "document-type": ""
      }
    },
    "created": "2023-09-08 14:50:07",
    "guideline": "Nordic EPUB Guideline 2020-1",
    "issues": [],
    "status": "SUCCESS"
  }
}
It says "SUCCESS", but the docker container logs an exception. Is it anything to worry about?
java.util.concurrent.ExecutionException: java.lang.NoSuchMethodError: 'boolean net.sf.saxon.om.NameChecker.isValidNCName(java.lang.String)'
at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
at org.daisy.validator.EPUBFiles.validate(EPUBFiles.java:351)
at org.daisy.validator.NordicValidator.main(NordicValidator.java:129)
Caused by: java.lang.NoSuchMethodError: 'boolean net.sf.saxon.om.NameChecker.isValidNCName(java.lang.String)'
at com.adobe.epubcheck.vocab.PrefixDeclarationParser.parsePrefixMappings(PrefixDeclarationParser.java:105)
at com.adobe.epubcheck.vocab.VocabUtil.parsePrefixDeclaration(VocabUtil.java:179)
at com.adobe.epubcheck.opf.OPFHandler30.startElement(OPFHandler30.java:196)
at com.adobe.epubcheck.xml.handlers.XMLHandler.startElement(XMLHandler.java:115)
at com.adobe.epubcheck.xml.handlers.DelegateDefaultHandler.startElement(DelegateDefaultHandler.java:170)
at com.adobe.epubcheck.xml.handlers.WrappingDefaultHandler.startElement(WrappingDefaultHandler.java:95)
at com.adobe.epubcheck.xml.handlers.PreprocessingDefaultHandler.startElement(PreprocessingDefaultHandler.java:59)
at org.apache.xerces.parsers.AbstractSAXParser.startElement(Unknown Source)
at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)
at org.apache.xerces.impl.XMLNSDocumentScannerImpl$NSContentDispatcher.scanRootElementHook(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
at org.apache.xerces.jaxp.SAXParserImpl.parse(Unknown Source)
at com.adobe.epubcheck.xml.XMLParser.process(XMLParser.java:176)
at com.adobe.epubcheck.opf.OPFChecker.checkContent(OPFChecker.java:203)
at com.adobe.epubcheck.opf.OPFChecker30.checkContent(OPFChecker30.java:79)
at com.adobe.epubcheck.opf.OPFChecker.checkPackage(OPFChecker.java:111)
at com.adobe.epubcheck.opf.OPFChecker30.checkPackage(OPFChecker30.java:67)
at com.adobe.epubcheck.opf.OPFChecker.check(OPFChecker.java:94)
at com.adobe.epubcheck.ocf.OCFChecker.check(OCFChecker.java:174)
at com.adobe.epubcheck.api.EpubCheck.doValidate(EpubCheck.java:218)
at org.daisy.validator.epubcheck.EPUBCheckValidator.call(EPUBCheckValidator.java:24)
at org.daisy.validator.epubcheck.EPUBCheckValidator.call(EPUBCheckValidator.java:12)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
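Since the response reports "SUCCESS" even while the container logs this exception, a script calling the API may want to check more than one field. A minimal Python sketch that reduces the JSON response shown earlier to a status, an issue count and an exit code. The field names are taken from that response; the summarize helper, the "UNKNOWN" fallback and the rule "anything other than SUCCESS with zero issues is a failure" are assumptions, since the other possible status values are not listed in this thread.

```python
import json


def summarize(response_text):
    """Return (status, issue_count, exit_code) for a /v1/Validation JSON response."""
    report = json.loads(response_text).get("report", {})
    status = report.get("status", "UNKNOWN")
    # Fall back to counting the issues list if "issue-count" is absent.
    issue_count = int(report.get("issue-count", len(report.get("issues", []))))
    exit_code = 0 if status == "SUCCESS" and issue_count == 0 else 1
    return status, issue_count, exit_code


# Trimmed-down version of the response shown earlier in this thread.
SAMPLE = '{"schema": "2020-1", "report": {"issue-count": 0, "issues": [], "status": "SUCCESS"}}'
print(summarize(SAMPLE))
```

Note that this cannot detect the exception above, which only appears in the container log and not in the response.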
Postman-video, webpage for uploads and client API:
/v1/ or /?) would be very nice if it's easy to set up. :+1:
I've added documentation to the homepage now, with both how to use it as a command line interface, and using the REST API (using curl as an example):
Do you think it looks ok?
Hi @josteinaj
It looks ok, but I think you have a small typo in the first command nlbdev/nordic-epub3-dtbook-migrato
Best regards Daniel
$?, and you can manually inspect the output that comes after "EPUB Validate [time]", but a final success/failed would be nice
~ ❯ docker run --rm -it nlbdev/nordic-epub3-dtbook-migrator bash
root@dbe2d2b76506:/var/www/html# ace
[0920/151038.747077:FATAL:electron_main_delegate.cc(294)] Running as root without --no-sandbox is not supported. See https://crbug.com/638180.
/usr/local/lib/nodejs/node-v16.18.0-linux-x64/lib/node_modules/@daisy/ace/node_modules/electron/dist/electron exited with signal SIGTRAP
root@dbe2d2b76506:/var/www/html# ace --no-sandbox
[30:0920/151043.903579:ERROR:bus.cc(399)] Failed to connect to the bus: Failed to connect to socket /run/dbus/system_bus_socket: No such file or directory
[30:0920/151044.180912:ERROR:ozone_platform_x11.cc(240)] Missing X server or $DISPLAY
[30:0920/151044.180923:ERROR:env.cc(255)] The platform failed to initialize. Exiting.
The futex facility returned an unexpected error code.
/usr/local/lib/nodejs/node-v16.18.0-linux-x64/lib/node_modules/@daisy/ace/node_modules/electron/dist/electron exited with signal SIGABRT
@kalaspuffar how do I validate with ace?
Hi @josteinaj
I'm not sure what is going wrong there. But this Docker image has been tested, and we have gotten Ace results. Perhaps the Ace engine has been updated since we last tested?
Looking at the class for Ace in the client, there are no special flags.
https://github.com/nlbdev/nordic-epub3-dtbook-migrator/blob/master/client/src/main/java/org/daisy/validator/ace/ACEValidator.java
Best regards Daniel
It seems that the latest working version of @daisy/ace is 1.2.7, so I downgraded to that one.
@kalaspuffar I tagged a v2.0.0 version, and it's building on Docker Hub now. Could you verify that it works for you?
Hi @josteinaj
I've searched on Docker Hub, and I can't see it at all. If I search for "nordic-epub3", I can find an sbsdev release from 3 years back, but not your version.
I tried to log in as well but could not find it either. Is it a private repository?
Best regards Daniel
Hi @josteinaj
We have also looked into the Ace repository, and version 1.2.8 should work just fine.
But as of version 1.3.0, they have deprecated Puppeteer as the main driver for the Axe plugin validation. Puppeteer is a tool for running Chrome in headless mode, which is good if you want to run it in a Docker container, for instance.
I don't know whether Chrome has also deprecated Puppeteer, which could be the leading factor behind this change. In any case, Ace now uses a pure Electron implementation, which creates Node.js interfaces by starting a slimmed-down version of Chrome on your desktop.
Because it opens a window, it will not work inside a Docker container without a graphical interface.
Best regards Daniel
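If the image build should guard against picking up a newer Ace by accident, the version boundary described above can be expressed as a small check. A Python sketch, assuming plain MAJOR.MINOR.PATCH version strings; runs_headless is a hypothetical helper that only encodes the claim in this comment (versions before 1.3.0 do not need a display), and only 1.2.7 has actually been reported as working in this thread.

```python
def parse_version(version):
    """Turn "1.2.8" into (1, 2, 8) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def runs_headless(ace_version):
    # Assumption from this thread: Ace switched to Electron in 1.3.0.
    return parse_version(ace_version) < (1, 3, 0)


for candidate in ("1.2.7", "1.2.8", "1.3.0"):
    print(candidate, runs_headless(candidate))
```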
Hi @kalaspuffar!
We recently went through and made stuff private, and this must've been made private by mistake.
Could you try again now? Now it should be public.
Hi @josteinaj
It seems to work just fine now that I have access.
Best regards Daniel