nlbdev / nordic-epub3-dtbook-migrator

Tools for converting between a strict subset of DTBook and EPUB3.
http://nlbdev.github.io/nordic-epub3-dtbook-migrator/
GNU Lesser General Public License v2.1

Further development, end of 2022 ("phase one") #522

Closed josteinaj closed 1 year ago

josteinaj commented 2 years ago
josteinaj commented 1 year ago

@kalaspuffar I added two items under "Finalize and release an official version (without Pipeline 2)". We have touched on this by e-mail earlier, but it wasn't described here. We can't say that we have finalized and released an official version until we have documentation and a tagged version on Docker Hub. I hope this is ok.

kalaspuffar commented 1 year ago

Hi @josteinaj

Well, it's fine if MTM and NLB want to document and verify the process before we consider this issue resolved. Just tell me if we at Textalk are expected to do anything more with this issue.

Best regards Daniel

josteinaj commented 1 year ago

The API is the same as for https://github.com/mtmse/talking-book-validator/, right? Is there any documentation for that one?

@martinpub @karladamt @oscarlcarlsson what do you think about documentation? Also, could you test the latest version, and see that it works for you as a replacement for your talking-book-validator? It's available as the docker image nlbdev/nordic-epub3-dtbook-migrator:latest.

kalaspuffar commented 1 year ago

Hi @josteinaj

The documentation we use today is https://github.com/nlbdev/nordic-epub3-dtbook-migrator/blob/master/web/docs/api_v1.yaml

This file describes the API in OpenAPI format. You can generate a live document/viewer from it, which can also be used for testing and verification, but it can also be read as a plain file.

For more information about the format you could read up on https://www.openapis.org/

Best regards Daniel

josteinaj commented 1 year ago

Yes, I imported the yaml earlier into Postman to test the API and it works great :+1:. We should probably link to that file in the description on gh-pages. And I think we also need a short example of how to use the validator. Something along the lines of:

  1. In a terminal, run the validator using docker run --rm -it nlbdev/nordic-epub3-dtbook-migrator (…)
  2. In a separate terminal, validate the EPUB file using curl -X POST (…)
  3. The validation report can be viewed in a browser by (…)

kalaspuffar commented 1 year ago

Hi @josteinaj

That might be the way to validate that the docker image works. Something along the line of:

Update your env.list in the web directory using env.list.template as a reference.

cd web
./start.sh

To validate a file using the docker image, run:

curl -F "file=@{path_to_file}" localhost:8080/v1/Validation -o report.json

Then look at the JSON file for a response.

On the other hand, if you want to run the validation locally, then it would be much easier just to download a prebuilt jar file from us and run:

java -jar NordicValidator.jar [epubfile] --output-html html-report.html

or something similar.

The only thing you need to install yourself then would be ACE, which is documented already on the daisy homepage.

Best regards Daniel

kalaspuffar commented 1 year ago

Hi @josteinaj

Forgot to mention. ACE is not a requirement for the application, just for 2020-1 rules. So if you run it without, you'll get a prompt to install ACE for correct 2020-1 validation. But the application will still give you a report without that section.

Best regards Daniel

josteinaj commented 1 year ago

Hi.

josteinaj commented 1 year ago

@oscarlcarlsson have you had time for more testing? Is the current version good enough for use in production at MTM?

oscarlcarlsson commented 1 year ago

@josteinaj I got access to the web interface of Webarch's validator last week. Large files do not seem to be an issue with it. We have one file that does not validate due to an EPUBCheck bug that should be sorted out in the current beta version.

josteinaj commented 1 year ago

Sounds great! Does that mean that it is only the documentation that is missing for phase one?

oscarlcarlsson commented 1 year ago

Not quite. I have found some old issues that are back in the current version due to using the NLB version as the master file. I am running some files through the validator at the moment and reporting the issues in Trello.

I am looking for the errors on GitHub to link them here as well.

The current errors I am experiencing are: [nordic26c] Each note must have one <a role="doc-backlink" ...>. (

kalaspuffar commented 1 year ago

When it comes to the backlink issue, that is something we resolved "recently". We have had an open PR for months that was approved a couple of weeks ago.

https://github.com/nlbdev/nordic-epub3-dtbook-migrator/issues/478

kalaspuffar commented 1 year ago

Created a PR for the HR in sidebar issue. https://github.com/nlbdev/nordic-epub3-dtbook-migrator/issues/521

kalaspuffar commented 1 year ago

Hi @oscarlcarlsson

PR #521 is now deployed on the Webarch Validator service.

oscarlcarlsson commented 1 year ago

Great! I have done a test-run on a title and it validated this time.

oscarlcarlsson commented 1 year ago

Do you need any specific input for #478 @kalaspuffar ?

kalaspuffar commented 1 year ago

> Do you need any specific input for #478 @kalaspuffar ?

No, not really. I just pointed out when we added the restriction that you now get a warning for.

oscarlcarlsson commented 1 year ago

I've run into the issue reported in #532 on some files that I've run through the validator today. At this point, I think that #532 and #478 are the last two things that are not validating. There might be more at a later stage, but those are the ones that have been recurring during the testing today.

kalaspuffar commented 1 year ago

Hi @oscarlcarlsson

When it comes to https://github.com/nlbdev/nordic-epub3-dtbook-migrator/issues/478, the validator is correct, but what you want from your EPUB is a separate, more general question. Take these note references, for example:

<p>
... Lorem ipsum dolor sit amet, consectetur adipiscing elit."
<a epub:type="noteref" href="V006287-025-endnotes.xhtml#c07082" id="c07082_1" role="doc-noteref">82</a> 
Duis ut nisi in sem accumsan lobortis. Sed eget odio euismod, vehicula ipsum eu, porttitor eros. Aliquam dapibus congue tortor in finibus."
<a epub:type="noteref" href="V006287-025-endnotes.xhtml#c07082" id="c07082_2" role="doc-noteref">82</a> 
Ut tempus id sem eu feugiat. Cras nec velit volutpat, gravida ligula id, efficitur turpis. Praesent tincidunt euismod diam ac hendrerit."
<a epub:type="noteref" href="V006287-025-endnotes.xhtml#c07082" id="c07082_3" role="doc-noteref">82</a>
</p>

Then you have the reference:

<li epub:type="endnote" id="c07082" role="doc-endnote">
<p>82. Lorem ipsum dolor sit amet.</p>
<p>
<a href="V006287-019-chapter.xhtml#c07082_1" role="doc-backlink">Gå tillbaka till notreferensen.</a>
</p>
</li>

As you need a backlink for every reference to the note, you are missing two links. In this case, the references are all in the same paragraph, so each backlink would jump to the same spot. But there is no guarantee of that; in most cases the references will be in different places, which will confuse the reader.
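The mismatch described above can be illustrated with a small sketch. This is not how the validator implements the [nordic26c] rule; it is only a hypothetical check, assuming the chapter and endnote files are well-formed XHTML and that note references and backlinks are identified by their role attributes. The helper names fragment and missing_backlinks are made up for this example, and the epub:type attributes are left out of the sample markup to keep it short.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

XHTML = "{http://www.w3.org/1999/xhtml}"

# Sample markup based on the snippets above (epub:type attributes omitted).
CHAPTER = """<html xmlns="http://www.w3.org/1999/xhtml"><body><p>
<a href="V006287-025-endnotes.xhtml#c07082" id="c07082_1" role="doc-noteref">82</a>
<a href="V006287-025-endnotes.xhtml#c07082" id="c07082_2" role="doc-noteref">82</a>
<a href="V006287-025-endnotes.xhtml#c07082" id="c07082_3" role="doc-noteref">82</a>
</p></body></html>"""

ENDNOTES = """<html xmlns="http://www.w3.org/1999/xhtml"><body><ol>
<li id="c07082" role="doc-endnote">
<p>82. Lorem ipsum dolor sit amet.</p>
<p><a href="V006287-019-chapter.xhtml#c07082_1" role="doc-backlink">Gå tillbaka till notreferensen.</a></p>
</li>
</ol></body></html>"""


def fragment(href):
    """Return the part of an href after '#', or '' if there is none."""
    return href.partition("#")[2]


def missing_backlinks(chapter_xml, endnotes_xml):
    """Map each note id to the noteref ids that no backlink points to."""
    # Collect the ids of all note references, grouped by the note they target.
    refs = defaultdict(list)
    for a in ET.fromstring(chapter_xml).iter(XHTML + "a"):
        if a.get("role") == "doc-noteref":
            refs[fragment(a.get("href", ""))].append(a.get("id"))

    # Collect the noteref ids targeted by the backlinks inside each endnote.
    backlinked = defaultdict(set)
    for li in ET.fromstring(endnotes_xml).iter(XHTML + "li"):
        if li.get("role") == "doc-endnote":
            for a in li.iter(XHTML + "a"):
                if a.get("role") == "doc-backlink":
                    backlinked[li.get("id")].add(fragment(a.get("href", "")))

    # Keep only the notes that have at least one reference without a backlink.
    result = {}
    for note, ids in refs.items():
        missing = [r for r in ids if r not in backlinked[note]]
        if missing:
            result[note] = missing
    return result


print(missing_backlinks(CHAPTER, ENDNOTES))
# prints {'c07082': ['c07082_2', 'c07082_3']}
```

For the example above, the note has three references but only one backlink, so two reference ids are reported as missing.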

Fredrik and I have talked about this, and maybe a meeting of the specification council would be suitable for the beginning of 2023.

Having only one backlink seems unreasonable for some material, and having multiple is confusing. This seems like an issue for reading systems to solve, not a specification issue.

Best regards Daniel

kalaspuffar commented 1 year ago

Hi @oscarlcarlsson

Regarding #532, the error you've seen is unrelated to this fix. Having no headings is a separate case that has not been handled before, as it has not come up in any discussion. I've created a PR (https://github.com/nlbdev/nordic-epub3-dtbook-migrator/pull/539) trying to solve this issue.

Best regards Daniel

josteinaj commented 1 year ago

@karladamt @oscarlcarlsson @kalaspuffar what is the status here? Are you ready to make a release?

kalaspuffar commented 1 year ago

Hi @josteinaj

On my end, I don't have any work left. Someone wanted to change the documentation, and MTM needs to verify some of the fixes, but everything mentioned in the plan seems to be done on my end.

Best regards Daniel

oscarlcarlsson commented 1 year ago

All good on our end as well!

josteinaj commented 1 year ago

I see that #395 is still not marked as done. What remains there?

josteinaj commented 1 year ago

I just noticed that @oscarlcarlsson had some issues with 2015-1 EPUBs here: https://github.com/nlbdev/nordic-epub3-dtbook-migrator/issues/515, so we should verify that 2015-1 still works before making the release.

josteinaj commented 1 year ago

From meeting between Textalk, MTM and NLB on 22. June:

josteinaj commented 1 year ago

Hi @kalaspuffar! I just wanted to check where we are on this one. Is it being worked on?

josteinaj commented 1 year ago

bump @kalaspuffar

kalaspuffar commented 1 year ago

Hi @josteinaj

No, I understood from our last meeting that I should prepare the docker image for release. That PR is merged.

And send an email with documentation information to you and that is also done.

I hope that I've not missed or misunderstood any of my responsibilities. Currently I'm not doing anything more for this release.

Best regards Daniel

josteinaj commented 1 year ago

Hi Daniel!

I can't remember having received the documentation, could you resend it to me so that I can have another look?

It should be in a form that fits into this page: http://nlbdev.github.io/nordic-epub3-dtbook-migrator/

And since it's a new API, we can't just point to the Pipeline 2 API documentation. We need to provide API documentation on that page (or a separate page) as well, along with examples of how to run jobs.

kalaspuffar commented 1 year ago

Hi @josteinaj

I am posting the same information here that I sent in my last e-mail, as the e-mail might not have arrived, and this way we have a record of what has been discussed.

When it comes to documentation, the SwaggerAPI / OpenAPI documentation can be viewed either by downloading the editor or going to https://editor.swagger.io/

You could either download the file and upload it to the editor or import the URL directly. https://raw.githubusercontent.com/nlbdev/nordic-epub3-dtbook-migrator/master/web/docs/api_v1.yaml

Building the docker image should not be more complicated than building any other image:

docker build -t nordic-epub-validator .

And running it only requires exposing the web server port to access the API.

docker run --publish 8080:80 nordic-epub-validator
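When scripting against the container, it can help to wait until the published port accepts connections before posting anything. The following is a minimal sketch of such a check, assuming the container was started with the port mapping shown above; wait_for_port is a hypothetical helper, not part of the project. An open port only shows that something is listening, not that the API inside the container is fully ready.

```python
import socket
import time


def wait_for_port(host="localhost", port=8080, timeout=30.0, interval=0.5):
    """Return True once host:port accepts TCP connections, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: wait a little and retry.
            time.sleep(interval)
    return False
```

Calling wait_for_port() with the defaults polls localhost:8080 for up to 30 seconds.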

If there is anything else you need for the documentation then don't hesitate to reach out.

Best regards Daniel

josteinaj commented 1 year ago

Hi Daniel!

Right, I found the e-mail now. It arrived right before summer vacation, and I see that I forgot to reply to it. Sorry.

We need some documentation of the usage in addition to the yaml. Could you write the commands with some comments on how to use it? Say I have an EPUB, how do I post it to the API (with curl or wget example), how do I check the status of the job (if it's asynchronous), and how can I get the results? For basic usage, I don't think we should require users to open swagger or similar.

kalaspuffar commented 1 year ago

Hi @josteinaj

The swagger documentation is following the standard and can be used to produce pretty much what you want.

Open it in the editor and export it as a zip file with HTML documents, or print it as a PDF, depending on what you want to present. But the best representation is the live view, where you can try the API out.

Best regards Daniel

kalaspuffar commented 1 year ago

Hi @josteinaj

Seems they had removed the print utility, so I've found another site that could generate PDF output for those that don't want the interactive GUI.

nordicvalidator.pdf

Best regards Daniel

josteinaj commented 1 year ago

Hi Daniel.

Thank you.

Could you also write a step-by-step example of how to validate an EPUB from the command line? Using either curl or wget.

This is so that somewhat technical users, who are not developers, can use the validator without too much trouble.

Regards Jostein

kalaspuffar commented 1 year ago

Hi @josteinaj

Well, docker images with a REST-ish API aren't meant for the command line, but I guess the easiest is to run

./createSchemas.sh
mvn package
java -jar target/NordicValidator-[version]-jar-with-dependencies.jar input.epub

If you know how to run curl you probably can build a jar package.

If you require something for non-developers, we need a WebUI in the image for uploading. But that is not in the current scope of the project.

Otherwise I could record a video on how to run it from Postman. Also not in the current scope of the project.

Best regards Daniel

josteinaj commented 1 year ago

POST /v1/Validation/

uploadFilePath      string    Path to output html report on OneDrive
downloadFilePath    string    Path to epub file stored on OneDrive

I remember we discussed an option that didn't require OneDrive? The default should not be OneDrive. OneDrive is an MTM/Webarch-specific feature.

Could it also be possible to POST an EPUB directly to the API? That would make the API easier to use in many cases. It was possible when we used the Pipeline 2 API.

kalaspuffar commented 1 year ago

Hi @josteinaj

The PDF is a bit harder to read, but there are two options for the same API endpoint. So the /v1/Validation can have either a JSON body or a form-data post.

FORM DATA PARAMETERS

NAME      TYPE                        DESCRIPTION
config    object                      Validation configuration
file      string(binary)              File to upload as a multipart upload

As I said earlier, I could create a video for Postman, a webpage for uploads, or a small client API. I have never done a multipart upload via curl, but if it works, I guess it would look something like this:

curl -F config='{"noEPUBCheck":false,"noACE":false,"schema":"2020-1"}' -F file=@filename.epub http://localhost:8080/v1/Validation
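Because the config field is JSON, the shell quoting is easy to get wrong. As a sketch, the command can be assembled programmatically so that the JSON is serialized and quoted automatically. The field names config and file, the config keys and the URL are taken from the command above; build_curl_command is a hypothetical helper for illustration only.

```python
import json
import shlex


def build_curl_command(epub_path, schema="2020-1", no_epubcheck=False,
                       no_ace=False, url="http://localhost:8080/v1/Validation"):
    """Return the curl argument list for a multipart validation request."""
    # Serialize the configuration compactly, as in the hand-written command.
    config = json.dumps(
        {"noEPUBCheck": no_epubcheck, "noACE": no_ace, "schema": schema},
        separators=(",", ":"),
    )
    return ["curl", "-F", "config=" + config, "-F", "file=@" + epub_path, url]


command = build_curl_command("filename.epub")
# shlex.join quotes each argument so that the JSON survives the shell intact.
print(shlex.join(command))
```

The argument list can also be passed directly to subprocess.run, which avoids the shell and its quoting rules altogether.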

Best regards Daniel

josteinaj commented 1 year ago

Thanks! It seems to work to validate like that :+1:.

So first I start the container in one terminal like this:

docker run --publish 8080:80 nlbdev/nordic-epub3-dtbook-migrator

And then in another terminal, I navigate to the sample EPUB in src/test/resources/2020-1 and run:

curl -s -F config='{"noEPUBCheck":false,"noACE":false,"schema":"2020-1"}' -F file=@X60352A.epub http://localhost:8080/v1/Validation

The response I get is this:

{
  "uploadFilePath": "X60352A.epub",
  "datetime": "2023-09-08 12:50:02",
  "book": "Om det nord-tschudiska språket",
  "schema": "2020-1",
  "report": {
    "issue-count": 0,
    "filename": "X60352A.epub",
    "schema-info": {
      "opf_and_html": {
        "filename": "nordic2020-1.opf-and-html.xsl",
        "description": "Cross-document references and metadata",
        "document-type": "Nordic EPUB3 OPF+HTML"
      },
      "ace": {
        "filename": "",
        "description": "Validating with ACE  1.2.7",
        "document-type": "DAISY Accessibility Checker for EPUB"
      },
      "opf": {
        "filename": "nordic2020-1.opf.xsl",
        "description": "",
        "document-type": "Nordic EPUB3 Package Document"
      },
      "content_files_schema": {
        "filename": "nordic2020-1.xsl",
        "description": "",
        "document-type": "Nordic HTML (EPUB3 Content Document)"
      },
      "epub": {
        "filename": "",
        "description": "General EPUB requirements",
        "document-type": "Nordic EPUB3"
      },
      "nav_ncx": {
        "filename": "nordic2020-1.nav-ncx.xsl",
        "description": "",
        "document-type": "Nordic EPUB3 NCX and Navigation Document"
      },
      "nav_references": {
        "filename": "nordic2020-1.nav-references.xsl",
        "description": "References from the navigation document to the content documents",
        "document-type": "Nordic EPUB3 Navigation Document References"
      },
      "epubcheck": {
        "filename": "",
        "description": "Validating with EPUBCheck  5.0.0",
        "document-type": "EPUBCheck EPUB3"
      },
      "xhtml": {
        "filename": "nordic-html5.rng",
        "description": "",
        "document-type": ""
      }
    },
    "created": "2023-09-08 14:50:07",
    "guideline": "Nordic EPUB Guideline 2020-1",
    "issues": [],
    "status": "SUCCESS"
  }
}
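A response like this can also be checked from a script. The sketch below only relies on the keys visible in the response above ("report", "status", "issue-count" and "issues"); which status values other than "SUCCESS" exist and what an individual issue entry looks like are not shown here, so the code does not assume either. summarize_report is a hypothetical helper name.

```python
import json


def summarize_report(response_text):
    """Return (status, issue_count, issues) from a validation response."""
    report = json.loads(response_text)["report"]
    return report["status"], report["issue-count"], report["issues"]


# Trimmed-down copy of the response above; "schema-info" is left out.
SAMPLE = """{
  "uploadFilePath": "X60352A.epub",
  "schema": "2020-1",
  "report": {
    "issue-count": 0,
    "filename": "X60352A.epub",
    "guideline": "Nordic EPUB Guideline 2020-1",
    "issues": [],
    "status": "SUCCESS"
  }
}"""

status, count, issues = summarize_report(SAMPLE)
print(f"{status}: {count} issue(s)")  # prints "SUCCESS: 0 issue(s)"
```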

It says "SUCCESS", but the docker container logs an exception. Is it anything to worry about?

java.util.concurrent.ExecutionException: java.lang.NoSuchMethodError: 'boolean net.sf.saxon.om.NameChecker.isValidNCName(java.lang.String)'
    at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
    at org.daisy.validator.EPUBFiles.validate(EPUBFiles.java:351)
    at org.daisy.validator.NordicValidator.main(NordicValidator.java:129)
Caused by: java.lang.NoSuchMethodError: 'boolean net.sf.saxon.om.NameChecker.isValidNCName(java.lang.String)'
    at com.adobe.epubcheck.vocab.PrefixDeclarationParser.parsePrefixMappings(PrefixDeclarationParser.java:105)
    at com.adobe.epubcheck.vocab.VocabUtil.parsePrefixDeclaration(VocabUtil.java:179)
    at com.adobe.epubcheck.opf.OPFHandler30.startElement(OPFHandler30.java:196)
    at com.adobe.epubcheck.xml.handlers.XMLHandler.startElement(XMLHandler.java:115)
    at com.adobe.epubcheck.xml.handlers.DelegateDefaultHandler.startElement(DelegateDefaultHandler.java:170)
    at com.adobe.epubcheck.xml.handlers.WrappingDefaultHandler.startElement(WrappingDefaultHandler.java:95)
    at com.adobe.epubcheck.xml.handlers.PreprocessingDefaultHandler.startElement(PreprocessingDefaultHandler.java:59)
    at org.apache.xerces.parsers.AbstractSAXParser.startElement(Unknown Source)
    at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)
    at org.apache.xerces.impl.XMLNSDocumentScannerImpl$NSContentDispatcher.scanRootElementHook(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
    at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
    at org.apache.xerces.jaxp.SAXParserImpl.parse(Unknown Source)
    at com.adobe.epubcheck.xml.XMLParser.process(XMLParser.java:176)
    at com.adobe.epubcheck.opf.OPFChecker.checkContent(OPFChecker.java:203)
    at com.adobe.epubcheck.opf.OPFChecker30.checkContent(OPFChecker30.java:79)
    at com.adobe.epubcheck.opf.OPFChecker.checkPackage(OPFChecker.java:111)
    at com.adobe.epubcheck.opf.OPFChecker30.checkPackage(OPFChecker30.java:67)
    at com.adobe.epubcheck.opf.OPFChecker.check(OPFChecker.java:94)
    at com.adobe.epubcheck.ocf.OCFChecker.check(OCFChecker.java:174)
    at com.adobe.epubcheck.api.EpubCheck.doValidate(EpubCheck.java:218)
    at org.daisy.validator.epubcheck.EPUBCheckValidator.call(EPUBCheckValidator.java:24)
    at org.daisy.validator.epubcheck.EPUBCheckValidator.call(EPUBCheckValidator.java:12)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)

Postman-video, webpage for uploads and client API:

I've added documentation to the homepage now, covering both how to use it from the command line and how to use the REST API (with curl as an example):

Do you think it looks ok?

kalaspuffar commented 1 year ago

Hi @josteinaj

It looks ok, but I think you have a small typo in the first command nlbdev/nordic-epub3-dtbook-migrato

Best regards Daniel

josteinaj commented 1 year ago
~ ❯ docker run --rm -it nlbdev/nordic-epub3-dtbook-migrator bash

root@dbe2d2b76506:/var/www/html# ace
[0920/151038.747077:FATAL:electron_main_delegate.cc(294)] Running as root without --no-sandbox is not supported. See https://crbug.com/638180.
/usr/local/lib/nodejs/node-v16.18.0-linux-x64/lib/node_modules/@daisy/ace/node_modules/electron/dist/electron exited with signal SIGTRAP

root@dbe2d2b76506:/var/www/html# ace --no-sandbox
[30:0920/151043.903579:ERROR:bus.cc(399)] Failed to connect to the bus: Failed to connect to socket /run/dbus/system_bus_socket: No such file or directory
[30:0920/151044.180912:ERROR:ozone_platform_x11.cc(240)] Missing X server or $DISPLAY
[30:0920/151044.180923:ERROR:env.cc(255)] The platform failed to initialize.  Exiting.
The futex facility returned an unexpected error code.
/usr/local/lib/nodejs/node-v16.18.0-linux-x64/lib/node_modules/@daisy/ace/node_modules/electron/dist/electron exited with signal SIGABRT

@kalaspuffar how do I validate with ace?

kalaspuffar commented 1 year ago

Hi @josteinaj

I'm not sure what is going wrong there. But this Docker image has been tested, and we have gotten Ace results. Perhaps the Ace engine has been updated since we last tested?

Looking at the class for Ace in the client, there are no special flags.

https://github.com/nlbdev/nordic-epub3-dtbook-migrator/blob/master/client/src/main/java/org/daisy/validator/ace/ACEValidator.java

Best regards Daniel

josteinaj commented 1 year ago

It seems that the latest working version of @daisy/ace is 1.2.7, so I downgraded to that one.

josteinaj commented 1 year ago

@kalaspuffar I tagged a v2.0.0 version, and it's building on docker hub now. Could you verify that it works for you?

kalaspuffar commented 1 year ago

Hi @josteinaj

I've searched on Docker Hub, and I can't see it at all. If I search for "nordic-epub3", I can find one from sbsdev that was released 3 years ago, but not your version.

I tried to log in as well but could not find it either. Is it a private repository?

Best regards Daniel

kalaspuffar commented 1 year ago

Hi @josteinaj

We have also looked into the Ace repository, and version 1.2.8 should work just fine.

But as of version 1.3.0 they have deprecated Puppeteer as a main driver for the Axe plugin validation. Puppeteer is a tool to run Chrome in a headless mode and is good if you want to run it in a docker container, for instance.

I don't know if Chrome has also deprecated Puppeteer, and that could be the leading factor in this change. But Ace is now using a pure Electron implementation to create nodejs interfaces by starting a slimmed-down version of Chrome on your desktop.

Because it will open a window, it will not work inside a docker container without a graphical interface.

Best regards Daniel

josteinaj commented 1 year ago

Hi @kalaspuffar!

We recently went through and made stuff private, and this must've been made private by mistake.

Could you try again now? Now it should be public.

kalaspuffar commented 1 year ago

Hi @josteinaj

It seems to work just fine now that I have access.

Best regards Daniel