nlbdev / nordic-epub3-dtbook-migrator

Tools for converting between a strict subset of DTBook and EPUB3.
http://nlbdev.github.io/nordic-epub3-dtbook-migrator/
GNU Lesser General Public License v2.1

Make sure that DTBook XML encoding value is UTF-8 #318

Open TamJ opened 9 years ago

TamJ commented 9 years ago

It looks like 'us-ascii' is now being set as the encoding value in the XML declaration. Maybe a bug from the latest XSLT/XProc updates?

josteinaj commented 9 years ago

The migrator initially stores the file using us-ascii so that all non-ASCII characters are hex encoded; it should then set the encoding in the XML declaration to utf-8. Are there any errors or warnings in the logs?
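To illustrate the two-step approach described above, here is a minimal Python sketch. It is not the migrator's code (the real step is px:set-xml-declaration, implemented in Java); the function names `to_ascii_xml` and `set_declared_encoding` are made up for this example. Note also that Python's `xmlcharrefreplace` emits decimal character references rather than hex ones, which an XML parser treats the same way.

```python
import re

def to_ascii_xml(text: str) -> bytes:
    """Step 1: serialize as US-ASCII; every non-ASCII character becomes a
    numeric character reference (decimal here, e.g. &#229; for 'å')."""
    return text.encode("ascii", errors="xmlcharrefreplace")

def set_declared_encoding(data: bytes, encoding: str = "utf-8") -> bytes:
    """Step 2: rewrite only the encoding value in the XML declaration.
    The body is pure ASCII, so it is also valid UTF-8 and needs no change."""
    pattern = rb'^(<\?xml[^>]*?encoding=)(["\'])[^"\']*\2'
    return re.sub(
        pattern,
        lambda m: m.group(1) + m.group(2) + encoding.encode("ascii") + m.group(2),
        data,
        count=1,
    )

doc = '<?xml version="1.0" encoding="us-ascii"?>\n<dtbook><p>blåbær</p></dtbook>'
step1 = to_ascii_xml(doc)
step2 = set_declared_encoding(step1)
print(step2.decode("utf-8"))
```

If the second step is skipped or fails, the file content is still correct, but the declaration keeps saying us-ascii, which matches the symptom reported in this issue.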

TamJ commented 9 years ago

No. So far I can't find anything in the logs. I've even tried validating a DTBook with 'us-ascii' encoding using the Nordic script, and no errors or warnings are shown. So it seems the files are correctly encoded, but the us-ascii value is still present in the encoding attribute. I'll send over an example EPUB in a moment ...

josteinaj commented 9 years ago

ok, thanks.

josteinaj commented 9 years ago

I'm not able to reproduce this.

@TamJ: Which build of the migrator were you using?

EdmarS commented 9 years ago

We are still experiencing the same issue in build 314. All HTML files in the EPUB3 use utf-8 encoding, but the generated DTBook output is in us-ascii encoding.

josteinaj commented 9 years ago

I was not able to reproduce this earlier.

@EdmarS: I will send you login info to our test-server. See if you can reproduce it there.

josteinaj commented 9 years ago

Ok, we need to determine in what environment and for which books this issue occurs so that we can reproduce it. Here are some more detailed steps to collect information about the environment from a Windows install:

  1. make sure Pipeline 2 is running, and keep it running for the rest of these steps
  2. run a job that fails to set the correct encoding, and download the results as well as the detailed log file to your desktop. Also, if you don't use default options when running the job, make a note somewhere of what values were used so that it can be reproduced later
  3. go to http://localhost:9000/log and save the log to your desktop
  4. download and run dp2env.bat - I wrote this up today and haven't tested it thoroughly but it works in my Windows 7 VM at least. It will collect and store to a text file the following (shouldn't be anything too sensitive in this info, but don't post it to github, send it to me by e-mail!):
    • locale
    • cpu architecture
    • memory
    • windows version
    • java version
    • a list of all environment variables
    • a list of all running processes (only need the pipeline stuff but there's no grep in windows so all running processes will be included)
    • user permissions for the Pipeline 2 installation folder and application data folder
  5. you should now have three log files, the input file and the output file on your desktop; attach those to an e-mail addressed to me (or if the files are too big we'll find another way)
  6. some other info if relevant (I may have asked this before, but just to be sure...):
    • does it happen to all books or just some books in particular?
      • any idea what the difference can be with those books that do not work?
      • do they come from a particular supplier?
      • are they produced in a different way than the other books?
      • do they contain any special content that would distinguish them from other books?
      • are the books particularly large, or contain large files?
    • does it happen sporadically or does it always happen to certain books?
    • does it happen only periodically, i.e. some days it works, some days it doesn't?
    • was pipeline 2 installed using the normal windows installer?
    • does a reinstall help?
    • do you have administrator privileges to your computer?
    • was pipeline 2 installed using the same user as the user that is using it? if not; do you know if the user that installed it has administrator privileges?
    • are you running Pipeline 2 as a normal user but with administrator privileges?
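As a rough, cross-platform illustration of the kind of information step 4 asks for, here is a small Python sketch. It is not dp2env.bat and covers only part of the list (locale, CPU architecture, OS version, Java version, environment variables); the function name `collect_environment` is made up for this example.

```python
import locale
import os
import platform
import shutil
import subprocess

def collect_environment() -> dict:
    """Gather a subset of the environment details listed in step 4."""
    info = {
        "locale": locale.getlocale(),
        "cpu_architecture": platform.machine(),
        "os_version": platform.platform(),
        "environment_variables": dict(os.environ),
    }
    java = shutil.which("java")
    if java is None:
        info["java_version"] = None
    else:
        # `java -version` writes its report to stderr, not stdout.
        result = subprocess.run(
            [java, "-version"], capture_output=True, text=True, timeout=30
        )
        info["java_version"] = (result.stderr or result.stdout).strip()
    return info

env_info = collect_environment()
for key in ("locale", "cpu_architecture", "os_version", "java_version"):
    print(f"{key}: {env_info[key]}")
```

As with the batch file, the output can include sensitive values (environment variables in particular), so it is better sent by e-mail than posted here.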

josteinaj commented 9 years ago

Anders (SPSM) was able to provide a log containing an exception. I would still like to get log files from others who experience this problem so that I can compare the environments, but in any case this at least shows us where in the code the problem lies:

2015-08-18 08:04:06,927 [ERROR] com.xmlcalabash.library.DefaultStep - px:set-xml-declaration failed to read from C:\Users\Admin\AppData\Roaming\DAISY Pipeline 2\jobs\3efa03c6-e97b-4f9b-8964-fa39331fd4a2\output\output-dir\X40089A\X40089A.xml
java.nio.file.FileSystemException: C:\Users\Admin\AppData\Roaming\DAISY Pipeline 2\jobs\3efa03c6-e97b-4f9b-8964-fa39331fd4a2\output\output-dir\X40089A\X40089A.xml: The process cannot access the file because it is being used by another process.

    at sun.nio.fs.WindowsException.translateToIOException(Unknown Source) ~[na:1.8.0_31]
    at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source) ~[na:1.8.0_31]
    at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source) ~[na:1.8.0_31]
    at sun.nio.fs.WindowsFileCopy.move(Unknown Source) ~[na:1.8.0_31]
    at sun.nio.fs.WindowsFileSystemProvider.move(Unknown Source) ~[na:1.8.0_31]
    at java.nio.file.Files.move(Unknown Source) ~[na:1.8.0_31]
    at org.daisy.common.xproc.calabash.steps.SetXmlDeclarationProvider$SetXmlDeclaration.setXmlDeclaration(SetXmlDeclarationProvider.java:119) ~[na:na]
    at org.daisy.common.xproc.calabash.steps.SetXmlDeclarationProvider$SetXmlDeclaration.run(SetXmlDeclarationProvider.java:75) ~[na:na]
    at com.xmlcalabash.runtime.XAtomicStep.run(Unknown Source) ~[na:na]
    at com.xmlcalabash.runtime.XCompoundStep.run(Unknown Source) ~[na:na]
    at com.xmlcalabash.runtime.XChoose.run(Unknown Source) ~[na:na]
    at com.xmlcalabash.runtime.XPipeline.doRun(Unknown Source) ~[na:na]
    at com.xmlcalabash.runtime.XPipeline.run(Unknown Source) ~[na:na]
    at com.xmlcalabash.runtime.XPipelineCall.run(Unknown Source) ~[na:na]
    at com.xmlcalabash.runtime.XViewport.processStartElement(Unknown Source) ~[na:na]
    at com.xmlcalabash.util.ProcessMatch.traverse(Unknown Source) ~[na:na]
    at com.xmlcalabash.util.ProcessMatch.traverse(Unknown Source) ~[na:na]
    at com.xmlcalabash.util.ProcessMatch.traverse(Unknown Source) ~[na:na]
    at com.xmlcalabash.util.ProcessMatch.match(Unknown Source) ~[na:na]
    at com.xmlcalabash.runtime.XViewport.run(Unknown Source) ~[na:na]
    at com.xmlcalabash.runtime.XCompoundStep.run(Unknown Source) ~[na:na]
    at com.xmlcalabash.runtime.XChoose.run(Unknown Source) ~[na:na]
    at com.xmlcalabash.runtime.XPipeline.doRun(Unknown Source) ~[na:na]
    at com.xmlcalabash.runtime.XPipeline.run(Unknown Source) ~[na:na]
    at com.xmlcalabash.runtime.XPipelineCall.run(Unknown Source) ~[na:na]
    at com.xmlcalabash.runtime.XPipeline.doRun(Unknown Source) ~[na:na]
    at com.xmlcalabash.runtime.XPipeline.run(Unknown Source) ~[na:na]
    at org.daisy.common.xproc.calabash.impl.CalabashXProcPipeline.run(CalabashXProcPipeline.java:242) ~[na:na]
    at org.daisy.pipeline.job.Job.run(Job.java:216) ~[na:na]
    at org.daisy.pipeline.job.impl.DefaultJobExecutionService$1.run(DefaultJobExecutionService.java:110) ~[na:na]
    at java.lang.Thread.run(Unknown Source) ~[na:1.8.0_31]

So it attempts to change the encoding but fails because the file is already in use. It is unclear why this happens, so more debugging info (including answers to the questions I asked in the debugging instructions) is much appreciated.
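One possible stopgap, sketched here in Python purely for illustration, is to retry the move a few times when the operating system reports the file as locked. This is an assumption about a workaround, not what SetXmlDeclarationProvider does, and it does not explain which process holds the file. The helper name `replace_with_retry` and its parameters are made up for this example.

```python
import os
import tempfile
import time

def replace_with_retry(src: str, dst: str, attempts: int = 5, delay: float = 0.2) -> int:
    """Move src over dst, retrying while the file is locked.
    Returns the number of attempts that were needed."""
    for attempt in range(1, attempts + 1):
        try:
            os.replace(src, dst)
            return attempt
        except PermissionError:
            # On Windows, "being used by another process" surfaces as
            # PermissionError; give the other process time to let go.
            if attempt == attempts:
                raise
            time.sleep(delay)
    raise ValueError("attempts must be at least 1")

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "X40089A.xml.tmp")
    dst = os.path.join(tmp, "X40089A.xml")
    with open(src, "wb") as f:
        f.write(b'<?xml version="1.0" encoding="utf-8"?>\n<dtbook/>')
    attempts_used = replace_with_retry(src, dst)
    moved = os.path.exists(dst) and not os.path.exists(src)

print(attempts_used, moved)
```

A retry only hides a short-lived lock; if something keeps the file open for the whole job, the last attempt still raises the same error.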

josteinaj commented 6 years ago

Unfortunately not fixed by v1.2.0. It's still the same exception in the logs as previously reported.

josteinaj commented 4 years ago

Reported again today by Martin (MTM).

We could possibly add a boolean option called for instance "hex-encode-non-ascii-characters", with a default value of true to preserve the current default behavior. By setting it to false, we could store directly using utf-8, and avoid the whole race condition (I think).
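A minimal Python sketch of what such an option could mean for the output, assuming the option name from the proposal above (written here as the hypothetical parameter `hex_encode_non_ascii`). In this sketch the utf-8 declaration is written up front in both modes, so no second pass over the file is needed; the real pipeline would still have to decide how to handle the declaration when the option is true.

```python
import xml.etree.ElementTree as ET

DECLARATION = b'<?xml version="1.0" encoding="utf-8"?>\n'

def serialize(body: str, hex_encode_non_ascii: bool = True) -> bytes:
    """Serialize body with a utf-8 declaration.
    True: non-ASCII characters become numeric character references
    (Python emits decimal ones; parsers treat them like hex ones).
    False: characters are stored directly as UTF-8 bytes."""
    if hex_encode_non_ascii:
        return DECLARATION + body.encode("ascii", errors="xmlcharrefreplace")
    return DECLARATION + body.encode("utf-8")

body = "<dtbook><p>blåbær</p></dtbook>"
escaped = serialize(body, hex_encode_non_ascii=True)
direct = serialize(body, hex_encode_non_ascii=False)

# Both variants parse to the same text, so consumers should see no difference.
same_text = ET.fromstring(escaped).find("p").text == ET.fromstring(direct).find("p").text
print(same_text)
```

The two outputs differ only at the byte level, which is why storing directly as utf-8 looks like a safe way to skip the step that currently fails.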