TamJ opened this issue 9 years ago
It initially stores the file using us-ascii so that all non-ASCII characters are hex-encoded as character references; it should then set the encoding in the XML declaration to utf-8. Are there any errors or warnings in the logs?
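A minimal sketch of the two-step approach described above, assuming the serializer writes each non-ASCII character as a hexadecimal character reference such as `&#xE5;`. The class `AsciiThenUtf8` and its methods are hypothetical illustrations, not the Pipeline's code. Because the escaped output contains only ASCII bytes, it is byte-for-byte valid UTF-8, which is why relabelling the declaration afterwards should be safe.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of "store as us-ascii, then relabel as utf-8"; not DAISY Pipeline code.
public class AsciiThenUtf8 {

    // Step 1: replace every non-ASCII code point with a hex character reference.
    // Applied here to text content only; a real serializer knows where references are allowed.
    static String hexEncodeNonAscii(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                sb.appendCodePoint(cp);
            } else {
                sb.append("&#x").append(Integer.toHexString(cp).toUpperCase()).append(';');
            }
        });
        return sb.toString();
    }

    // Step 2: set the encoding pseudo-attribute of the XML declaration to utf-8.
    static String setUtf8Declaration(String xml) {
        return xml.replaceFirst("^(<\\?xml[^>]*encoding=[\"'])[^\"']*([\"'])", "$1utf-8$2");
    }

    public static void main(String[] args) {
        String body = "<p>Bl\u00e5b\u00e4rssoppa</p>";
        String ascii = "<?xml version=\"1.0\" encoding=\"us-ascii\"?>" + hexEncodeNonAscii(body);
        String fixed = setUtf8Declaration(ascii);
        // Prints: <?xml version="1.0" encoding="utf-8"?><p>Bl&#xE5;b&#xE4;rssoppa</p>
        System.out.println(fixed);
        // ASCII-only bytes decode identically as UTF-8, so this prints true.
        byte[] bytes = fixed.getBytes(StandardCharsets.US_ASCII);
        System.out.println(new String(bytes, StandardCharsets.UTF_8).equals(fixed));
    }
}
```

If step 2 never runs, the file is still correctly encoded but keeps `encoding="us-ascii"`, which matches the symptom reported in this thread.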
No. So far I can't find anything in the logs. I've even tried to validate a DTBook with 'us-ascii' encoding using the Nordic script, and no errors or warnings are shown. So it seems the files are correctly encoded, but the us-ascii value is still present in the encoding attribute. I'll send over an example EPUB in a moment...
ok, thanks.
I'm not able to reproduce this.
@TamJ: Which build of the migrator were you using?
We still experience the same issue in build 314. All HTML files in the EPUB 3 use utf-8 encoding, but the generated DTBook output is in us-ascii encoding.
I was not able to reproduce this earlier.
@EdmarS: I will send you login info to our test-server. See if you can reproduce it there.
Ok, we need to determine in which environment and for which books this issue occurs so that we can reproduce it. Here are some more detailed steps to collect information about the environment from a Windows install:
grep
in Windows so all running processes will be included)
Anders (SPSM) was able to provide a log containing an exception. I would still like to get log files from others who experience this problem so that I can compare the environments, but in any case this at least shows us where in the code the problem lies:
2015-08-18 08:04:06,927 [ERROR] com.xmlcalabash.library.DefaultStep - px:set-xml-declaration failed to read from C:\Users\Admin\AppData\Roaming\DAISY Pipeline 2\jobs\3efa03c6-e97b-4f9b-8964-fa39331fd4a2\output\output-dir\X40089A\X40089A.xml
java.nio.file.FileSystemException: C:\Users\Admin\AppData\Roaming\DAISY Pipeline 2\jobs\3efa03c6-e97b-4f9b-8964-fa39331fd4a2\output\output-dir\X40089A\X40089A.xml: The process cannot access the file because it is being used by another process.
at sun.nio.fs.WindowsException.translateToIOException(Unknown Source) ~[na:1.8.0_31]
at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source) ~[na:1.8.0_31]
at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source) ~[na:1.8.0_31]
at sun.nio.fs.WindowsFileCopy.move(Unknown Source) ~[na:1.8.0_31]
at sun.nio.fs.WindowsFileSystemProvider.move(Unknown Source) ~[na:1.8.0_31]
at java.nio.file.Files.move(Unknown Source) ~[na:1.8.0_31]
at org.daisy.common.xproc.calabash.steps.SetXmlDeclarationProvider$SetXmlDeclaration.setXmlDeclaration(SetXmlDeclarationProvider.java:119) ~[na:na]
at org.daisy.common.xproc.calabash.steps.SetXmlDeclarationProvider$SetXmlDeclaration.run(SetXmlDeclarationProvider.java:75) ~[na:na]
at com.xmlcalabash.runtime.XAtomicStep.run(Unknown Source) ~[na:na]
at com.xmlcalabash.runtime.XCompoundStep.run(Unknown Source) ~[na:na]
at com.xmlcalabash.runtime.XChoose.run(Unknown Source) ~[na:na]
at com.xmlcalabash.runtime.XPipeline.doRun(Unknown Source) ~[na:na]
at com.xmlcalabash.runtime.XPipeline.run(Unknown Source) ~[na:na]
at com.xmlcalabash.runtime.XPipelineCall.run(Unknown Source) ~[na:na]
at com.xmlcalabash.runtime.XViewport.processStartElement(Unknown Source) ~[na:na]
at com.xmlcalabash.util.ProcessMatch.traverse(Unknown Source) ~[na:na]
at com.xmlcalabash.util.ProcessMatch.traverse(Unknown Source) ~[na:na]
at com.xmlcalabash.util.ProcessMatch.traverse(Unknown Source) ~[na:na]
at com.xmlcalabash.util.ProcessMatch.match(Unknown Source) ~[na:na]
at com.xmlcalabash.runtime.XViewport.run(Unknown Source) ~[na:na]
at com.xmlcalabash.runtime.XCompoundStep.run(Unknown Source) ~[na:na]
at com.xmlcalabash.runtime.XChoose.run(Unknown Source) ~[na:na]
at com.xmlcalabash.runtime.XPipeline.doRun(Unknown Source) ~[na:na]
at com.xmlcalabash.runtime.XPipeline.run(Unknown Source) ~[na:na]
at com.xmlcalabash.runtime.XPipelineCall.run(Unknown Source) ~[na:na]
at com.xmlcalabash.runtime.XPipeline.doRun(Unknown Source) ~[na:na]
at com.xmlcalabash.runtime.XPipeline.run(Unknown Source) ~[na:na]
at org.daisy.common.xproc.calabash.impl.CalabashXProcPipeline.run(CalabashXProcPipeline.java:242) ~[na:na]
at org.daisy.pipeline.job.Job.run(Job.java:216) ~[na:na]
at org.daisy.pipeline.job.impl.DefaultJobExecutionService$1.run(DefaultJobExecutionService.java:110) ~[na:na]
at java.lang.Thread.run(Unknown Source) ~[na:1.8.0_31]
So the step attempts to change the encoding but fails because the file is already in use. It is unclear why this happens, so more debugging info (including answers to the questions I asked in the debugging instructions) would be much appreciated.
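The trace shows the failure inside `Files.move`, called from `SetXmlDeclarationProvider$SetXmlDeclaration.setXmlDeclaration`. On Windows, replacing a file fails with this sharing-violation message while any other handle to it is open, including one held by the same JVM. The sketch below is one possible mitigation, not the Pipeline's actual fix: write the new content to a temporary file next to the target, then retry the replacing move a few times with a short back-off. The class `RetryingReplace`, the retry count and the delay are hypothetical choices. Retrying only masks a transient lock (for example from a virus scanner or indexer); if the handle is leaked by the pipeline itself, that stream still has to be closed before the step runs.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystemException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper; not part of DAISY Pipeline.
public class RetryingReplace {

    // Writes content to a temp file in the target's directory, then moves it over the target,
    // retrying when the move fails with a FileSystemException (e.g. a Windows sharing violation).
    static void replaceWithRetry(Path target, byte[] content, int attempts, long delayMillis)
            throws IOException, InterruptedException {
        Path dir = target.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "decl-", ".tmp");
        try {
            Files.write(tmp, content);
            for (int i = 1; ; i++) {
                try {
                    Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
                    return;
                } catch (FileSystemException e) {
                    if (i >= attempts) {
                        throw e; // still locked after the last attempt
                    }
                    Thread.sleep(delayMillis * i); // linear back-off before the next try
                }
            }
        } finally {
            Files.deleteIfExists(tmp); // no-op after a successful move
        }
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("retry-demo");
        Path xml = dir.resolve("book.xml");
        Files.write(xml, "<?xml version=\"1.0\" encoding=\"us-ascii\"?><a/>".getBytes(StandardCharsets.US_ASCII));
        byte[] updated = "<?xml version=\"1.0\" encoding=\"utf-8\"?><a/>".getBytes(StandardCharsets.UTF_8);
        replaceWithRetry(xml, updated, 5, 100);
        System.out.println(new String(Files.readAllBytes(xml), StandardCharsets.UTF_8));
    }
}
```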
Unfortunately not fixed by v1.2.0. It's still the same exception in the logs as previously reported.
Reported again today by Martin (MTM).
We could possibly add a boolean option called, for instance, "hex-encode-non-ascii-characters", with a default value of true to preserve the current default behavior. By setting it to false, we could store directly using utf-8 and avoid the whole race condition (I think).
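A minimal sketch of how such an option could behave, using the JDK's standard `javax.xml.transform` identity transformer rather than the Pipeline's own serializer. With the flag set to true, the output encoding is US-ASCII and the serializer writes characters it cannot encode as numeric character references; with false, it writes UTF-8 in a single pass, so the declaration is correct from the start and no separate rewrite (and no file move) is needed. The class `EncodingOption` and the parameter `hexEncodeNonAscii` are hypothetical names mirroring the proposed option.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical sketch of the proposed "hex-encode-non-ascii-characters" option.
public class EncodingOption {

    static byte[] serialize(String xml, boolean hexEncodeNonAscii) throws TransformerException {
        Transformer t = TransformerFactory.newInstance().newTransformer(); // identity transform
        // true:  US-ASCII output; non-ASCII characters become character references.
        // false: UTF-8 output written in one pass; the declaration already says UTF-8.
        t.setOutputProperty(OutputKeys.ENCODING, hexEncodeNonAscii ? "US-ASCII" : "UTF-8");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toByteArray();
    }

    public static void main(String[] args) throws TransformerException {
        String xml = "<p>Bl\u00e5b\u00e4rssoppa</p>";
        System.out.println(new String(serialize(xml, true), StandardCharsets.US_ASCII));
        System.out.println(new String(serialize(xml, false), StandardCharsets.UTF_8));
    }
}
```

Keeping true as the default preserves today's output, while false removes the second write that the stack trace shows failing.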
It looks like 'us-ascii' is now being set as the encoding value in the XML declaration. Maybe this is a bug introduced by the latest XSLT/XProc updates?