Closed nschneid closed 1 year ago
The XML standard does not allow certain characters to be part of an XML document. While the XML 1.1 standard allows more characters than the XML 1.0 standard, some characters are still forbidden even in XML 1.1.
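To make the difference between the two versions concrete, here is a minimal sketch of the `Char` ranges defined by the XML 1.0 and XML 1.1 specifications. The function names `is_xml_char` and `first_forbidden` are made up for illustration and are not part of INCEpTION or UIMA. The sketch also ignores that XML 1.1 requires most control characters to be written as character references.

```python
from typing import Optional, Tuple


def is_xml_char(cp: int, version: str = "1.0") -> bool:
    """Check a code point against the Char production of XML 1.0 or 1.1."""
    if version == "1.1":
        # XML 1.1 admits every code point from 0x1 upwards; only NUL is excluded here.
        low_ok = 0x1 <= cp <= 0xD7FF
    else:
        # XML 1.0 admits only tab, line feed and carriage return below 0x20.
        low_ok = cp in (0x9, 0xA, 0xD) or 0x20 <= cp <= 0xD7FF
    # Both versions exclude surrogates as well as 0xFFFE and 0xFFFF.
    return low_ok or 0xE000 <= cp <= 0xFFFD or 0x10000 <= cp <= 0x10FFFF


def first_forbidden(text: str, version: str = "1.0") -> Optional[Tuple[int, int]]:
    """Return (offset, code point) of the first forbidden character, or None."""
    for offset, ch in enumerate(text):
        if not is_xml_char(ord(ch), version):
            return offset, ord(ch)
    return None
```

For example, `first_forbidden("ab\x03")` returns `(2, 3)` under XML 1.0, while the same string passes under XML 1.1.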
Thanks. The thing is, I can export the individual documents I have annotated to the format. I just cannot export the entire project. Any idea why this might be?
Can you provide the part of the log output that contains the stack trace and maybe a few lines before it?
2023-06-08 15:36:22 INFO [SYSTEM] DocumentImportExportServiceImpl - Exported annotations [12628561_ootc_sotomayor.txt](2) for user [admin] from project [CuRIAM Agreement Study](0) using format [xmi]
2023-06-08 15:36:22 INFO [SYSTEM] AnnotationDocumentExporter - Exported annotation document content for user [admin] for source document [12628561_ootc_sotomayor.txt](2) in project [CuRIAM Agreement Study](0)
2023-06-08 15:36:22 ERROR [SYSTEM] BackupProjectExportTask - Unexpected error during project export
de.tudarmstadt.ukp.clarin.webanno.api.export.ProjectExportException: Project export failed
at de.tudarmstadt.ukp.inception.project.export.ProjectExportServiceImpl.exportProjectToPath(ProjectExportServiceImpl.java:277) ~[inception-project-export-28.1.jar!/:?]
at de.tudarmstadt.ukp.inception.project.export.ProjectExportServiceImpl.exportProject(ProjectExportServiceImpl.java:206) ~[inception-project-export-28.1.jar!/:?]
at de.tudarmstadt.ukp.inception.project.export.ProjectExportServiceImpl.exportProject(ProjectExportServiceImpl.java:181) ~[inception-project-export-28.1.jar!/:?]
at de.tudarmstadt.ukp.inception.project.export.ProjectExportServiceImpl$$FastClassBySpringCGLIB$$fe9018a4.invoke(<generated>) ~[inception-project-export-28.1.jar!/:?]
at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218) ~[spring-core-5.3.27.jar!/:5.3.27]
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:793) ~[spring-aop-5.3.27.jar!/:5.3.27]
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163) ~[spring-aop-5.3.27.jar!/:5.3.27]
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:763) ~[spring-aop-5.3.27.jar!/:5.3.27]
at org.springframework.transaction.interceptor.TransactionInterceptor$1.proceedWithInvocation(TransactionInterceptor.java:123) ~[spring-tx-5.3.27.jar!/:5.3.27]
at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:388) ~[spring-tx-5.3.27.jar!/:5.3.27]
at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:119) ~[spring-tx-5.3.27.jar!/:5.3.27]
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186) ~[spring-aop-5.3.27.jar!/:5.3.27]
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:763) ~[spring-aop-5.3.27.jar!/:5.3.27]
at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:708) ~[spring-aop-5.3.27.jar!/:5.3.27]
at de.tudarmstadt.ukp.inception.project.export.ProjectExportServiceImpl$$EnhancerBySpringCGLIB$$3c74a6e7.exportProject(<generated>) ~[inception-project-export-28.1.jar!/:?]
at de.tudarmstadt.ukp.inception.project.export.task.backup.BackupProjectExportTask.export(BackupProjectExportTask.java:45) ~[inception-project-export-28.1.jar!/:?]
at de.tudarmstadt.ukp.inception.project.export.task.backup.BackupProjectExportTask.export(BackupProjectExportTask.java:31) ~[inception-project-export-28.1.jar!/:?]
at de.tudarmstadt.ukp.inception.project.export.task.ProjectExportTask_ImplBase.run(ProjectExportTask_ImplBase.java:103) [inception-project-export-28.1.jar!/:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
at java.lang.Thread.run(Thread.java:829) [?:?]
Caused by: org.apache.uima.analysis_engine.AnalysisEngineProcessException
at org.dkpro.core.io.xmi.XmiWriter.process(XmiWriter.java:133) ~[dkpro-core-io-xmi-asl-2.3.1.jar!/:?]
at org.apache.uima.analysis_component.JCasAnnotator_ImplBase.process(JCasAnnotator_ImplBase.java:50) ~[uimaj-core-3.4.1.jar!/:?]
at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.lambda$callProcessMethod$3(AnalysisEngineImplBase.java:669) ~[uimaj-core-3.4.1.jar!/:?]
at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.withContexts(AnalysisEngineImplBase.java:688) ~[uimaj-core-3.4.1.jar!/:?]
at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.callProcessMethod(AnalysisEngineImplBase.java:668) ~[uimaj-core-3.4.1.jar!/:?]
at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:387) ~[uimaj-core-3.4.1.jar!/:?]
at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.processAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:299) ~[uimaj-core-3.4.1.jar!/:?]
at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:295) ~[uimaj-core-3.4.1.jar!/:?]
at de.tudarmstadt.ukp.clarin.webanno.api.format.FormatSupport.write(FormatSupport.java:222) ~[inception-api-formats-28.1.jar!/:?]
at de.tudarmstadt.ukp.inception.export.DocumentImportExportServiceImpl.exportCasToFile(DocumentImportExportServiceImpl.java:572) ~[inception-export-28.1.jar!/:?]
at de.tudarmstadt.ukp.inception.export.DocumentImportExportServiceImpl.exportAnnotationDocument(DocumentImportExportServiceImpl.java:269) ~[inception-export-28.1.jar!/:?]
at de.tudarmstadt.ukp.inception.export.DocumentImportExportServiceImpl$$FastClassBySpringCGLIB$$6bf689d0.invoke(<generated>) ~[inception-export-28.1.jar!/:?]
at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218) ~[spring-core-5.3.27.jar!/:5.3.27]
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:793) ~[spring-aop-5.3.27.jar!/:5.3.27]
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163) ~[spring-aop-5.3.27.jar!/:5.3.27]
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:763) ~[spring-aop-5.3.27.jar!/:5.3.27]
at org.springframework.transaction.interceptor.TransactionInterceptor$1.proceedWithInvocation(TransactionInterceptor.java:123) ~[spring-tx-5.3.27.jar!/:5.3.27]
at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:388) ~[spring-tx-5.3.27.jar!/:5.3.27]
at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:119) ~[spring-tx-5.3.27.jar!/:5.3.27]
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186) ~[spring-aop-5.3.27.jar!/:5.3.27]
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:763) ~[spring-aop-5.3.27.jar!/:5.3.27]
at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:708) ~[spring-aop-5.3.27.jar!/:5.3.27]
at de.tudarmstadt.ukp.inception.export.DocumentImportExportServiceImpl$$EnhancerBySpringCGLIB$$1a7215eb.exportAnnotationDocument(<generated>) ~[inception-export-28.1.jar!/:?]
at de.tudarmstadt.ukp.inception.schema.exporters.AnnotationDocumentExporter.exportAdditionalFormat(AnnotationDocumentExporter.java:301) ~[inception-schema-28.1.jar!/:?]
at de.tudarmstadt.ukp.inception.schema.exporters.AnnotationDocumentExporter.exportAnnotationDocumentContents(AnnotationDocumentExporter.java:239) ~[inception-schema-28.1.jar!/:?]
at de.tudarmstadt.ukp.inception.schema.exporters.AnnotationDocumentExporter.exportData(AnnotationDocumentExporter.java:140) ~[inception-schema-28.1.jar!/:?]
at de.tudarmstadt.ukp.inception.project.export.ProjectExportServiceImpl.exportProjectToPath(ProjectExportServiceImpl.java:258) ~[inception-project-export-28.1.jar!/:?]
... 22 more
Caused by: org.xml.sax.SAXParseException: Trying to serialize non-XML 1.0 character: 0x3 at offset 2 in string starting with PK
at org.apache.uima.util.XMLSerializer$CharacterValidatingContentHandler.checkForInvalidXmlChars(XMLSerializer.java:429) ~[uimaj-core-3.4.1.jar!/:?]
at org.apache.uima.util.XMLSerializer$CharacterValidatingContentHandler.startElement(XMLSerializer.java:297) ~[uimaj-core-3.4.1.jar!/:?]
at org.apache.uima.cas.impl.XmiCasSerializer$XmiDocSerializer.startElement(XmiCasSerializer.java:1312) ~[uimaj-core-3.4.1.jar!/:?]
at org.apache.uima.cas.impl.XmiCasSerializer$XmiDocSerializer.writeFsOrLists(XmiCasSerializer.java:816) ~[uimaj-core-3.4.1.jar!/:?]
at org.apache.uima.cas.impl.XmiCasSerializer$XmiDocSerializer.writeFs(XmiCasSerializer.java:802) ~[uimaj-core-3.4.1.jar!/:?]
at org.apache.uima.cas.impl.CasSerializerSupport$CasDocSerializer.encodeFS(CasSerializerSupport.java:1312) ~[uimaj-core-3.4.1.jar!/:?]
at org.apache.uima.cas.impl.CasSerializerSupport$CasDocSerializer.encodeQueued(CasSerializerSupport.java:1208) ~[uimaj-core-3.4.1.jar!/:?]
at org.apache.uima.cas.impl.XmiCasSerializer$XmiDocSerializer.writeFeatureStructures(XmiCasSerializer.java:661) ~[uimaj-core-3.4.1.jar!/:?]
at org.apache.uima.cas.impl.CasSerializerSupport$CasDocSerializer.serialize(CasSerializerSupport.java:563) ~[uimaj-core-3.4.1.jar!/:?]
at org.apache.uima.cas.impl.XmiCasSerializer.serialize(XmiCasSerializer.java:506) ~[uimaj-core-3.4.1.jar!/:?]
at org.dkpro.core.io.xmi.XmiWriter.process(XmiWriter.java:124) ~[dkpro-core-io-xmi-asl-2.3.1.jar!/:?]
at org.apache.uima.analysis_component.JCasAnnotator_ImplBase.process(JCasAnnotator_ImplBase.java:50) ~[uimaj-core-3.4.1.jar!/:?]
at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.lambda$callProcessMethod$3(AnalysisEngineImplBase.java:669) ~[uimaj-core-3.4.1.jar!/:?]
at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.withContexts(AnalysisEngineImplBase.java:688) ~[uimaj-core-3.4.1.jar!/:?]
at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.callProcessMethod(AnalysisEngineImplBase.java:668) ~[uimaj-core-3.4.1.jar!/:?]
at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:387) ~[uimaj-core-3.4.1.jar!/:?]
at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.processAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:299) ~[uimaj-core-3.4.1.jar!/:?]
at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:295) ~[uimaj-core-3.4.1.jar!/:?]
at de.tudarmstadt.ukp.clarin.webanno.api.format.FormatSupport.write(FormatSupport.java:222) ~[inception-api-formats-28.1.jar!/:?]
at de.tudarmstadt.ukp.inception.export.DocumentImportExportServiceImpl.exportCasToFile(DocumentImportExportServiceImpl.java:572) ~[inception-export-28.1.jar!/:?]
at de.tudarmstadt.ukp.inception.export.DocumentImportExportServiceImpl.exportAnnotationDocument(DocumentImportExportServiceImpl.java:269) ~[inception-export-28.1.jar!/:?]
at de.tudarmstadt.ukp.inception.export.DocumentImportExportServiceImpl$$FastClassBySpringCGLIB$$6bf689d0.invoke(<generated>) ~[inception-export-28.1.jar!/:?]
at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218) ~[spring-core-5.3.27.jar!/:5.3.27]
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:793) ~[spring-aop-5.3.27.jar!/:5.3.27]
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163) ~[spring-aop-5.3.27.jar!/:5.3.27]
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:763) ~[spring-aop-5.3.27.jar!/:5.3.27]
at org.springframework.transaction.interceptor.TransactionInterceptor$1.proceedWithInvocation(TransactionInterceptor.java:123) ~[spring-tx-5.3.27.jar!/:5.3.27]
at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:388) ~[spring-tx-5.3.27.jar!/:5.3.27]
at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:119) ~[spring-tx-5.3.27.jar!/:5.3.27]
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186) ~[spring-aop-5.3.27.jar!/:5.3.27]
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:763) ~[spring-aop-5.3.27.jar!/:5.3.27]
at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:708) ~[spring-aop-5.3.27.jar!/:5.3.27]
at de.tudarmstadt.ukp.inception.export.DocumentImportExportServiceImpl$$EnhancerBySpringCGLIB$$1a7215eb.exportAnnotationDocument(<generated>) ~[inception-export-28.1.jar!/:?]
at de.tudarmstadt.ukp.inception.schema.exporters.AnnotationDocumentExporter.exportAdditionalFormat(AnnotationDocumentExporter.java:301) ~[inception-schema-28.1.jar!/:?]
at de.tudarmstadt.ukp.inception.schema.exporters.AnnotationDocumentExporter.exportAnnotationDocumentContents(AnnotationDocumentExporter.java:239) ~[inception-schema-28.1.jar!/:?]
at de.tudarmstadt.ukp.inception.schema.exporters.AnnotationDocumentExporter.exportData(AnnotationDocumentExporter.java:140) ~[inception-schema-28.1.jar!/:?]
at de.tudarmstadt.ukp.inception.project.export.ProjectExportServiceImpl.exportProjectToPath(ProjectExportServiceImpl.java:258) ~[inception-project-export-28.1.jar!/:?]
... 22 more
If you export the file 12628561_ootc_sotomayor.txt individually, e.g. from the annotation page as UIMA CAS XMI (XML 1.0), you should see the same error.
If you download the file as a plain text file and open it in a hex editor, you should see that the third byte in the data is the control character 0x03.
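If no hex editor is at hand, the same check can be done with a few lines of Python. This is only a sketch: `head_hex` is a hypothetical helper name, and the path you pass in is wherever you saved the downloaded file.

```python
from pathlib import Path


def head_hex(path: str, count: int = 8) -> str:
    """Return the first `count` bytes of a file as space-separated hex."""
    data = Path(path).read_bytes()[:count]
    return " ".join(f"{byte:02x}" for byte in data)
```

Calling `head_hex` on the downloaded file and looking at the third value in the result shows whether it is `03`.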
I'm able to export that file just fine in either XML 1.0 or 1.1:
2023-06-09 20:23:50 INFO [SYSTEM] DocumentImportExportServiceImpl - Exported annotations [12628561_ootc_sotomayor.txt](2) for user [admin] from project [CuRIAM Agreement Study](0) using format [xmi]
2023-06-09 20:24:36 INFO [SYSTEM] DocumentImportExportServiceImpl - Exported annotations [12628561_ootc_sotomayor.txt](2) for user [admin] from project [CuRIAM Agreement Study](0) using format [xmi-xml1.1]
That is very interesting since the code used to export the document should be the same in both instances. I wonder if you could share a project export privately with me for investigation? (Exported using "no secondary format").
Btw, does the document text actually start with PK?
@reckart Sent you the file.
I'm not sure where the PK comes from; I wonder if it means "primary key".
The PK comes from the header of a ZIP file. The project contains a 12628561_ootc_sotomayor.zip document in addition to 12628561_ootc_sotomayor.txt. The error you see is generated when INCEpTION tries to export this ZIP file as a CAS, because ZIP files are binary files and typically contain characters that are not legal in XML 1.0/1.1.
Removing the ZIP file from your document list fixes the problem.
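As a rough illustration of where the PK comes from: a non-empty ZIP archive begins with the local file header signature, the bytes 50 4B 03 04, i.e. "PK" followed by 0x03 and 0x04. The sketch below checks for that signature. The names `looks_like_zip` and `make_sample_zip` are invented for this example, and the sample archive is built in memory rather than taken from the project.

```python
import io
import zipfile

# Signature of a ZIP local file header: "PK" followed by 0x03 0x04.
ZIP_LOCAL_HEADER = b"PK\x03\x04"


def looks_like_zip(data: bytes) -> bool:
    """True if the data begins with the ZIP local file header signature."""
    return data[:4] == ZIP_LOCAL_HEADER


def make_sample_zip() -> bytes:
    """Build a tiny in-memory ZIP archive to experiment with."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("sample.txt", "hello")
    return buffer.getvalue()
```

The third byte of such an archive is 0x03, which matches the "0x3 at offset 2" reported in the log above.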
Interesting. The ZIP file was an export of a file that I reimported to try to test the curation mode. Do you know why the import didn't properly unpack the ZIP file?
If you export a project as ZIP, you need to import that project through the project overview page again, not as a document.
If you export a document as XMI, it comes down as a ZIP too, but you cannot import that ZIP back in directly. To upload an XMI file, you'd have to unzip the file and upload only the .xmi file. Also, you'd have to choose the proper input format; in your case, the format selected for the ZIP file was Plain text and not CAS XMI. If you had chosen CAS XMI while importing the ZIP, INCEpTION would have issued an error right away, because a ZIP file cannot be read as a CAS XMI file.
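The unzip step can also be scripted. The following is a sketch only: `extract_xmi` is a hypothetical helper, and it assumes nothing about the archive layout beyond the .xmi file extension mentioned above.

```python
import zipfile
from pathlib import Path
from typing import List


def extract_xmi(zip_path: str, target_dir: str) -> List[Path]:
    """Extract only the .xmi members of a ZIP archive into target_dir."""
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".xmi"):
                # extract() returns the path of the file it created.
                extracted.append(Path(archive.extract(name, target_dir)))
    return extracted
```

The returned paths are the files to upload, with CAS XMI selected as the input format.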
Describe the bug
Project backup (xmi-xml1.1) Unexpected error during project export: SAXParseException: Trying to serialize non-XML 1.1 character: 0x0 at offset 5 in string starting with PK
To Reproduce
Project Settings > Export > Backup export with Secondary format: UIMA CAS XMI (XML 1.1)
Environment
Version and build ID: INCEpTION -- 28.1 (2023-05-26 16:54:12, build 867bcf14)
Operating system: macOS 13.3.1 (a)
Java: openjdk version "11.0.19" 2023-04-18
Browser: Firefox 114.0
Additional context
CAS Doctor doesn't show anything suspicious.