fabiod20 closed this issue 2 years ago
Can you give us an example of a CoNLL file that causes this error?
That's interesting, because CoNLL 2003 isn't even an XML format.
Sure. I tried with this test file, both with ".txt" and ".conll" extensions. test.txt
If I try importing the sample CoNLL 2003 data from our documentation, it works:
U.N. NNP I-NP I-ORG
official NN I-NP O
Ekeus NNP I-NP I-PER
heads VBZ I-VP O
for IN I-PP O
Baghdad NNP I-NP I-LOC
. . O O
And if I try to import your test file, I get this error:
... that is locally in my dev environment. If I try it on a server installation, I also get the "Error while uploading document test.txt: ClassNotFoundException: org.codehaus.plexus.util.xml.pull.XmlPullParserException" error. Funny.
The issue is that INCEpTION only supports the strict CoNLL 2003 file format, which needs three labels per token (POS, chunk, NER); see here for an example. Another issue is that CoNLL 2003 uses IOB and you use BIO. I added an issue to create the file format that you and many others want to import, but it will take a while.
The best way I see for you to go forward is to either adhere to the CoNLL 2003 format we support or use CoNLL-U, e.g. via https://pypi.org/project/conllu/ .
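To illustrate the first option, here is a minimal Python sketch of how BIO-tagged tokens could be rewritten into the four-column layout shown in the sample above (token, POS, chunk, NER) with IOB-style tags, where `B-` is only kept when an entity directly follows another entity of the same type. The function names `bio_to_iob1` and `to_conll2003_lines` are hypothetical, and the placeholder values `_` (POS) and `O` (chunk) are assumptions; whether the importer accepts placeholder POS tags has not been verified here.

```python
def bio_to_iob1(tags):
    """Convert BIO tags to IOB tags as used in the CoNLL 2003 sample.

    In the IOB scheme, an entity normally starts with I-; B- is only used
    when an entity immediately follows another entity of the same type.
    """
    converted = []
    prev_type = None  # entity type of the previous token, None if it was "O"
    for tag in tags:
        if tag == "O":
            converted.append(tag)
            prev_type = None
            continue
        prefix, _, entity_type = tag.partition("-")
        if prefix == "B" and prev_type != entity_type:
            # Entity does not directly follow one of the same type: use I-
            converted.append("I-" + entity_type)
        else:
            converted.append(tag)
        prev_type = entity_type
    return converted


def to_conll2003_lines(tokens, bio_tags, pos="_", chunk="O"):
    """Build 'token POS chunk NER' lines for one sentence.

    pos and chunk are placeholder values (an assumption), used because the
    input is assumed to carry only NER labels.
    """
    iob_tags = bio_to_iob1(bio_tags)
    return [f"{token} {pos} {chunk} {ner}" for token, ner in zip(tokens, iob_tags)]


if __name__ == "__main__":
    tokens = ["U.N.", "official", "Ekeus", "heads", "for", "Baghdad", "."]
    tags = ["B-ORG", "O", "B-PER", "O", "O", "B-LOC", "O"]
    print("\n".join(to_conll2003_lines(tokens, tags)))
```

Sentences would still need to be separated by blank lines when writing the file, as is usual for CoNLL-style formats.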
The error message is strange though and needs investigation.
I cleaned up the dependencies a bit for 22.0 ... looks like I was maybe a bit too eager.
Jan 13 20:46:23 blinky inception-stable.jar[4152]: 2022-01-13 20:46:23 ERROR [rec] ImportDocumentsPanel - test.txt: org/codehaus/plexus/util/xml/pull/XmlPullParserException
Jan 13 20:46:23 blinky inception-stable.jar[4152]: java.lang.NoClassDefFoundError: org/codehaus/plexus/util/xml/pull/XmlPullParserException
Jan 13 20:46:23 blinky inception-stable.jar[4152]: at org.dkpro.core.api.resources.MappingProviderFactory.createPosMappingProvider(MappingProviderFactory.java:46) ~[dkpro-core-api-resources-asl-2.2.0.jar!/:?]
Jan 13 20:46:23 blinky inception-stable.jar[4152]: at org.dkpro.core.io.conll.Conll2003Reader.initialize(Conll2003Reader.java:174) ~[dkpro-core-io-conll-asl-2.2.0.jar!/:?]
Jan 13 20:46:23 blinky inception-stable.jar[4152]: at org.apache.uima.fit.component.CasCollectionReader_ImplBase.initialize(CasCollectionReader_ImplBase.java:41) ~[uimafit-core-3.2.0.jar!/:3.2.0]
Jan 13 20:46:23 blinky inception-stable.jar[4152]: at org.apache.uima.collection.CollectionReader_ImplBase.initialize(CollectionReader_ImplBase.java:67) ~[uimaj-core-3.2.0.jar!/:3.2.0]
Jan 13 20:46:23 blinky inception-stable.jar[4152]: at org.apache.uima.impl.CollectionReaderFactory_impl.produceResource(CollectionReaderFactory_impl.java:92) ~[uimaj-core-3.2.0.jar!/:3.2.0]
Jan 13 20:46:23 blinky inception-stable.jar[4152]: at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62) ~[uimaj-core-3.2.0.jar!/:3.2.0]
Jan 13 20:46:23 blinky inception-stable.jar[4152]: at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:289) ~[uimaj-core-3.2.0.jar!/:3.2.0]
Jan 13 20:46:23 blinky inception-stable.jar[4152]: at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:341) ~[uimaj-core-3.2.0.jar!/:3.2.0]
Jan 13 20:46:23 blinky inception-stable.jar[4152]: at org.apache.uima.UIMAFramework.produceCollectionReader(UIMAFramework.java:820) ~[uimaj-core-3.2.0.jar!/:3.2.0]
Jan 13 20:46:23 blinky inception-stable.jar[4152]: at org.apache.uima.fit.factory.CollectionReaderFactory.createReader(CollectionReaderFactory.java:424) ~[uimafit-core-3.2.0.jar!/:3.2.0]
Jan 13 20:46:23 blinky inception-stable.jar[4152]: at de.tudarmstadt.ukp.inception.export.DocumentImportExportServiceImpl.importCasFromFile(DocumentImportExportServiceImpl.java:302) ~[inception-export-22.1.jar!/:?]
Jan 13 20:46:23 blinky inception-stable.jar[4152]: at de.tudarmstadt.ukp.inception.export.DocumentImportExportServiceImpl$$FastClassBySpringCGLIB$$6bf689d0.invoke(<generated>) ~[inception-export-22.1.jar!/:?]
This is related to https://github.com/dkpro/dkpro-core/issues/1511 - I hit this already before the 22.0 release, but I thought I had added the required dependencies back before the release...
I think this exclusion here is the problem...
<dependency>
  <groupId>org.dkpro.core</groupId>
  <artifactId>dkpro-core-api-resources-asl</artifactId>
  <version>${dkpro.version}</version>
  <exclusions>
    <!-- We do not use DKPro Core model-downloading -->
    <exclusion>
      <groupId>org.apache.maven</groupId>
      <artifactId>maven-model</artifactId>
    </exclusion>
    <!--
    Cannot exclude until https://github.com/dkpro/dkpro-core/issues/1511 is fixed
    <exclusion>
      <groupId>org.codehaus.plexus</groupId>
      <artifactId>plexus-utils</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.apache.ivy</groupId>
      <artifactId>ivy</artifactId>
    </exclusion>
    -->
  </exclusions>
</dependency>
Actually, it's a different issue. Even with the exclusion removed, we still have the problem, because I imported the wrong dependency in another place. Anyway, this will be fixed for 22.2. Thanks for the report!
Describe the bug: I get this error when I try to upload a CoNLL file: "Error while uploading document sent1.conll: ClassNotFoundException: org.codehaus.plexus.util.xml.pull.XmlPullParserException". It fails with all CoNLL format options, even when all labels in the file are "O".
Expected behavior: It should load the labeled file in CoNLL format.