inception-project / inception

INCEpTION provides a semantic annotation platform offering intelligent annotation assistance and knowledge management.
https://inception-project.github.io
Apache License 2.0

Cannot import CoNLL file #2819

Closed: fabiod20 closed this issue 2 years ago

fabiod20 commented 2 years ago

Describe the bug
I get this error when I try to upload a CoNLL file: "Error while uploading document sent1.conll: ClassNotFoundException: org.codehaus.plexus.util.xml.pull.XmlPullParserException". It fails with all CoNLL format options, even when every label in the file is "O".

To Reproduce
Steps to reproduce the behavior:

  1. Go to 'Documents'
  2. Upload a CoNLL file
  3. Select a CoNLL format
  4. Click on 'Import'
  5. See error: Error while uploading document sent1.conll: ClassNotFoundException: org.codehaus.plexus.util.xml.pull.XmlPullParserException

Expected behavior
It should load the labeled file in CoNLL format.

jcklie commented 2 years ago

Can you give us an example of a CoNLL file that causes this error?

reckart commented 2 years ago

That's interesting, because CoNLL 2003 isn't even an XML format.

fabiod20 commented 2 years ago

Sure. I tried with this test file, both with ".txt" and ".conll" extensions. test.txt

reckart commented 2 years ago

If I try importing the sample CoNLL 2003 data from our documentation, it works:

U.N. NNP I-NP I-ORG
official NN I-NP O
Ekeus NNP I-NP I-PER
heads VBZ I-VP O
for IN I-PP O
Baghdad NNP I-NP I-LOC
. . O O

reckart commented 2 years ago

And if I try to import your test file, I get this error:

[Screenshot 2022-01-13 at 20 44 26]

reckart commented 2 years ago

... that is locally in my dev environment. If I try it on a server installation, I also get the "Error while uploading document test.txt: ClassNotFoundException: org.codehaus.plexus.util.xml.pull.XmlPullParserException" error. Funny.

jcklie commented 2 years ago

The issue is that INCEpTION only supports the strict CoNLL 2003 file format, which needs 3 labels per token (POS, chunk, NER, in that column order); see here for an example. Another issue is that CoNLL 2003 uses IOB and you use BIO. I added an issue to create the file format that you and many others want to import, but it will take a while.

The best way I see for you to go forward is to either adhere to the CoNLL 2003 format we support or use CoNLL-U, e.g. via https://pypi.org/project/conllu/.
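To illustrate the first option, here is a minimal Python sketch of converting a two-column BIO file (token, NER tag) into the four-column layout described above. It rests on several assumptions: the input has exactly two whitespace-separated columns per token and blank lines between sentences, "IOB" means the IOB1 scheme (B- is used only when an entity directly follows another entity of the same type), and the POS and chunk columns can be filled with placeholder values. The function names and the placeholder values `X` and `O` are hypothetical, and whether INCEpTION's reader accepts them has not been verified here.

```python
def bio_to_iob1(tags):
    """Convert BIO (IOB2) tags to IOB1.

    In IOB1, "B-" is kept only where an entity directly follows another
    entity of the same type; every other entity start becomes "I-".
    """
    result = []
    prev_type = None  # entity type of the previous token, None after "O"
    for tag in tags:
        if tag == "O":
            result.append(tag)
            prev_type = None
            continue
        prefix, _, entity_type = tag.partition("-")
        if prefix == "B" and prev_type != entity_type:
            result.append("I-" + entity_type)
        else:
            result.append(tag)
        prev_type = entity_type
    return result


def to_conll2003(text, pos_placeholder="X", chunk_placeholder="O"):
    """Turn 'token tag' lines into 'token POS chunk tag' lines.

    The placeholder values are assumptions for illustration only.
    """
    out_lines = []
    sentence = []  # (token, tag) pairs of the current sentence

    def flush():
        tags = bio_to_iob1([tag for _, tag in sentence])
        for (token, _), tag in zip(sentence, tags):
            out_lines.append(f"{token} {pos_placeholder} {chunk_placeholder} {tag}")
        sentence.clear()

    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            flush()
            out_lines.append("")
            continue
        columns = line.split()
        if len(columns) != 2:
            raise ValueError(f"expected 'token tag', got: {line!r}")
        sentence.append((columns[0], columns[1]))
    flush()
    return "\n".join(out_lines)
```

Each sentence is converted separately, so a tag sequence never carries over a blank line. If real POS or chunk annotations are available, they should replace the placeholders.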

The error message is strange though and needs investigation.

reckart commented 2 years ago

I cleaned up the dependencies a bit for 22.0 ... looks like I was maybe a bit too eager.

Jan 13 20:46:23 blinky inception-stable.jar[4152]: 2022-01-13 20:46:23 ERROR [rec] ImportDocumentsPanel - test.txt: org/codehaus/plexus/util/xml/pull/XmlPullParserException
Jan 13 20:46:23 blinky inception-stable.jar[4152]: java.lang.NoClassDefFoundError: org/codehaus/plexus/util/xml/pull/XmlPullParserException
Jan 13 20:46:23 blinky inception-stable.jar[4152]:         at org.dkpro.core.api.resources.MappingProviderFactory.createPosMappingProvider(MappingProviderFactory.java:46) ~[dkpro-core-api-resources-asl-2.2.0.jar!/:?]
Jan 13 20:46:23 blinky inception-stable.jar[4152]:         at org.dkpro.core.io.conll.Conll2003Reader.initialize(Conll2003Reader.java:174) ~[dkpro-core-io-conll-asl-2.2.0.jar!/:?]
Jan 13 20:46:23 blinky inception-stable.jar[4152]:         at org.apache.uima.fit.component.CasCollectionReader_ImplBase.initialize(CasCollectionReader_ImplBase.java:41) ~[uimafit-core-3.2.0.jar!/:3.2.0]
Jan 13 20:46:23 blinky inception-stable.jar[4152]:         at org.apache.uima.collection.CollectionReader_ImplBase.initialize(CollectionReader_ImplBase.java:67) ~[uimaj-core-3.2.0.jar!/:3.2.0]
Jan 13 20:46:23 blinky inception-stable.jar[4152]:         at org.apache.uima.impl.CollectionReaderFactory_impl.produceResource(CollectionReaderFactory_impl.java:92) ~[uimaj-core-3.2.0.jar!/:3.2.0]
Jan 13 20:46:23 blinky inception-stable.jar[4152]:         at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62) ~[uimaj-core-3.2.0.jar!/:3.2.0]
Jan 13 20:46:23 blinky inception-stable.jar[4152]:         at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:289) ~[uimaj-core-3.2.0.jar!/:3.2.0]
Jan 13 20:46:23 blinky inception-stable.jar[4152]:         at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:341) ~[uimaj-core-3.2.0.jar!/:3.2.0]
Jan 13 20:46:23 blinky inception-stable.jar[4152]:         at org.apache.uima.UIMAFramework.produceCollectionReader(UIMAFramework.java:820) ~[uimaj-core-3.2.0.jar!/:3.2.0]
Jan 13 20:46:23 blinky inception-stable.jar[4152]:         at org.apache.uima.fit.factory.CollectionReaderFactory.createReader(CollectionReaderFactory.java:424) ~[uimafit-core-3.2.0.jar!/:3.2.0]
Jan 13 20:46:23 blinky inception-stable.jar[4152]:         at de.tudarmstadt.ukp.inception.export.DocumentImportExportServiceImpl.importCasFromFile(DocumentImportExportServiceImpl.java:302) ~[inception-export-22.1.jar!/:?]
Jan 13 20:46:23 blinky inception-stable.jar[4152]:         at de.tudarmstadt.ukp.inception.export.DocumentImportExportServiceImpl$$FastClassBySpringCGLIB$$6bf689d0.invoke(<generated>) ~[inception-export-22.1.jar!/:?]

This is related to https://github.com/dkpro/dkpro-core/issues/1511 - I hit this already before the 22.0 release, but I thought I had added the required dependencies back before the release...
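One way to confirm whether a class such as the one named in the stack trace is packaged at all is to scan the application JAR. The following is a rough Python sketch, assuming the JAR uses the Spring Boot executable layout with nested libraries under `BOOT-INF/lib/`. The function name `find_class_in_fat_jar` is hypothetical and not part of INCEpTION or any library.

```python
import io
import zipfile


def find_class_in_fat_jar(jar_path, class_name):
    """Return where class_name is packaged inside an executable JAR.

    The result lists the nested JAR entries that contain the class, plus
    "<top level>" if the class sits directly in the outer archive.
    Assumes nested libraries live under BOOT-INF/lib/.
    """
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    with zipfile.ZipFile(jar_path) as outer:
        names = set(outer.namelist())
        if entry in names or "BOOT-INF/classes/" + entry in names:
            hits.append("<top level>")
        for name in sorted(names):
            if name.startswith("BOOT-INF/lib/") and name.endswith(".jar"):
                # Nested JARs are read into memory and opened as ZIP archives.
                with zipfile.ZipFile(io.BytesIO(outer.read(name))) as inner:
                    if entry in inner.namelist():
                        hits.append(name)
    return hits


# Example call (the path is taken from the log above and may differ locally):
# find_class_in_fat_jar(
#     "inception-stable.jar",
#     "org.codehaus.plexus.util.xml.pull.XmlPullParserException",
# )
```

An empty result would be consistent with the `NoClassDefFoundError` in the log, since the class would then be missing from the runtime classpath.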

reckart commented 2 years ago

I think this exclusion here is the problem...

      <dependency>
        <groupId>org.dkpro.core</groupId>
        <artifactId>dkpro-core-api-resources-asl</artifactId>
        <version>${dkpro.version}</version>
        <exclusions>
          <!-- We do not use DKPro Core model-downloading -->
          <exclusion>
            <groupId>org.apache.maven</groupId>
            <artifactId>maven-model</artifactId>
          </exclusion>
          <!--  
          Cannot exclude until https://github.com/dkpro/dkpro-core/issues/1511 is fixed
          <exclusion>
            <groupId>org.codehaus.plexus</groupId>
            <artifactId>plexus-utils</artifactId>
          </exclusion>
          <exclusion>
            <groupId>org.apache.ivy</groupId>
            <artifactId>ivy</artifactId>
          </exclusion>
          -->
        </exclusions>
      </dependency>

reckart commented 2 years ago

Actually, it's a different issue. Even with the exclusion removed, we still have the problem, because I imported the wrong dependency in another place. Anyway, this will be fixed for 22.2. Thanks for the report!