inception-project / inception

INCEpTION provides a semantic annotation platform offering intelligent annotation assistance and knowledge management.
https://inception-project.github.io
Apache License 2.0

Minimal HTML document cannot be imported #2177

Closed jcklie closed 3 years ago

jcklie commented 3 years ago

Describe the bug: When I import the minimal HTML document shown below, I get the following error:

Error while uploading document foo.html: NullPointerException: Cannot invoke "org.dkpro.core.api.xml.CasXmlHandler$StackFrame.isCaptureText()" because the return value of "java.util.Deque.peek()" is null

<!DOCTYPE html>
<html lang="en">
  <head>

  </head>
  <body>
      <p>Ich mag Rentiere</p>
      <p>I like reindeer.</p>
  </body>
</html>


reckart commented 3 years ago

Looks like the importer doesn't like any kind of characters outside the root <html>/</html> tags. I.e., removing the doctype and potentially also any trailing empty lines and line breaks makes the import work.
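
To illustrate the failure mode, here is a minimal sketch of a stack-based text handler that tolerates characters arriving while no element is open. It is not the actual DKPro Core CasXmlHandler code: the class GuardedTextHandler, its simplified StackFrame, and the method signatures are hypothetical stand-ins, and it assumes that the parser really does deliver text events before and after the root element, as the error message suggests.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for a handler that tracks open elements on a stack.
class GuardedTextHandler {
    // Simplified frame: only remembers whether text inside it is captured.
    static final class StackFrame {
        private final boolean captureText;

        StackFrame(boolean captureText) {
            this.captureText = captureText;
        }

        boolean isCaptureText() {
            return captureText;
        }
    }

    private final Deque<StackFrame> stack = new ArrayDeque<>();
    private final StringBuilder text = new StringBuilder();

    void startElement(String name) {
        // Assumption for this sketch: every element captures its text.
        stack.push(new StackFrame(true));
    }

    void endElement(String name) {
        stack.pop();
    }

    void characters(char[] ch, int start, int length) {
        // peek() returns null when the stack is empty, i.e. when the text
        // appears before the root element opens or after it closes.
        StackFrame frame = stack.peek();
        if (frame == null) {
            return; // ignore text outside the root instead of dereferencing null
        }
        if (frame.isCaptureText()) {
            text.append(ch, start, length);
        }
    }

    String getText() {
        return text.toString();
    }
}
```

Without the null check, calling isCaptureText() on the result of peek() while the stack is empty would throw the NullPointerException quoted in the report. Whether ignoring such text or wrapping it in a synthetic root is the better fix is a design decision for the importer.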