inception-project / inception

INCEpTION provides a semantic annotation platform offering intelligent annotation assistance and knowledge management.
https://inception-project.github.io
Apache License 2.0

PDF editor resources missing from release #2946

Closed A-Menu closed 2 years ago

A-Menu commented 2 years ago

Hello,

I'm new to INCEpTION, so excuse me if my question is out of place.

Brief description: I'm trying to annotate some PDFs, but when a PDF reaches a certain size (apparently around 7.0 to 7.5 MB), an error appears. This seems strange because, if I understand your guidelines correctly, "the server configuration limits the individual file size and total batch size (the default limit is 100MB for both)". I don't know whether the limits were changed on the server I'm using, but the problem also happens on your demo server.

I can load the document in the settings, but as soon as it has finished loading in the annotation panel (basically as soon as the user should be able to create an annotation), a "Whoops! Something went wrong" message appears. Sometimes the document seems fine at first, but the error appears shortly afterwards (typically when you start scrolling).

To reproduce (steps to reproduce the behavior):

  1. Import a PDF to annotate, for instance one of 10 to 15 MB
  2. Click on the "Annotation" panel and load the PDF to annotate it
  3. Wait / scroll
  4. See error

Expected behavior: I expected to be able to annotate the PDF.

Screenshots: [image: error]

Thank you in advance!

reckart commented 2 years ago

What file type do you choose when importing the PDF? Do you actually tell INCEpTION to import the file as PDF?

[image: Screenshot 2022-04-04 at 15 19 55]

A-Menu commented 2 years ago

Hi, thanks for the very quick response!

Yes, I have. By the way, the problem happens with both the "Auto" and the "PDF" editor in the annotation panel preferences.

reckart commented 2 years ago

Looking at the demo server logs - is that your file? 1000_1444556858216.png - it seems to be a PNG file (image) - not a PDF file (document)?

ImportDocumentsPanel - 1000_1444556730651.png: Error: Header doesn't contain versioninfo
java.io.IOException: Error: Header doesn't contain versioninfo
     at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:221) ~[pdfbox-2.0.25.jar!/:2.0.25]
     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1228) ~[pdfbox-2.0.25.jar!/:2.0.25]
     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1128) ~[pdfbox-2.0.25.jar!/:2.0.25]
     at org.dkpro.core.io.pdf.internal.Pdf2CasConverter.writeText(Pdf2CasConverter.java:59) ~[dkpro-core-io-pdf-asl-2.2.0.jar!/:?]
     at org.dkpro.core.io.pdf.PdfReader.getNext(PdfReader.java:159) ~[dkpro-core-io-pdf-asl-2.2.0.jar!/:?]
     at t.DocumentImportExportServiceImpl.importCasFromFile(DocumentImportExportServiceImpl.java:308) ~[inception-export-23.0.1.jar!/:?]
     at de.tudarmstadt.ukp.inception.export.DocumentImportExportServiceImpl$$FastClassBySpringCGLIB$$6bf689d0.invoke(<generated>) 
     at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218) ~[spring-core-5.3.18.jar!/:5.3.18]
...
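
The `Header doesn't contain versioninfo` message in the trace above appears to be what PDFBox reports when it cannot find a PDF header in the file, which would fit a PNG image being imported with the PDF format selected. A minimal sketch of checking the leading bytes of a file before importing it is shown here; `sniff_file_type` and `sniff_path` are hypothetical helper names for illustration and are not part of INCEpTION or PDFBox.

```python
# Leading byte signatures ("magic numbers") for the two formats discussed here.
PDF_MAGIC = b"%PDF-"
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"


def sniff_file_type(head: bytes) -> str:
    """Guess the file type from the first bytes of a file."""
    if head.startswith(PNG_MAGIC):
        return "png"
    # Some PDF readers tolerate a few junk bytes before the header,
    # so look for it anywhere in the first 1 KiB (an assumption of this sketch).
    if PDF_MAGIC in head[:1024]:
        return "pdf"
    return "unknown"


def sniff_path(path: str) -> str:
    """Read the first 1 KiB of the file at `path` and guess its type."""
    with open(path, "rb") as fh:
        return sniff_file_type(fh.read(1024))
```

Calling `sniff_path` on a file whose name ends in `.png` but that is imported as PDF would return `"png"`, which would be consistent with the parser error in the log.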

A-Menu commented 2 years ago

No, it wasn't this file; I created a demo project called "file_size_test" with an example PDF. It's a small one (less than 3 MB), but it now produces the same error (it seems I can't upload big files at the moment). The first demo project I created is called "test-dumas".

reckart commented 2 years ago

Ok. I think the error you see on the demo server might be a different one from the one you see locally. We'll have a look.

A-Menu commented 2 years ago

Thank you

It's strange because the demo server didn't behave like this before (it didn't produce the error on small PDFs, and I could upload bigger ones). The other server is still in the same state though; it didn't suddenly change like the demo server did.

reckart commented 2 years ago

The demo server was updated on the weekend to 23.0.1 - I expect there might be a new bug related to displaying PDFs in general.

A-Menu commented 2 years ago

If this helps, I discovered this problem last Friday when trying to annotate a PDF on the "not-demo server".

reckart commented 2 years ago

Looks like there are some files missing from the PDF editor module in the release version ... no idea why yet.
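
One way to confirm that files are missing from a packaged module is to list the entries of the built JAR, since a JAR is a ZIP archive. The sketch below counts entries whose path contains a given fragment; `count_entries_under` is a hypothetical helper, and both the JAR file name and the `"pdfanno"` fragment in the usage note are assumptions, since the thread does not state where these resources end up inside the archive.

```python
import zipfile


def count_entries_under(jar, fragment):
    """Count file entries in a JAR/ZIP whose path contains `fragment`.

    `jar` may be a file system path or a binary file-like object.
    Directory entries (names ending in "/") are not counted.
    """
    with zipfile.ZipFile(jar) as archive:
        return sum(
            1
            for name in archive.namelist()
            if fragment in name and not name.endswith("/")
        )
```

For example, `count_entries_under("inception-pdf-editor.jar", "pdfanno")` (file name assumed) returning `0` for a release artifact but a large number for a working build would point to resources dropped during packaging.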

reckart commented 2 years ago

When building from a fresh checkout, the pdfanno resources are missing:

[INFO] --- maven-resources-plugin:3.2.0:resources (default-resources) @ inception-pdf-editor ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Using 'UTF-8' encoding to copy filtered properties files.
[INFO] Copying 12 resources
[INFO] Copying 0 resource
[INFO] Copying 3 resources
[INFO] Copying 2 resources
[INFO] skip non existing resourceDirectory /Users/bluefire/git/inception-application-release/inception/inception-pdf-editor/src/main/js/pdfanno/dist/pdfanno

The last line should read instead:

[INFO] Copying 382 resources to pdfanno
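
The symptom in the log above is easy to overlook because Maven reports a skipped resource directory at INFO level and the build still succeeds. A minimal sketch of scanning a captured build log for such lines follows; `find_skipped_resource_dirs` is a hypothetical helper, and the pattern is based only on the message wording shown in this thread.

```python
import re

# Matches the maven-resources-plugin message quoted in this thread.
SKIP_RE = re.compile(r"^\[INFO\] skip non existing resourceDirectory (?P<path>.+)$")


def find_skipped_resource_dirs(log_text):
    """Return the resource directories that the build reported as skipped."""
    skipped = []
    for line in log_text.splitlines():
        match = SKIP_RE.match(line.strip())
        if match:
            skipped.append(match.group("path").strip())
    return skipped
```

A release script could fail the build when the returned list contains a directory that is expected to exist, such as the `pdfanno` output directory shown above.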

A-Menu commented 2 years ago

Thank you for your help