Questions about ingest

LauraErhard commented 5 years ago

I'm trying to write workflows for LOCDB and stumbled over a few things while describing how to ingest files. [Screenshot: 2019-04-11_ingest]

a. Why is the plain text field here? Did we at one point want to ingest plain text with this option, and then forget to remove it after adding the 'Add plain text' button? Or does it serve another purpose?

I'm not sure I understand the different options I can choose with embodiment type and textual PDF.

b. The embodiment type just distinguishes between a scanned PDF and a native digital PDF, right? What exactly is the difference in the next steps/backend?

c. The textual PDF question just asks whether the PDF has already been OCR-processed and has some text in the metadata, right? And if I know that I have a textual PDF, is the processing faster than with image PDFs?

lgalke commented 5 years ago

The text field should only appear for items added with the 'Add plain text' button. I pushed a fix: 33705c57312eb2bc69d065f5305347c5a5c3b41a
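The rule in the fix above ("the text field should only appear for items added with 'Add plain text'") can be illustrated with a small TypeScript sketch. It is hypothetical: the interface `IngestItem` and the function `showsPlainTextField` are made-up names, and it assumes that items created via 'Add plain text' carry a string in `plainText`, while uploaded scans have `plainText: null` (as in the upload objects logged later in this thread). The actual check in the commit may work differently.

```typescript
// Hypothetical sketch, not the code from the commit.
// Assumption: "Add plain text" items hold a string in plainText,
// uploaded scans hold null (as seen in the logged upload objects).
interface IngestItem {
  _resourceType: string;     // e.g. "MONOGRAPH", "BOOK_CHAPTER"
  plainText: string | null;  // null for uploaded scans
}

// Show the plain-text field only for items created from plain text.
// An empty string still counts as a plain-text item.
function showsPlainTextField(item: IngestItem): boolean {
  return item.plainText !== null;
}

const scan: IngestItem = { _resourceType: "MONOGRAPH", plainText: null };
const pasted: IngestItem = { _resourceType: "MONOGRAPH", plainText: "" };
console.log(showsPlainTextField(scan));   // false
console.log(showsPlainTextField(pasted)); // true
```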

abdelqader-mohammad commented 5 years ago

The changes are deployed to dev. @LauraErhard, can you test them before we deploy the changes to production and Tübingen?

LauraErhard commented 5 years ago

Ingesting a book chapter:

Send Scans failed: Object { identifier: {…}, firstpage: 43, lastpage: 44, file: File, _resourceType: "BOOK_CHAPTER", uploading: false, textualPdf: false, embodimentType: "PRINT", plainText: null, err: {…} }Object { headers: {…}, status: 502, statusText: "Proxy Error", url: "https://locdb.bib.uni-mannheim.de/locdb-dev/saveResource", ok: false, name: "HttpErrorResponse", message: "Http failure response for https://locdb.bib.uni-mannheim.de/locdb-dev/saveResource: 502 Proxy Error", error: "<!DOCTYPE HTML PUBLIC \"-//IETF//DTD HTML 2.0//EN\">\n<html><head>\n<title>502 Proxy Error</title>\n</head><body>\n<h1>Proxy Error</h1>\n<p>The proxy server received an invalid\r\nresponse from an upstream server.<br />\r\nThe proxy server could not handle the request <em><a href=\"/locdb-dev/saveResource\">POST /locdb-dev/saveResource</a></em>.<p>\nReason: <strong>Error reading from remote server</strong></p></p>\n<hr>\n<address>Apache/2.4.25 (Debian) Server at locdb.bib.uni-mannheim.de Port 443</address>\n</body></html>\n" }

Ingesting a monograph:

Send Scans failed: Object { identifier: {…}, firstpage: null, lastpage: null, file: File, _resourceType: "MONOGRAPH", uploading: false, textualPdf: false, embodimentType: "PRINT", plainText: null, err: {…} } Object { headers: {…}, status: 500, statusText: "Internal Server Error", url: "https://locdb.bib.uni-mannheim.de/locdb-dev/saveResource", ok: false, name: "HttpErrorResponse", message: "Http failure response for https://locdb.bib.uni-mannheim.de/locdb-dev/saveResource: 500 Internal Server Error", error: null }

CORRECTION: I had the wrong ID, therefore it didn't work. My mistake. Everything works fine!
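Both failed requests log the same kind of upload object. As a reading aid, here is a hypothetical TypeScript sketch of its shape, inferred only from the fields visible in the two logs above (the real type in locdb-frend may differ), plus a helper that condenses such a failure into one line. The names `UploadItem`, `explainStatus` and `describeUploadFailure` are made up for illustration. Note that, as the correction shows, a status code alone does not reveal the cause: the 500 here came from a wrong ID.

```typescript
// Hypothetical shape, inferred from the logged "Send Scans failed" objects.
interface UploadItem {
  identifier: unknown;        // logged as {…}; contents not shown
  firstpage: number | null;   // 43 for the chapter, null for the monograph
  lastpage: number | null;
  file: unknown;              // a browser File object in the frontend
  _resourceType: string;      // e.g. "BOOK_CHAPTER", "MONOGRAPH"
  uploading: boolean;
  textualPdf: boolean;
  embodimentType: string;     // e.g. "PRINT"
  plainText: string | null;
  err?: unknown;              // logged as {…}
}

// Short explanation of the two status codes seen above (standard HTTP meaning).
function explainStatus(status: number): string {
  switch (status) {
    case 500:
      return "500 Internal Server Error (the backend failed while handling the request)";
    case 502:
      return "502 Bad Gateway (the proxy received an invalid response from the backend)";
    default:
      return `HTTP ${status}`;
  }
}

// One-line summary of a failed upload; page range only when both pages are set.
function describeUploadFailure(item: UploadItem, status: number): string {
  const pages =
    item.firstpage !== null && item.lastpage !== null
      ? ` pp. ${item.firstpage}-${item.lastpage}`
      : "";
  return `${item._resourceType}${pages} (${item.embodimentType}, textualPdf=${item.textualPdf}): ${explainStatus(status)}`;
}

const chapter: UploadItem = {
  identifier: {}, firstpage: 43, lastpage: 44, file: null,
  _resourceType: "BOOK_CHAPTER", uploading: false, textualPdf: false,
  embodimentType: "PRINT", plainText: null,
};
console.log(describeUploadFailure(chapter, 502));
```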

locdb/locdb-frend #449