Closed arildm closed 1 year ago
Unfortunately I cannot reproduce this behaviour. Do you still have the corpus where this error occurred? Then I could try to run it with the exact same configuration and files. I tried uploading and running just the file you posted and for me the process quit quickly (after a few seconds).
I need some more time to think about the bonus questions :)
The answer to the bonus question is: yes! I now changed the code so that `seconds_taken` and `last_run_ended` are included in the response when a process is running, finished successfully, or finished with an error.
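As a rough illustration of what such timing fields could look like, here is a minimal sketch. It is not the actual Mink backend code: the helper name `timing_fields` is hypothetical, and reporting `last_run_ended` as an empty string while the process is still running is an assumption.

```python
from datetime import datetime
from typing import Optional


def timing_fields(started: datetime, ended: Optional[datetime], now: datetime) -> dict:
    """Build the timing part of a status response (hypothetical helper)."""
    # While the process is still running there is no end time yet,
    # so the elapsed time is measured up to `now` instead.
    end_point = ended if ended is not None else now
    return {
        "last_run_started": started.isoformat(timespec="seconds"),
        # Assumption: an empty string stands in for "not ended yet".
        "last_run_ended": ended.isoformat(timespec="seconds") if ended else "",
        "seconds_taken": (end_point - started).total_seconds(),
    }
```

Calling it with `ended=None` covers the running case; passing the end time covers both the successful and the failed case, since the fields do not depend on how the process ended.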
I reproduced it now (a couple of times), it's `mink-fnjxq5rb5l`. Now, the error message does show, and I'm not sure why it wouldn't when I created this issue. The `job_status.sparv` is still `"running"` so the frontend keeps polling for a few seconds more (~9s, not 25s) but that's not really a problem, so I'm closing this issue. I guess Sparv/Mink BE needs to do some things after the error happens, before the job is done. Screenshots below.
I first get this:
And then soon this:
Ah okay, thanks for the feedback! Yes, I think what happens is that the queue manager (which runs at regular intervals) needs to unqueue the job before its status changes. Maybe this could be improved... It would of course be better if the status changed immediately, but there might be cases where the Sparv process should keep running (i.e. in order to finish other things) despite some error occurring. For now I think we can live with a ~9 seconds delay, but 25 seconds seems too much. Not sure why it took so long that time... Let's keep an eye on it!
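To make the source of the delay concrete, here is a small sketch of the timing involved. It assumes the status can only change on a queue manager run and that runs happen at fixed multiples of an interval. The function name and the 10 s interval in the usage note are made up for illustration and are not Mink's actual configuration.

```python
import math


def status_update_time(failure_time: float, interval: float) -> float:
    """Return the time of the first queue manager run at or after the failure.

    Assumes runs happen at 0, interval, 2 * interval, ... (hypothetical model).
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    return math.ceil(failure_time / interval) * interval


def update_delay(failure_time: float, interval: float) -> float:
    """Seconds between the failure and the run that updates the status."""
    return status_update_time(failure_time, interval) - failure_time
```

Under this model the delay is always shorter than one interval: with a hypothetical 10 s interval, a failure 1 s after a run waits 9 s. A delay well beyond one interval would point to something other than the polling schedule.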
I tried running an annotation job for a corpus with a PDF file with unreadable text (16483.pdf). After 5 s, `check-status` returned this response, where `errors` and `sparv_output` reflect the error that occurred:

```json
{
  "status": "success",
  "message": "Job is running",
  "errors": "ERROR No text was found in the file '16483.pdf'! This file cannot be processed with Sparv. Please make sure that every PDF input file contains machine readable text.\n\n(file: 16483)",
  "sparv_output": "Job execution failed. See log messages above or logs/2023-07-11_14.08.13.372603.log for details.",
  "job_status": {
    "sync2sparv": "none",
    "sync2storage": "none",
    "sparv": "running",
    "korp": "none"
  },
  "sparv_exports": [
    "xml_export:pretty",
    "csv_export:csv",
    "stats_export:sbx_freq_list"
  ],
  "available_files": [
    {
      "name": "16483.pdf",
      "type": "application/pdf",
      "last_modified": "2023-07-11T14:08:01+02:00",
      "size": 115417,
      "path": "16483.pdf"
    }
  ],
  "installed_korp": false,
  "current_process": "sparv",
  "seconds_taken": 5.567173,
  "last_run_started": "2023-07-11T14:08:12+02:00",
  "progress": "3%"
}
```

Since `job_status.sparv` is still `"running"`, the frontend will not show any errors, and will keep polling `check-status`. After 25 s, it returned this response, where `job_status` is finally updated:

```json
{
  "status": "success",
  "message": "An error occurred during processing",
  "errors": "ERROR No text was found in the file '16483.pdf'! This file cannot be processed with Sparv. Please make sure that every PDF input file contains machine readable text.\n\n(file: 16483)",
  "sparv_output": "Job execution failed. See log messages above or logs/2023-07-11_14.08.13.372603.log for details.",
  "job_status": {
    "sync2sparv": "none",
    "sync2storage": "none",
    "sparv": "error",
    "korp": "none"
  },
  "sparv_exports": [
    "xml_export:pretty",
    "csv_export:csv",
    "stats_export:sbx_freq_list"
  ],
  "available_files": [
    {
      "name": "16483.pdf",
      "type": "application/pdf",
      "last_modified": "2023-07-11T14:08:01+02:00",
      "size": 115417,
      "path": "16483.pdf"
    }
  ],
  "installed_korp": false,
  "current_process": "sparv",
  "last_run_started": "2023-07-11T14:08:12+02:00",
  "progress": "3%"
}
```

Only now does the error show in the frontend.
Could we have the `job_status` update sooner? Or do you think the frontend should use `errors` (or some other part of the response) to determine whether an error has happened?

(Bonus question: shouldn't we have `seconds_taken` and `last_run_ended` there as well?)
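
For illustration, the second option (letting the client look at `errors` as well as `job_status`) could be sketched roughly like this. The function name and the intermediate `"error_pending"` state are hypothetical, and treating any non-empty `errors` string as a failure signal is an assumption about the API, not confirmed behaviour.

```python
from typing import Any, Mapping


def classify_sparv_state(response: Mapping[str, Any]) -> str:
    """Classify a check-status response for display purposes (hypothetical helper)."""
    sparv_status = response.get("job_status", {}).get("sparv", "none")
    # Assumption: a non-empty `errors` string means an error has occurred.
    has_errors = bool(response.get("errors"))

    if sparv_status == "error":
        # The backend has confirmed the failure: show it and stop polling.
        return "error"
    if sparv_status == "running":
        # An error was reported but the status has not caught up yet:
        # the client could show the message and keep polling.
        return "error_pending" if has_errors else "running"
    # Any other status is passed through unchanged.
    return sparv_status
```

With the two responses above, the first would be classified as `"error_pending"` and the second as `"error"`, so the message could be shown after 5 s while polling continues until the status is final.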