The info on the wiki seems to be inaccurate; I forgot to update it, and it has been there from the very start of the project. We haven't discussed the possibility of showing exact error info on the /ingest page. It isn't stored in the database, but it is available via docker-compose logs if needed.
It took approximately 30 minutes because each image is tried 5 times (in the default configuration) before the error is written, and there is a random delay of a few minutes between attempts.
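The timing described above roughly corresponds to a retry loop like the following minimal Python sketch (the function names, the 5-attempt default, and the exact delay range are assumptions based on this comment, not the actual hawk code):

```python
import random
import time

MAX_ATTEMPTS = 5  # default retry count mentioned above

def ingest_with_retries(url, download_and_convert):
    """Try an image several times, sleeping a random few minutes
    between attempts; the error is only recorded after the last failure."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            return download_and_convert(url)
        except Exception:
            if attempt == MAX_ATTEMPTS:
                raise  # error is written only now, ~30 minutes in
            time.sleep(random.uniform(60, 300))  # random 1-5 minute pause
```

Five attempts with pauses of up to a few minutes each is consistent with the ~30 minutes observed before the error appears.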
The log says that this big image can't be properly identified and converted. I'm going to test it locally to see whether the image is corrupted, for example.
Oh, I see: your link isn't a link to the image, it is a link to an HTML page with a thumbnail and some info. So the HTML page was downloaded and, naturally, wasn't identified as an image. The correct link for the huge image is https://upload.wikimedia.org/wikipedia/commons/6/6d/The_Garden_of_Earthly_Delights_by_Bosch_High_Resolution.jpg
Isn't that the link I provided?
"url": [
"https://upload.wikimedia.org/wikipedia/commons/6/6d/The_Garden_of_Earthly_Delights_by_Bosch_High_Resolution.jpg"
],
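For reference, submitting such a batch from a script might look roughly like the Python sketch below. Only the "url" field is taken from the snippet above; the endpoint and payload shape are assumptions based on the /ingest page mentioned earlier, not a confirmed API contract:

```python
import requests

# Hypothetical batch payload containing the problematic image.
batch = [
    {
        "url": [
            "https://upload.wikimedia.org/wikipedia/commons/6/6d/The_Garden_of_Earthly_Delights_by_Bosch_High_Resolution.jpg"
        ]
    }
]

# Endpoint name guessed from the /ingest page mentioned in this thread.
response = requests.post("https://media.embedr.eu/ingest", json=batch)
response.raise_for_status()
print(response.json())
```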
I'm sorry, I was looking at a different one; I'm going to try it.
Regardless, what are we going to do about the error messages? If they are in the original plan, I would like the API to output them as described in the wiki.
I don't think I have access to the docker-compose logs.
These logs are available on the media.embedr.eu server itself. The server can be accessed directly via SSH with the AWS-generated keypair registered with the EC2 instance. I can set up different approaches for SSH, but the one with the keypair is the most secure.
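As an illustration, reading those logs over SSH could look like the Python sketch below, using the third-party paramiko library. The username, key path, and the compose project directory on the server are assumptions:

```python
import os
import paramiko

# Connect to the server with the EC2 keypair (path and username assumed).
client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect(
    "media.embedr.eu",
    username="ubuntu",
    key_filename=os.path.expanduser("~/.ssh/embedr-keypair.pem"),
)

# Fetch the recent service logs; the compose project's working
# directory on the server is an assumption.
_, stdout, _ = client.exec_command("cd /srv/hawk && docker-compose logs --tail=200")
print(stdout.read().decode())
client.close()
```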
We should discuss a better error-message display with @klokan.
I have just realized the problem with the huge image, too. The conversion from JPG to TIF (which is needed before compression to JP2) is constantly being killed due to memory overload. It tries to make a 1.5 GB TIF from the 233 MB JPG, and it needs more than 4 GB of memory (which is what media.embedr.eu has, but it is shared across the whole server). There is no problem ingesting this file on my local computer with 16 GB of memory.
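The size blow-up is easy to reconstruct from the numbers above: an uncompressed 8-bit RGB TIFF needs about 3 bytes per pixel, so a 1.5 GB TIFF corresponds to roughly 500 megapixels. A quick back-of-the-envelope check in Python (the 3-bytes-per-pixel figure assumes plain RGB without alpha or compression):

```python
# Estimated pixel count behind the ~1.5 GB intermediate TIFF.
BYTES_PER_PIXEL = 3            # 8-bit R, G, B; no alpha, no compression
tif_bytes = 1.5 * 1024**3      # ~1.5 GB TIF reported above

megapixels = tif_bytes / BYTES_PER_PIXEL / 1e6
print(f"~{megapixels:.0f} megapixels")  # ~537 megapixels

# The 233 MB JPEG expands several-fold when decompressed to TIFF.
jpg_bytes = 233 * 1024**2
print(f"expansion factor ~{tif_bytes / jpg_bytes:.1f}x")  # ~6.6x
```

A converter that holds buffers of that size in memory plausibly exceeds the 4 GB available on the shared server while fitting comfortably into 16 GB locally.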
So if there is a need to process such big images, we will have to buy a more powerful (and more expensive) EC2 instance.
This will be implemented (probably in the first half of next week).
It will run for newly ingested batches. Probably only two error messages will be present: one saying the image could not be downloaded and one saying it could not be transcoded.
Aha. Looking back...
There is again a complication because of the introduced sequences: each image in a sequence may have a different error message...
This is the reason we removed this from the code. For sequences, should the message be a list of all "error" messages? I expect no message field in case of success.
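To make the open question concrete, a hypothetical per-item status could look like the Python literal below. Only the "error"/"message" semantics come from the wiki; the field names, example URLs, and message texts are placeholders:

```python
# Hypothetical per-item status if "message" becomes a list for sequences.
item_status = {
    # A two-image sequence (example URLs are placeholders).
    "url": [
        "https://example.org/sequence/page1.jpg",
        "https://example.org/sequence/page2.jpg",
    ],
    "status": "error",
    # One entry per failed image in the sequence; the field would be
    # omitted entirely when every image succeeds.
    "message": [
        "page1.jpg: downloading failed",
        "page2.jpg: transcoding failed",
    ],
}
```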
@mzeinstra, please comment on two points:
1. Does this really bring value to the scripts that will use the API, and should we implement it?
2. Do you want a larger EC2 instance for media.embedr.eu, given the existing situation?
Also, we asked you to deliver samples with large and varied expected images in ticket #20 back in April... If similar samples had been provided at that time, we would have developed against them during the BETA phase; now that the project has been turned over to production, any change is slightly more complicated...
@mzeinstra please comment.
On 1: It does bring value to the API to know why something failed. For example, in the case above we did not know whether the download or the encoding failed.
On 2: No, I do not require a larger EC2 instance to solve this issue with large images. I was pushing the limits of the service and expected to hit a ceiling somewhere; it is good to know where that ceiling is. Once we come across such images at one of the institutions we are ingesting, this will become a new problem, but I don't expect that to happen this year, as I really had to search to find this large image.
I know this can be frustrating, as we could have avoided it back in April, but at that time we didn't have access to these kinds of records.
A message with more error info has been added; newly ingested items will show it.
I've been testing the limits of the ingestion API. I tried the largest image I could find online (~220 MB): https://commons.wikimedia.org/wiki/File:The_Garden_of_Earthly_Delights_by_Bosch_High_Resolution.jpg
I've put that into a batch:
This produces an error after ~30 minutes:
According to the wiki I should get a message that details the error: https://github.com/klokantech/hawk/wiki/C.Ingest ("error": The image can not be downloaded or transcoded. The "message" field will specify details.)
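Assuming the wiki's contract, a client script checking the batch would look roughly like the Python sketch below. The status URL and JSON layout are assumptions; only the "error"/"message" semantics come from the wiki page linked above:

```python
import requests

# Hypothetical status endpoint; the real URL scheme is not confirmed here.
batch_id = "example-batch-id"  # placeholder
response = requests.get(f"https://media.embedr.eu/ingest/{batch_id}")
response.raise_for_status()

for item in response.json():
    if item.get("status") == "error":
        # Per the wiki, the "message" field should say whether the image
        # could not be downloaded or could not be transcoded.
        print(item["url"], item.get("message"))
```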