hotosm / oam-uploader-api

DEPRECATED - The OAM Uploader API server
BSD 3-Clause "New" or "Revised" License

Error on jp2 conversion #59

Closed: smit1678 closed this issue 7 years ago

smit1678 commented 7 years ago

I'm running a staging version of the Uploader on a new EC2 instance and hitting a GDAL error when testing with this file: https://s3.amazonaws.com/hotosm-oam-uploader-test/Emae_Pakoro_Ortho_09042015_0800_jp2.jp2

At first glance, it seems that the error, Warning 6: Driver GTiff does not support NUM_THREADS creation option (which I get on my local install of gdal on OSX), causes the worker process to fail. Locally the warning is logged but the file is still processed.

{
    "message": "Command failed: /usr/bin/gdal_translate -of GTiff /local/tmp-208lcux151xY1xl.jp2 /local/tmp-20844Qcds39gBQM.tif -co TILED=yes -co COMPRESS=DEFLATE -co PREDICTOR=2 -co SPARSE_OK=yes -co BLOCKXSIZE=512 -co BLOCKYSIZE=512 -co NUM_THREADS=ALL_CPUS\nWarning 6: Driver GTiff does not support NUM_THREADS creation option\n",
    "stack": "Error: Command failed: /usr/bin/gdal_translate -of GTiff /local/tmp-208lcux151xY1xl.jp2 /local/tmp-20844Qcds39gBQM.tif -co TILED=yes -co COMPRESS=DEFLATE -co PREDICTOR=2 -co SPARSE_OK=yes -co BLOCKXSIZE=512 -co BLOCKYSIZE=512 -co NUM_THREADS=ALL_CPUS\nWarning 6: Driver GTiff does not support NUM_THREADS creation option\n\n    at ChildProcess.exithandler (child_process.js:754:12)\n    at ChildProcess.emit (events.js:110:17)\n    at maybeClose (child_process.js:1019:16)\n    at Process.ChildProcess._handle.onexit (child_process.js:1091:5)"
}
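Whether the warning is actually fatal depends on how the worker interprets the child process result. As far as I know, Node's child_process.exec only reports "Command failed" when the child exits with a non-zero status or is killed by a signal (exposed as error.code and error.signal); text on stderr alone does not trigger it, and the stderr output is simply appended to the error message. The sketch below illustrates the same distinction using Python's subprocess module rather than the worker's Node code; classify_result is a hypothetical helper, not part of the uploader.

```python
import subprocess
import sys


def classify_result(argv):
    """Run a command and report why it failed, if it did (hypothetical helper).

    A failure means a non-zero exit status or termination by a signal;
    warnings written to stderr alone do not count as a failure.
    """
    proc = subprocess.run(argv, capture_output=True, text=True)
    if proc.returncode == 0:
        status = "ok"
    elif proc.returncode < 0:
        # On POSIX, a negative return code means the child was killed by
        # signal number -returncode (for example 9 for SIGKILL).
        status = f"killed by signal {-proc.returncode}"
    else:
        status = f"exited with status {proc.returncode}"
    return status, proc.stderr


if __name__ == "__main__":
    # Simulate a tool that prints a GDAL-style warning but still succeeds.
    warn_only = [sys.executable, "-c",
                 "import sys; sys.stderr.write('Warning 6: example\\n')"]
    print(classify_result(warn_only))
```

Logging the exit status and signal alongside stderr makes it easier to tell a harmless warning apart from a process that was terminated.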

cc @nbumbarger

nbumbarger commented 7 years ago

I couldn't reproduce this problem locally, but I removed the "NUM_THREADS=ALL_CPUS" option from the gdal_translate command in the worker. From what I read, this error may or may not appear depending on the driver used and the size of the input imagery. If the EC2 instance is single-threaded, it could also be that some of the drivers were compiled without this option during deployment.
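One way to make that change easy to toggle is to keep the GTiff creation options in a single list and add NUM_THREADS only when the installed GDAL supports it. As far as I know, NUM_THREADS became a GTiff creation option in GDAL 2.1, so older builds print "Warning 6" and ignore it. This is a minimal sketch with a hypothetical build_translate_args helper; the options themselves are copied from the command in the error above.

```python
# Creation options copied from the failing gdal_translate command.
BASE_CREATION_OPTIONS = [
    "TILED=yes",
    "COMPRESS=DEFLATE",
    "PREDICTOR=2",
    "SPARSE_OK=yes",
    "BLOCKXSIZE=512",
    "BLOCKYSIZE=512",
]


def build_translate_args(src, dst, use_num_threads=False):
    """Return an argv list for gdal_translate (hypothetical helper).

    NUM_THREADS is only appended when requested, since older GDAL builds
    do not recognize it for the GTiff driver.
    """
    options = list(BASE_CREATION_OPTIONS)
    if use_num_threads:
        options.append("NUM_THREADS=ALL_CPUS")
    args = ["gdal_translate", "-of", "GTiff", src, dst]
    for opt in options:
        args += ["-co", opt]
    return args


print(" ".join(build_translate_args("in.jp2", "out.tif")))
```

Passing the argument list directly to the process API (rather than a single shell string) also avoids quoting problems with unusual file names.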

smit1678 commented 7 years ago

@nbumbarger Did you happen to test with Docker? Removing NUM_THREADS doesn't resolve the issue. Looking further into this, there doesn't seem to be a problem with the GDAL installation either: running the gdal_translate command directly within the Docker container successfully converts the file.

But when running within the worker process, I'm still seeing the above error.

nbumbarger commented 7 years ago

@smit1678 For me, removing the NUM_THREADS option results in a "not recognized as a supported file format" error for that file. Are you seeing the same?

smit1678 commented 7 years ago

@nbumbarger No, not the same. Are you running these tests within Docker?

nbumbarger commented 7 years ago

No, I'll test that as well. I misinterpreted the issue as saying that the containerized process was working. You meant that the gdal command works in Docker.

nbumbarger commented 7 years ago

@smit1678 I wasn't able to complete processing of that particular jp2 image using the current TIFF conversion settings: it hangs for a long time and eventually throws (sharp:31518): GLib-CRITICAL **: g_hash_table_lookup: assertion 'hash_table != NULL' failed at the thumbnail generation step. From what I read, this is an unhandled error caused by corrupt input data, although the image appears to be fine. I'll look into which gdal_translate option is generating images that sharp can't read.

nbumbarger commented 7 years ago

@smit1678 @mojodna In local testing, removing the tiled conversion option allowed sharp to read this test image and finish processing (other tiled images had completed, but the implementation seems to be unreliable). I also fixed a bug that was allowing converted images to upload with their original extensions. I haven't yet tested these in Docker. If it's imperative to tile the images, we'll need to do two separate conversions: one readable by sharp, and one for the final upload.

mojodna commented 7 years ago

It's imperative to tile the images, sorry (allows them to be read in parts, which is particularly important for large images).
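To make "read in parts" concrete: a reader fetching a window from a tiled GeoTIFF only has to decode the tiles that overlap it, while an untiled file stores full-width strips, so the same window pulls in whole rows of the image. The sketch below counts the pixels decoded for a window under both layouts. The 512x512 block size matches the BLOCKXSIZE/BLOCKYSIZE options in the worker's command; the image size and the 16-rows-per-strip figure are illustrative assumptions, and real readers also cache blocks, so this is a simplification.

```python
def pixels_decoded(img_w, img_h, x, y, w, h, block_w, block_h):
    """Pixels decoded to read the window (x, y, w, h) from a blocked raster.

    Every block that overlaps the window is decompressed in full; blocks on
    the right and bottom edges are clipped to the image bounds.
    """
    first_col, last_col = x // block_w, (x + w - 1) // block_w
    first_row, last_row = y // block_h, (y + h - 1) // block_h
    total = 0
    for row in range(first_row, last_row + 1):
        for col in range(first_col, last_col + 1):
            bw = min(block_w, img_w - col * block_w)
            bh = min(block_h, img_h - row * block_h)
            total += bw * bh
    return total


# Illustrative 20000 x 20000 image; read a 512 x 512 window at the origin.
W = H = 20000
tiled = pixels_decoded(W, H, 0, 0, 512, 512, 512, 512)
striped = pixels_decoded(W, H, 0, 0, 512, 512, W, 16)  # full-width strips
print(tiled, striped)  # 262144 10240000
```

In this example the striped layout decodes about 39 times more pixels than the tiled one for the same small window, and the gap grows with image width.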

Would gdalwarp to a PNG serve as a thumbnail process in place of sharp?

nbumbarger commented 7 years ago

Alright, that's fine. It looks like we can probably use gdal_translate's -outsize (output size) option.
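A minimal sketch of a gdal_translate thumbnail step, assuming the thumbnail should fit within a fixed bounding box (the 300-pixel maximum is a hypothetical choice) and that the source dimensions are already known, for example from gdalinfo. It computes an aspect-preserving size, never upscales, and passes the result through -outsize; build_thumbnail_args and thumbnail_size are hypothetical helpers.

```python
def thumbnail_size(width, height, max_dim=300):
    """Scale (width, height) so the longer side is at most max_dim.

    Keeps the aspect ratio, never upscales, and never returns a zero side.
    """
    scale = min(1.0, max_dim / max(width, height))
    return max(1, round(width * scale)), max(1, round(height * scale))


def build_thumbnail_args(src, dst, width, height, max_dim=300):
    """Hypothetical helper: argv for a PNG thumbnail via gdal_translate."""
    tw, th = thumbnail_size(width, height, max_dim)
    return ["gdal_translate", "-of", "PNG",
            "-outsize", str(tw), str(th), src, dst]


print(build_thumbnail_args("ortho.tif", "thumb.png", 24000, 18000))
```

-outsize also accepts percentages (for example -outsize 5% 5%), which avoids reading the dimensions first but gives variable thumbnail sizes. Sources with more than 8 bits per sample may additionally need -ot Byte with -scale to produce a viewable PNG.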

nbumbarger commented 7 years ago

@mojodna @smit1678 I replaced the sharp thumbnail method with a call to gdal_translate, and I haven't had any problems with tiled input data after testing with a range of image sizes and types.

nbumbarger commented 7 years ago

This image appears to be causing a memory issue on the Linux server. This instance seems to have no swap space, leaving it with around 3.5GB of RAM for processing. It's my understanding that EC2 instances can be configured with extra swap space without having to move to a larger instance type.

If you SSH into the server and start top before processing, you can watch memory usage quickly climb to the limit. As the console becomes unresponsive, gdal_translate drops off top's process list and kswapd0 takes over; kswapd0 is the kernel's swap daemon, and it can't do anything here since there's no swap space. Sometimes top freezes before this sequence can be observed, but either way the system stays unresponsive for some time before the job eventually times out with a generic error message.
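A back-of-the-envelope check on whether an image can be decoded in memory: the uncompressed size is width x height x bands x bytes per sample, which can be many times larger than a compressed JPEG 2000 file on disk. The sketch compares that figure with the roughly 3.5GB available on the instance. The 40000 x 40000 RGB dimensions are hypothetical, not measured from the test file, and actual peak usage depends on the driver and GDAL's block cache, so this only estimates the footprint if the whole image is decoded at once.

```python
def raw_raster_bytes(width, height, bands, bytes_per_sample=1):
    """Uncompressed size of a raster held fully in memory."""
    return width * height * bands * bytes_per_sample


def fits_in_memory(width, height, bands, available_gib, bytes_per_sample=1):
    """True if a fully decoded copy fits within available_gib (GiB)."""
    needed = raw_raster_bytes(width, height, bands, bytes_per_sample)
    return needed <= available_gib * 1024 ** 3


# Hypothetical 40000 x 40000 RGB ortho, 8 bits per sample.
size = raw_raster_bytes(40000, 40000, 3)
print(round(size / 1024 ** 3, 2))            # 4.47
print(fits_in_memory(40000, 40000, 3, 3.5))  # False
```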

smit1678 commented 7 years ago

@nbumbarger Thanks for digging into this. So it looks like the original size of the uploader server won't work. I've redeployed this on a larger server (t2.xlarge with 16GB RAM) and it seems to be working great. I've also hooked up a staging Uploader form: http://hotosm-oam-uploader-staging.s3-website-us-east-1.amazonaws.com/. I'm going to close this out in favor of newer tickets as they come up through more testing.