Open boskar opened 8 years ago
Are these happening on a regular basis or just after particular requests? Is there anything in the iipsrv log flie before the processes hang?
So far It happened only once.
Caught Terminated signal. Terminating after 25778 accesses
Mon Mar 21 15:18:20 2016
<----------------------------------->
[Mon Mar 21 15:11:49.016309 2016] [fcgid:emerg] [pid 31899:tid 140475805714176] (35)Resource deadlock avoided: [client cut2:54125] mod_fcgid: can't get pipe mutex, referer: http://mikroskop.sum.edu.pl/pato/index.php?szkielko_guid=D95A9B8EEEBF41E89D5A88AC3975B9F0
[Mon Mar 21 15:13:22.065236 2016] [fcgid:warn] [pid 2380:tid 140475908101888] (32)Broken pipe: [client cut2:54142] mod_fcgid: ap_pass_brigade failed in handle_request_ipc function, referer: http://mikroskop.sum.edu.pl/pato/index.php?szkielko_guid=D95A9B8EEEBF41E89D5A88AC3975B9F0
[Mon Mar 21 15:13:22.481727 2016] [fcgid:warn] [pid 2380:tid 140475679823616] (32)Broken pipe: [client cut2:54135] mod_fcgid: ap_pass_brigade failed in handle_request_ipc function, referer: http://mikroskop.sum.edu.pl/pato/index.php?szkielko_guid=D95A9B8EEEBF41E89D5A88AC3975B9F0
[Mon Mar 21 15:19:25.309604 2016] [fcgid:warn] [pid 2380:tid 140475721787136] (32)Broken pipe: [client cut2:54250] mod_fcgid: ap_pass_brigade failed in handle_request_ipc function, referer: http://mikroskop.sum.edu.pl/pato/index.php?szkielko_guid=F6B2B644EB68426F97414415D903F157
[Mon Mar 21 15:19:25.320646 2016] [fcgid:warn] [pid 2380:tid 140475663038208] (32)Broken pipe: [client cut2:54254] mod_fcgid: ap_pass_brigade failed in handle_request_ipc function, referer: http://mikroskop.sum.edu.pl/pato/index.php?szkielko_guid=F6B2B644EB68426F97414415D903F157
[Mon Mar 21 15:19:25.395757 2016] [fcgid:warn] [pid 2380:tid 140475705001728] (32)Broken pipe: [clientcut2:54249] mod_fcgid: ap_pass_brigade failed in handle_request_ipc function, referer: http://mikroskop.sum.edu.pl/pato/index.php?szkielko_guid=F6B2B644EB68426F97414415D903F157
[Mon Mar 21 15:19:26.512666 2016] [fcgid:warn] [pid 2380:tid 140475814106880] (32)Broken pipe: [client cut2:54247] mod_fcgid: ap_pass_brigade failed in handle_request_ipc function, referer: http://mikroskop.sum.edu.pl/pato/index.php?szkielko_guid=F6B2B644EB68426F97414415D903F157
[Mon Mar 21 15:19:27.245348 2016] [fcgid:warn] [pid 2380:tid 140475646252800] (32)Broken pipe: [client cut2:54251] mod_fcgid: ap_pass_brigade failed in handle_request_ipc function, referer: http://mikroskop.sum.edu.pl/pato/index.php?szkielko_guid=F6B2B644EB68426F97414415D903F157
[Mon Mar 21 15:21:40.763941 2016] [fcgid:emerg] [pid 31898:tid 140475746965248] (35)Resource deadlock avoided: [client cut:41507] mod_fcgid: can't get pipe mutex, referer: http://mikroskop.sum.edu.pl/pato/index.php?szkielko_guid=F9A1E62503864907A9021F0AC8F1D603
[Mon Mar 21 15:21:42.817688 2016] [fcgid:emerg] [pid 2461:tid 140475772143360] (35)Resource deadlock avoided: [client cut:46665] mod_fcgid: can't get pipe mutex, referer: http://mikroskop.sum.edu.pl/pato/index.php?szkielko_guid=F9A1E62503864907A9021F0AC8F1D603
[Mon Mar 21 15:22:06.838043 2016] [fcgid:warn] [pid 2380:tid 140475688216320] [client cut:43085] mod_fcgid: can't apply process slot for /usr/lib/cgi-bin/iipsrv7.fcgi, referer: http://mikroskop.sum.edu.pl/pato/index.php?szkielko_guid=3EFC499487974EEC97AF3F7C9A8E13A3
[Mon Mar 21 15:22:43.883245 2016] [fcgid:warn] [pid 2380:tid 140475763750656] [client cut:36284] mod_fcgid: can't apply process slot for /usr/lib/cgi-bin/iipsrv7.fcgi, referer: http://mikroskop.sum.edu.pl/pato/index.php?szkielko_guid=F9A1E62503864907A9021F0AC8F1D603
I have not tried to stop apache, it might be just random, I will give more details if I catch again. On the other hand - I'm not sure if I use the current mainline.
I would rather like to know how to kill malfunctioning instances - Should I give them maximal time of life?
I'm not an Apache expert, but if it's spawned by Apache I suspect you would need to manage it with Apache.
If you use an external FCGI process you can have other components manage it. I think systemd
would do this. I use supervisor.
Ruven, When will launch the 1.0 version of Windows?
As the trafiic is much larger now I got defunct iipsrv daily in rush hours.
I've found some fcgid options FcgidZombieScanInterval 5 FcgidIdleScanInterval 5 FcgidErrorScanInterval 5
but that's not enough. Any ideas?
I'm not sure what is happening. iipsrv should not be freezing in this way. However, if you look back to the iipsrv log file you quoted, the process hasn't crashed or frozen, but has exited on a TERM signal. You'll need to show me more from the log file output to really understand the root cause of this.
Full Request is iiif=/home/patomorfa/tif2/12_73_512.tiff/58368,88064,1024,1024/512,/0/default.jpg [1/1]: Command / Argument is iiif : /home/patomorfa/tif2/12_73_512.tiff/58368,88064,1024,1024/512,/0/default.jpg FIF :: Image cache hit. Number of elements: 100 FIF :: TIFF image detected FIF :: Image dimensions are 143360 x 107520 FIF :: Image contains 3 channels with 8 bits per channel FIF :: Image timestamp: Sun, 08 Jan 2017 12:01:14 GMT FIF :: Total command time 467 microseconds CVT handler reached TileManager :: Cache Miss for resolution: 8, tile: 12097 TileManager :: Cache Size: 13 tiles, 9.98872 MB Full Request is iiif=/home/patomorfa/tif2/12_73_512.tiff/58368,89088,1024,1024/512,/0/default.jpg [1/1]: Command / Argument is iiif : /home/patomorfa/tif2/12_73_512.tiff/58368,89088,1024,1024/512,/0/default.jpg FIF :: Image cache hit. Number of elements: 107 FIF :: TIFF image detected FIF :: Image dimensions are 143360 x 107520 FIF :: Image contains 3 channels with 8 bits per channel FIF :: Image timestamp: Sun, 08 Jan 2017 12:01:14 GMT FIF :: Total command time 434 microseconds CVT handler reached TileManager :: Cache Miss for resolution: 8, tile: 12237 TileManager :: Cache Size: 13 tiles, 9.98871 MB TileManager :: Tile cache insertion time: 399 microseconds TileManager :: Total Tile Access Time: 7133 microseconds TileManager getRegion :: Tile access time 7227 microseconds for tile 12097 at resolution 8 TileManager :: Tile cache insertion time: 448 microseconds TileManager :: Total Tile Access Time: 7719 microseconds TileManager getRegion :: Tile access time 7879 microseconds for tile 12237 at resolution 8 CVT :: Total command time 24078 microseconds IIIF :: Total command time 24599 microseconds Total Request Time: 24777 microseconds image closed and deleted Server count is 27419
CVT :: Total command time 20622 microseconds IIIF :: Total command time 21097 microseconds Total Request Time: 21397 microseconds image closed and deleted Server count is 27449
Full Request is iiif=/home/patomorfa/tif2/12_73_512.tiff/59392,87040,1024,1024/512,/0/default.jpg [1/1]: Command / Argument is iiif : /home/patomorfa/tif2/12_73_512.tiff/59392,87040,1024,1024/512,/0/default.jpg FIF :: Image cache hit. Number of elements: 107 FIF :: TIFF image detected FIF :: Image dimensions are 143360 x 107520 FIF :: Image contains 3 channels with 8 bits per channel FIF :: Image timestamp: Sun, 08 Jan 2017 12:01:14 GMT FIF :: Total command time 475 microseconds CVT handler reached TileManager :: Cache Miss for resolution: 8, tile: 11958 TileManager :: Cache Size: 13 tiles, 9.98872 MB Full Request is iiif=/home/patomorfa/tif2/12_73_512.tiff/57344,90112,2048,2048/512,/0/default.jpg [1/1]: Command / Argument is iiif : /home/patomorfa/tif2/12_73_512.tiff/57344,90112,2048,2048/512,/0/default.jpg FIF :: Image cache hit. Number of elements: 100 FIF :: TIFF image detected FIF :: Image dimensions are 143360 x 107520 FIF :: Image contains 3 channels with 8 bits per channel FIF :: Image timestamp: Sun, 08 Jan 2017 12:01:14 GMT FIF :: Total command time 467 microseconds CVT handler reached TileManager :: Cache Miss for resolution: 7, tile: 3108 TileManager :: Cache Size: 13 tiles, 9.98872 MB TileManager :: Tile cache insertion time: 3161 microseconds TileManager :: Total Tile Access Time: 86654 microseconds TileManager getRegion :: Tile access time 86752 microseconds for tile 11958 at resolution 8 CVT :: Error flushing jpeg data CVT :: Error writing jpeg strip data: 18776 TileManager :: Tile cache insertion time: 4472 microseconds TileManager :: Total Tile Access Time: 12310 microseconds TileManager getRegion :: Tile access time 12412 microseconds for tile 3108 at resolution 7 CVT :: Error flushing jpeg data CVT :: Error writing jpeg strip data: 21797 CVT :: Error writing jpeg strip data: 23463 CVT :: Error writing jpeg strip data: 20376 CVT :: Error writing jpeg strip data: 25524 CVT :: Error writing jpeg strip data: 20353 CVT :: Error writing jpeg strip data: 26896 CVT :: Error writing jpeg EOI markers CVT :: Total command time 28455 microseconds IIIF :: Total command time 28960 microseconds Total Request Time: 49511 microseconds image closed and deleted Server count is 27420
CVT :: Error writing jpeg strip data: 20459 CVT :: Error writing jpeg EOI markers CVT :: Total command time 107498 microseconds IIIF :: Total command time 108015 microseconds Total Request Time: 108286 microseconds image closed and deleted Server count is 27450
Caught Terminated signal. Terminating after 27450 accesses Mon Jan 16 17:50:23 2017 <----------------------------------->
Caught Terminated signal. Terminating after 27420 accesses Mon Jan 16 17:50:23 2017 <-----------------------------------> /// manual apache reboot goes here <-----------------------------------> <-----------------------------------> Mon Jan 16 18:57:51 2017
Mon Jan 16 18:57:51 2017
IIPImage Server. Version 1.0 IIPImage Server. Version 1.0 Ruven Pillay ruven@users.sourceforge.net Ruven Pillay ruven@users.sourceforge.net
Verbosity level set to 2 Verbosity level set to 2 Running in FCGI mode Running in FCGI mode
Setting maximum image cache size to 10MB
and corresponding apache log [Mon Jan 16 06:25:08.237212 2017] [core:notice] [pid 15681:tid 140688182757248] AH00094: Command line: '/usr/sbin/apache2' [Mon Jan 16 15:24:23.422421 2017] [fcgid:emerg] [pid 5807:tid 140687886477056] (35)Resource deadlock avoided: [client 213.227.98.124:48326] mod_fcgid: can't get pipe mutex, referer: http://mikroskop.sum.edu.pl/pato/index.php?szkielko_guid=000000000pato65 [Mon Jan 16 17:43:35.869091 2017] [fcgid:emerg] [pid 8254:tid 140687794157312] (35)Resource deadlock avoided: [client 213.227.98.124:36770] mod_fcgid: can't get pipe mutex, referer: http://mikroskop.sum.edu.pl/pato/index.php?szkielko_guid=CD1AAB8CC4B2467 F83297F179443A104 [Mon Jan 16 17:43:37.003936 2017] [fcgid:emerg] [pid 7726:tid 140687962011392] (35)Resource deadlock avoided: [client 213.227.98.124:38325] mod_fcgid: can't get pipe mutex, referer: http://mikroskop.sum.edu.pl/pato/index.php?szkielko_guid=CD1AAB8CC4B2467 F83297F179443A104 [Mon Jan 16 17:44:40.071253 2017] [fcgid:warn] [pid 5808:tid 140687836120832] [client 213.227.98.124:47899] mod_fcgid: can't apply process slot for /usr/lib/cgi-bin/iipsrv7.fcgi, referer: http://mikroskop.sum.edu.pl/pato/index.php?szkielko_guid=CD1AAB8CC 4B2467F83297F179443A104 [Mon Jan 16 17:44:40.071515 2017] [fcgid:warn] [pid 5808:tid 140687920047872] [client 213.227.98.124:51513] mod_fcgid: can't apply process slot for /usr/lib/cgi-bin/iipsrv7.fcgi, referer: http://mikroskop.sum.edu.pl/pato/index.php?szkielko_guid=CD1AAB8CC 4B2467F83297F179443A104 [Mon Jan 16 17:44:40.071617 2017] [fcgid:warn] [pid 5808:tid 140687878084352] [client 213.227.98.124:60242] mod_fcgid: can't apply process slot for /usr/lib/cgi-bin/iipsrv7.fcgi, referer: http://mikroskop.sum.edu.pl/pato/index.php?szkielko_guid=CD1AAB8CC 4B2467F83297F179443A104 .... till reboot [Mon Jan 16 18:57:43.279562 2017] [core:warn] [pid 15681:tid 140688182757248] AH00045: child process 10057 still did not exit, sending a SIGTERM [Mon Jan 16 18:57:45.281719 2017] [core:warn] [pid 15681:tid 140688182757248] AH00045: child process 10057 still did not exit, sending a SIGTERM [Mon Jan 16 18:57:47.283840 2017] [core:warn] [pid 15681:tid 140688182757248] AH00045: child process 10057 still did not exit, sending a SIGTERM [Mon Jan 16 18:57:49.285950 2017] [core:error] [pid 15681:tid 140688182757248] AH00046: child process 10057 still did not exit, sending a SIGKILL [Mon Jan 16 18:57:50.287077 2017] [mpm_worker:notice] [pid 15681:tid 140688182757248] AH00295: caught SIGTERM, shutting down
Try increasing the iipsrv verbosity to 10 to get more details. In your current log, there's an error saying it's unable to write to the FCGI output stream for this request:
iiif=/home/patomorfa/tif2/12_73_512.tiff/59392,87040,1024,1024/512,/0/default.jpg
Try restarting Apache and requesting this URL to see what happens.
It happened again. Full Request is iiif=/home/patomorfa/tif2/12_67_512.tiff/51200,34304,512,512/512,/0/default.jpg [1/1]: Command / Argument is iiif : /home/patomorfa/tif2/12_67_512.tiff/51200,34304,512,512/512,/0/default.jpg FIF :: Image cache hit. Number of elements: 105 FIF :: TIFF image detected FIF :: Image dimensions are 103936 x 56320 FIF :: Image contains 3 channels with 8 bits per channel FIF :: Image timestamp: Thu, 29 Dec 2016 21:33:36 GMT FIF :: Total command time 243 microseconds CVT handler reached TileManager :: Cache Miss for resolution: 8, tile: 13701 TileManager :: Cache Size: 13 tiles, 9.4967 MB TileManager :: Tile cache insertion time: 454 microseconds TileManager :: Total Tile Access Time: 6886 microseconds TileManager getRegion :: Tile access time 6983 microseconds for tile 13701 at resolution 8 CVT :: Error writing jpeg strip data: 16130 CVT :: Error writing jpeg strip data: 16456 CVT :: Error writing jpeg EOI markers CVT :: Total command time 19899 microseconds IIIF :: Total command time 20182 microseconds Total Request Time: 20384 microseconds image closed and deleted Server count is 84352
This: [Wed Jan 18 21:20:20.786112 2017] [fcgid:emerg] [pid 28084:tid 140102126708480] (35)Resource deadlock avoided: [client 213.227.98.124:39998] mod_fcgid: can't lock process table in pid 28084, referer: http://mikroskop.sum.edu.pl/pato/index.php?szkielko_guid=000000000pato93
might be related.
Both iiif=/home/patomorfa/tif2/12_67_512.tiff/51200,34304,512,512/512,/0/default.jpg iiif=/home/patomorfa/tif2/12_73_512.tiff/59392,87040,1024,1024/512,/0/default.jpg gives correct results without freeze now.
I'm afraid increasing debug level would make logs really huge. Currently they're over 200 MB daily uncompressed.
I've made a working watchdog in cron to restart apache if defunct iipsrv is found.
I dont know how to reproduce it on demand. This month it happened maybe 10 times - for 1 298 390 (so far) tiles served. I don't want to test it now as I only have production instance and now I've got high user demand, yet I need to work it out.
Good news is - defunct process only affect itself - as I use two processes ( FcgidMaxProcessesPerClass 2 ) - two cpu cores - the other one is working until it freezes independently - and that's the point the site stops working - both slots are occupied by defunct processes. Of course I can increase the number of workers but it does not sound like a good idea.
Full Request is iiif=/home/patomorfa/tif2/10_14_512.tiff/34816,26624,2048,2048/512,/0/default.jpg [1/1]: Command / Argument is iiif : /home/patomorfa/tif2/10_14_512.tiff/34816,26624,2048,2048/512,/0/default.jpg IIIF handler reached IIIF :: URL decoded to /home/patomorfa/tif2/10_14_512.tiff/34816,26624,2048,2048/512,/0/default.jpg FIF handler reached FIF :: URL decoding/filtering: /home/patomorfa/tif2/10_14_512.tiff => /home/patomorfa/tif2/10_14_512.tiff FIF :: Image cache hit. Number of elements: 115 FIF :: TIFF image detected FIF :: Created image FIF :: Image dimensions are 70916 x 142576 FIF :: Image contains 3 channels with 8 bits per channel FIF :: Image timestamp: Mon, 07 Mar 2016 20:37:51 GMT FIF :: Total command time 346 microseconds IIIF :: Requested Region: x:34816, y:26624, w:2048, h:2048 IIIF :: Requested Size: 512x512 IIIF :: Requested Rotation: 0 degrees IIIF :: Requested Quality: default with format: jpg IIIF :: image request for /home/patomorfa/tif2/10_14_512.tiff with arguments: region: 34816,26624,2048,2048; size: 512x512; rotation: 0; mirroring: 0 CVT handler reached CVT :: Using resolution 7 with size 17729x35644 CVT :: Region: 8704,6656,512,512 CVT :: Requested scaled region size is 512x512 at resolution 7. Nearest existing resolution has region with size 512x512 TileManager getRegion :: Total tiles in image: 35x70 tiles TileManager getRegion :: Tile start: 17,13 with offset: 0,0 TileManager getRegion :: Tile end: 17,13 TileManager :: Cache Miss for resolution: 7, tile: 472 TileManager :: Cache Size: 13 tiles, 9.9887 MB TileManager :: Tile cache insertion time: 378 microseconds TileManager :: Total Tile Access Time: 11805 microseconds TileManager getRegion :: Tile access time 11902 microseconds for tile 472 at resolution 7 TileManager getRegion :: destination tile width: 512, tile height: 512 CVT :: About to JPEG compress strip with height 128 Full Request is iiif=/home/patomorfa/tif2/10_14_512.tiff/34816,24576,2048,2048/512,/0/default.jpg CVT :: Compressed data strip length is 1653 [1/1]: Command / Argument is iiif : /home/patomorfa/tif2/10_14_512.tiff/34816,24576,2048,2048/512,/0/default.jpg IIIF handler reached IIIF :: URL decoded to /home/patomorfa/tif2/10_14_512.tiff/34816,24576,2048,2048/512,/0/default.jpg FIF handler reached FIF :: URL decoding/filtering: /home/patomorfa/tif2/10_14_512.tiff => /home/patomorfa/tif2/10_14_512.tiff FIF :: Image cache hit. Number of elements: 114 FIF :: TIFF image detected FIF :: Created image FIF :: Image dimensions are 70916 x 142576 FIF :: Image contains 3 channels with 8 bits per channel FIF :: Image timestamp: Mon, 07 Mar 2016 20:37:51 GMT FIF :: Total command time 550 microseconds IIIF :: Requested Region: x:34816, y:24576, w:2048, h:2048 IIIF :: Requested Size: 512x512 IIIF :: Requested Rotation: 0 degrees IIIF :: Requested Quality: default with format: jpg IIIF :: image request for /home/patomorfa/tif2/10_14_512.tiff with arguments: region: 34816,24576,2048,2048; size: 512x512; rotation: 0; mirroring: 0 CVT handler reached CVT :: Using resolution 7 with size 17729x35644 CVT :: Region: 8704,6144,512,512 CVT :: Requested scaled region size is 512x512 at resolution 7. Nearest existing resolution has region with size 512x512 TileManager getRegion :: Total tiles in image: 35x70 tiles TileManager getRegion :: Tile start: 17,12 with offset: 0,0 TileManager getRegion :: Tile end: 17,12 TileManager :: Cache Miss for resolution: 7, tile: 437 TileManager :: Cache Size: 13 tiles, 9.9887 MB CVT :: About to JPEG compress strip with height 128 CVT :: Compressed data strip length is 1024 CVT :: Error flushing jpeg data CVT :: About to JPEG compress strip with height 128 CVT :: Compressed data strip length is 1024 CVT :: About to JPEG compress strip with height 128 CVT :: Compressed data strip length is 1024 CVT :: Total command time 20175 microseconds IIIF :: Total command time 20592 microseconds Memcached :: stored 4950 bytes in 58 microseconds Total Request Time: 20761 microseconds image closed and deleted Server count is 56359
TileManager :: Tile cache insertion time: 477 microseconds TileManager :: Total Tile Access Time: 10056 microseconds TileManager getRegion :: Tile access time 10153 microseconds for tile 437 at resolution 7 TileManager getRegion :: destination tile width: 512, tile height: 512 CVT :: Error flushing jpeg data CVT :: About to JPEG compress strip with height 128 CVT :: Compressed data strip length is 1653 CVT :: About to JPEG compress strip with height 128 CVT :: Compressed data strip length is 1024 CVT :: About to JPEG compress strip with height 128 CVT :: Compressed data strip length is 1024 CVT :: About to JPEG compress strip with height 128 CVT :: Compressed data strip length is 1024 CVT :: Total command time 17168 microseconds IIIF :: Total command time 17787 microseconds Memcached :: stored 4950 bytes in 65 microseconds Total Request Time: 18513 microseconds image closed and deleted Server count is 57091
Caught Terminated signal. Terminating after 56359 accesses Mon Jan 23 23:53:15 2017 <----------------------------------->
Caught Terminated signal. Terminating after 57091 accesses Mon Jan 23 23:53:15 2017 <----------------------------------->
<-----------------------------------> <-----------------------------------> Mon Jan 23 23:55:14 2017
Mon Jan 23 23:55:14 2017
IIPImage Server. Version 1.0 IIPImage Server. Version 1.0
Here we are - error occured with debug level 10. I still don't see any important details here. Maybe the bug is in apache/fcgi module?
The only useful message is confirmation that the error occurs when outputting data to mod_fcgid:
CVT :: Error flushing jpeg data
Try adjusting the FCGI buffer size. There is the FcgidOutputBufferSize parameter which you could try increasing: https://httpd.apache.org/mod_fcgid/mod/mod_fcgid.html#fcgidoutputbuffersize
The buffer should be adequate to the size of a single tile? A few of them? Or "as much as the user can ask for before they're received by the user"?
It's 65K by default, so try doubling this and see if this helps. Double again if necessary. You could even try setting it to zero to disable buffering to see if that helps.
Okay, so:
Three more apache reboots were required,
The last one was - very similar.
Full Request is iiif=/home/patomorfa/tif2/12_89_512.tiff/24576,20480,2048,2048/512,/0/default.jpg
[1/1]: Command / Argument is iiif : /home/patomorfa/tif2/12_89_512.tiff/24576,20480,2048,2048/512,/0/default.jpg
IIIF handler reached
IIIF :: URL decoded to /home/patomorfa/tif2/12_89_512.tiff/24576,20480,2048,2048/512,/0/default.jpg
FIF handler reached
FIF :: URL decoding/filtering: /home/patomorfa/tif2/12_89_512.tiff => /home/patomorfa/tif2/12_89_512.tiff
FIF :: Image cache hit. Number of elements: 82
FIF :: TIFF image detected
FIF :: Created image
FIF :: Image dimensions are 121856 x 87040
FIF :: Image contains 3 channels with 8 bits per channel
FIF :: Image timestamp: Fri, 30 Dec 2016 16:38:37 GMT
FIF :: Total command time 405 microseconds
IIIF :: Requested Region: x:24576, y:20480, w:2048, h:2048
IIIF :: Requested Size: 512x512
IIIF :: Requested Rotation: 0 degrees
IIIF :: Requested Quality: default with format: jpg
IIIF :: image request for /home/patomorfa/tif2/12_89_512.tiff with arguments: region: 24576,20480,2048,2048; size: 512x512; ro
tation: 0; mirroring: 0
CVT handler reached
CVT :: Using resolution 6 with size 30464x21760
CVT :: Region: 6144,5120,512,512
CVT :: Requested scaled region size is 512x512 at resolution 6. Nearest existing resolution has region with size 512x512
TileManager getRegion :: Total tiles in image: 60x43 tiles
TileManager getRegion :: Tile start: 12,10 with offset: 0,0
TileManager getRegion :: Tile end: 12,10
TileManager :: Cache Miss for resolution: 6, tile: 612
TileManager :: Cache Size: 13 tiles, 9.9887 MB
TileManager :: Tile cache insertion time: 438 microseconds
TileManager :: Total Tile Access Time: 20823 microseconds
TileManager getRegion :: Tile access time 20936 microseconds for tile 612 at resolution 6
TileManager getRegion :: destination tile width: 512, tile height: 512
CVT :: Error flushing jpeg data
CVT :: About to JPEG compress strip with height 128
CVT :: Compressed data strip length is 25274
CVT :: Error writing jpeg strip data: 25274
CVT :: About to JPEG compress strip with height 128
CVT :: Compressed data strip length is 23793
CVT :: Error writing jpeg strip data: 23793
CVT :: About to JPEG compress strip with height 128
CVT :: Compressed data strip length is 22662
CVT :: Error writing jpeg strip data: 22662
CVT :: About to JPEG compress strip with height 128
CVT :: Compressed data strip length is 21351
CVT :: Error writing jpeg strip data: 21351
CVT :: Error writing jpeg EOI markers
CVT :: Total command time 33849 microseconds
IIIF :: Total command time 34337 microseconds
Memcached :: stored 93305 bytes in 243 microseconds
Total Request Time: 34730 microseconds
image closed and deleted
Server count is 21850
Caught Terminated signal. Terminating after 20574 accesses
Wed Jan 25 15:10:45 2017
<----------------------------------->
Caught Terminated signal. Terminating after 21850 accesses
Wed Jan 25 15:10:45 2017
<----------------------------------->
<----------------------------------->
<----------------------------------->
I've just increased the buffer size and keep monitoring the situation. Although it doesn't sound like fixing the root of the problem, 64 KB sounds small nowadays for any kind of buffer. I've set it to 256 KB.
I'm running into this as well when performance testing using siege.
I'm seeing segfaults of the IIP process in /var/log/messages and I think it might have to do with how the condition is handled when writing to the FCGIWriter doesn't result in sending the expected number of bytes. From what I've read, this can be due to the client terminating the connection unexpectedly. Currently, the behavior of IIP is to spit out a log message and continue on with the remainder of the FCGIWriter operations, but I'm wondering if not throwing an exception at that point is leading to an eventual segfault. I'm looking into it and will update this ticket if I find anything interesting.
Ok, the segfault was caused by failure to initialize a pointer in the enhanced version I'm working with, not failure to write the jpeg strip - I would venture to guess that's caused by the client closing the connection prematurely or the system running out of sockets. In my performance tests, when mod_rewrite is used with a proxy connection and I slam IIP with requests using the siege tool, the system is much more susceptible to running out of connections. I see nf_conntrack: table full, dropping packet in /var/log/messages pretty quickly in my tests and it takes one of the TCP connection timeout intervals to recycle it. I don't see that problem when hitting IIP directly though and now that the segfault issue is fixed, I'm able to serve about 1200 requests / second once everything is cached in memcached.
Hi,
mi iipsrv based server just hung up - permitted number of iipsrv processes got defunct and apache hasn't killed them.
My fcgi config:
31967 ? 00:02:42 iipsrv7.fcgi
31971 ? 00:02:39 iipsrv7.fcgi
From apache error.log First: mod_fcgid: ap_pass_brigade failed in handle_request_ipc Later: mod_fcgid: can't get pipe mutex, finally: mod_fcgid: can't apply process slot for
(on apache restart). child process 9261 still did not exit, sending a SIGKILL child process 31546 still did not exit, sending a SIGKILL child process 31437 still did not exit, sending a SIGKILL
and it worked. but imho apache should take care of this.
What should I do?