hobuinc / greyhound

Greyhound is a point cloud streaming server. It should be considered deprecated for now. Use Entwine and Entwine Point Tile directly if you just want to serve point cloud web services.
https://greyhound.io
Apache License 2.0
131 stars 51 forks source link

Greyhound service not responding #30

Closed smercier closed 7 years ago

smercier commented 8 years ago

I have install a Docker container for a Greyhound service on a server. Time to time the service going down without any relevant details through docker logs. The info request never respond (/resource/270_merge_3857/info) and Greyhound docker container need to be restarted to get back on track. Where should I look to have more info on this problem. Is it a Docker, Greyhound or Entwine index config problem?

connormanning commented 8 years ago

My first guess would be Greyhound. What sort of logs do you see before the failure?

Does the server launch with logs that look like this?

Sun Sep 18 2016 16:11:59 GMT+0000 (UTC) LOG Using
Sun Sep 18 2016 16:11:59 GMT+0000 (UTC) LOG     Chunk cache size: 64
Sun Sep 18 2016 16:11:59 GMT+0000 (UTC) LOG     Libuv threadpool size: 5
Sun Sep 18 2016 16:11:59 GMT+0000 (UTC) LOG Read paths: [ /* */ ]
Sun Sep 18 2016 16:12:00 GMT+0000 (UTC) LOG HTTP server running on port 80

Any logging after that once you attempt to load the resource? Also, is this data public? If so I could try to reproduce locally.

smercier commented 8 years ago

Not public data but I'll ask to customer and send you a link if needed. When I start Greyhound container, my log is:

Mon Sep 19 2016 18:35:46 GMT+0000 (UTC) LOG Using default config
Mon Sep 19 2016 18:35:47 GMT+0000 (UTC) LOG Using
Mon Sep 19 2016 18:35:47 GMT+0000 (UTC) LOG     Chunk cache size: 32
Mon Sep 19 2016 18:35:47 GMT+0000 (UTC) LOG     Libuv threadpool size: 5
Mon Sep 19 2016 18:35:47 GMT+0000 (UTC) LOG Read paths: [ '/opt/data' ]
Mon Sep 19 2016 18:35:47 GMT+0000 (UTC) LOG HTTP server running on port 80

here's log before failure

GET /resource/RDP_RMI/read?depthBegin=11&depthEnd=12&bounds=[273867.5,5045122.5,25,278745,5050000,4902.5]&schema=[{%22name%22:%22X%22,%22size%22:4,%22type%22:%22signed%22},{%22name%22:%22Y%22,%22size%22:4,%22type%22:%22signed%22},{%22name%22:%22Z%22,%22size%22:4,%22type%22:%22signed%22},{%22name%22:%22Intensity%22,%22size%22:2,%22type%22:%22unsigned%22},{%22name%22:%22Classification%22,%22size%22:1,%22type%22:%22unsigned%22},{%22name%22:%22Red%22,%22size%22:2,%22type%22:%22unsigned%22},{%22name%22:%22Green%22,%22size%22:2,%22type%22:%22unsigned%22},{%22name%22:%22Blue%22,%22size%22:2,%22type%22:%22unsigned%22}]&scale=1&offset=[288500,5050000,25]&compress=true - - ms - -
GET /resource/RDP_RMI/read?depthBegin=10&depthEnd=11&bounds=[268990,5040245,-9730,278745,5050000,25]&schema=[{%22name%22:%22X%22,%22size%22:4,%22type%22:%22signed%22},{%22name%22:%22Y%22,%22size%22:4,%22type%22:%22signed%22},{%22name%22:%22Z%22,%22size%22:4,%22type%22:%22signed%22},{%22name%22:%22Intensity%22,%22size%22:2,%22type%22:%22unsigned%22},{%22name%22:%22Classification%22,%22size%22:1,%22type%22:%22unsigned%22},{%22name%22:%22Red%22,%22size%22:2,%22type%22:%22unsigned%22},{%22name%22:%22Green%22,%22size%22:2,%22type%22:%22unsigned%22},{%22name%22:%22Blue%22,%22size%22:2,%22type%22:%22unsigned%22}]&scale=1&offset=[288500,5050000,25]&compress=true - - ms - -
GET /resource/RDP_RMI/read?depthBegin=11&depthEnd=12&bounds=[273867.5,5045122.5,25,278745,5050000,4902.5]&schema=[{%22name%22:%22X%22,%22size%22:4,%22type%22:%22signed%22},{%22name%22:%22Y%22,%22size%22:4,%22type%22:%22signed%22},{%22name%22:%22Z%22,%22size%22:4,%22type%22:%22signed%22},{%22name%22:%22Intensity%22,%22size%22:2,%22type%22:%22unsigned%22},{%22name%22:%22Classification%22,%22size%22:1,%22type%22:%22unsigned%22},{%22name%22:%22Red%22,%22size%22:2,%22type%22:%22unsigned%22},{%22name%22:%22Green%22,%22size%22:2,%22type%22:%22unsigned%22},{%22name%22:%22Blue%22,%22size%22:2,%22type%22:%22unsigned%22}]&scale=1&offset=[288500,5050000,25]&compress=true - - ms - -
GET /resource/RDP_RMI/read?depthBegin=10&depthEnd=11&bounds=[268990,5040245,-9730,278745,5050000,25]&schema=[{%22name%22:%22X%22,%22size%22:4,%22type%22:%22signed%22},{%22name%22:%22Y%22,%22size%22:4,%22type%22:%22signed%22},{%22name%22:%22Z%22,%22size%22:4,%22type%22:%22signed%22},{%22name%22:%22Intensity%22,%22size%22:2,%22type%22:%22unsigned%22},{%22name%22:%22Classification%22,%22size%22:1,%22type%22:%22unsigned%22},{%22name%22:%22Red%22,%22size%22:2,%22type%22:%22unsigned%22},{%22name%22:%22Green%22,%22size%22:2,%22type%22:%22unsigned%22},{%22name%22:%22Blue%22,%22size%22:2,%22type%22:%22unsigned%22}]&scale=1&offset=[288500,5050000,25]&compress=true - - ms - -
GET /resource/RDP_RMI/info - - ms - -
smercier commented 8 years ago

Hi Connor,

I have some news on this issue. The RDP_RMI resource was transfert on server for publishing with an rsync command before while an other process was running a entwine build command (on the same laptop). So the entwine index was corrupted or uncompleted and Greyhound crash on a reading query. I'm not sure if customer was trying to read resource while he was copying entwine index on server (can be another failure cause)

connormanning commented 8 years ago

Thanks for the update. Were you able to reach the resource successfully with Greyhound after the full entwine index was loaded?

Of course, either way, crashing is not the appropriate response from Greyhound. I may make a separate issue to more robustly handle some error cases within Greyhound to ensure appropriate responses for cases like this. Hopefully you are able to continue with your workflow in the meantime.

smercier commented 8 years ago

Yes after restarting Greyhound container, we were able to open it in the viewer.

Here's the log of an other failure this morning.

GET /resource/270_merge_3857/read?depthBegin=10&depthEnd=11&bounds=[-8230817,5693279,-1358.5,-8229381.5,5694714.5,77]&schema=[{%22name%22:%22X%22,%22size%22:4,%22type%22:%22signed%22},{%22name%22:%22Y%22,%22size%22:4,%22type%22:%22signed%22},{%22name%22:%22Z%22,%22size%22:4,%22type%22:%22signed%22},{%22name%22:%22Intensity%22,%22size%22:2,%22type%22:%22unsigned%22},{%22name%22:%22Classification%22,%22size%22:1,%22type%22:%22unsigned%22},{%22name%22:%22Red%22,%22size%22:2,%22type%22:%22unsigned%22},{%22name%22:%22Green%22,%22size%22:2,%22type%22:%22unsigned%22},{%22name%22:%22Blue%22,%22size%22:2,%22type%22:%22unsigned%22}]&scale=0.1&offset=[-8230817,5696150,77]&compress=true - - ms - -
GET /resource/270_merge_3857/read?depthBegin=10&depthEnd=11&bounds=[-8230817,5694714.5,-1358.5,-8229381.5,5696150,77]&schema=[{%22name%22:%22X%22,%22size%22:4,%22type%22:%22signed%22},{%22name%22:%22Y%22,%22size%22:4,%22type%22:%22signed%22},{%22name%22:%22Z%22,%22size%22:4,%22type%22:%22signed%22},{%22name%22:%22Intensity%22,%22size%22:2,%22type%22:%22unsigned%22},{%22name%22:%22Classification%22,%22size%22:1,%22type%22:%22unsigned%22},{%22name%22:%22Red%22,%22size%22:2,%22type%22:%22unsigned%22},{%22name%22:%22Green%22,%22size%22:2,%22type%22:%22unsigned%22},{%22name%22:%22Blue%22,%22size%22:2,%22type%22:%22unsigned%22}]&scale=0.1&offset=[-8230817,5696150,77]&compress=true - - ms - -
GET /resource/270_merge_3857/read?depthBegin=10&depthEnd=11&bounds=[-8232252.5,5693279,-1358.5,-8230817,5694714.5,77]&schema=[{%22name%22:%22X%22,%22size%22:4,%22type%22:%22signed%22},{%22name%22:%22Y%22,%22size%22:4,%22type%22:%22signed%22},{%22name%22:%22Z%22,%22size%22:4,%22type%22:%22signed%22},{%22name%22:%22Intensity%22,%22size%22:2,%22type%22:%22unsigned%22},{%22name%22:%22Classification%22,%22size%22:1,%22type%22:%22unsigned%22},{%22name%22:%22Red%22,%22size%22:2,%22type%22:%22unsigned%22},{%22name%22:%22Green%22,%22size%22:2,%22type%22:%22unsigned%22},{%22name%22:%22Blue%22,%22size%22:2,%22type%22:%22unsigned%22}]&scale=0.1&offset=[-8230817,5696150,77]&compress=true - - ms - -
        Active size: 4  Idle size: 28   This query: 4
Tue Sep 20 2016 13:27:29 GMT+0000 (UTC) LOG controller::read
Tue Sep 20 2016 13:27:29 GMT+0000 (UTC) LOG controller::read
Tue Sep 20 2016 13:27:29 GMT+0000 (UTC) LOG controller::info
GET /resource/270_merge_3857/info 200 10961.990 ms - 1804
GET /resource/270_merge_3857/hierarchy?bounds=[-8233688,5693279,-2794,-8227946,5699021,2948]&depthBegin=8&depthEnd=14 - - ms - -
GET /resource/270_merge_3857/info - - ms - -
GET /resource/270_merge_3857/info - - ms - -
GET /axis2 - - ms - -

Download laz here :http://dl.mapgears.com/270_merge.laz

Create the entwine index: docker run -it -v /opt/data/greyhound/:/data connormanning/entwine entwine build -r EPSG:2950 EPSG:3857 -i /data/270_merge.laz -o /data/270_merge_3857

We using Potree entwine as viewer: http://potree.entwine.io/data/custom.html?s=sigma.cmm.qc.ca:8080/&r=270_merge_3857

We running Greyhound through docker on Ubuntu 14.04.4 LTS.

connormanning commented 8 years ago

Thanks for the data - I've got it built locally and have been browsing around. So far I've been unable to replicate the issue in Potree. From those logs, my guess is that your Greyhound container may be running low on memory, causing the requests to hang. The - - in the log indicates that the request never receives a response, but did not crash or cause an explicit error.

Have you checked the output of free -g within your container while that error is occurring? Do you have some information on the instance on which Greyhound is running? How much memory is available? Have you enabled any swap space?

One quick thing to try might be lowering Greyhound's chunk cache size. You can do this by creating a configuration file that looks like this and changing the chunkCacheSize to 24 (for example). Then:

# Assume your config file is located at ~/greyhound-config.js
docker run -it -v `pwd`:/opt/home connormanning/greyhound "bash -c \"cp /opt/home/greyhound-config.json /var/greyhound/config.json && greyhound dockerstart && greyhound log\""
smercier commented 8 years ago

Hi

after a greyhound container update, I try to set chunkCacheSize to 24 (greyhound crash ). I create a config file (/var/greyhound/config.json) but when I restart container, change is not apply. If I connect to greyhound container and I try greyhound -c /var/greyhound/config.json or greyhound dockerstart I have this loop error:

Error: listen EADDRINUSE :::8080
    at Object.exports._errnoException (util.js:907:11)
    at exports._exceptionWithHostPort (util.js:930:20)
    at Server._listen2 (net.js:1250:14)
    at listen (net.js:1286:10)
    at Server.listen (net.js:1382:5)
    at HttpHandler.start (/usr/lib/node_modules/greyhound-server/src/interfaces/http/index.js:61:36)
    at /usr/lib/node_modules/greyhound-server/src/app.js:70:21
    at nextTickCallbackWith0Args (node.js:420:9)
    at process._tickCallback (node.js:349:13)
    at Function.Module.runMain (module.js:443:11)
    at startup (node.js:139:18)
    at node.js:974:3
Relaunching greyhound
Mon Oct 17 2016 03:12:54 GMT+0000 (UTC) LOG Using default config
Mon Oct 17 2016 03:12:54 GMT+0000 (UTC) LOG Using
Mon Oct 17 2016 03:12:54 GMT+0000 (UTC) LOG     Chunk cache size: 32
Mon Oct 17 2016 03:12:54 GMT+0000 (UTC) LOG     Libuv threadpool size: 5
Mon Oct 17 2016 03:12:54 GMT+0000 (UTC) LOG Read paths: [ '/opt/data', '~/greyhound', 'http://greyhound.io' ]
Mon Oct 17 2016 03:12:54 GMT+0000 (UTC) LOG HTTP server running on port 8080
events.js:141
      throw er; // Unhandled 'error' event
...
...

How can I restart container with custom config file?

connormanning commented 8 years ago

Personally I'd just restart the container completely pointing at your new configuration. If you want to use the same container, in f09de45 I've added a name to the monitoring task (re-pull docker to get it), so you can kill greyhound with something like kill -9 $(pidof greyhound-monitor).

By the way, launching with a custom config is now simpler than my previous comment, just use -c with no copying necessary and no more dockerstart command: docker run -it -v ~:/opt/home connormanning/greyhound greyhound -c /opt/home/greyhound-config.json

smercier commented 8 years ago

I try it with the new -c option. It seems to work but Chunk cache size: undefined in log. How can I know if greyhound really use my custom chunksize or it's a bug?

root@demo-preview5:/opt/data/greyhound# docker run -d -v /opt/data/greyhound:/opt/data --name "greyhound" -p 8080:8080 --restart unless-stopped connormanning/greyhound  greyhound -c /opt/data/greyhound.config.json
root@demo-preview5:/opt/data/greyhound# docker logs greyhound
Tue Oct 18 2016 13:45:29 GMT+0000 (UTC) LOG Using config at /opt/data/greyhound.config.json
Tue Oct 18 2016 13:45:29 GMT+0000 (UTC) LOG Using
Tue Oct 18 2016 13:45:29 GMT+0000 (UTC) LOG     Chunk cache size: undefined
Tue Oct 18 2016 13:45:29 GMT+0000 (UTC) LOG     Libuv threadpool size: 5
Tue Oct 18 2016 13:45:29 GMT+0000 (UTC) LOG Read paths: [ '/opt/data', '~/greyhound', 'http://greyhound.io' ]
Tue Oct 18 2016 13:45:29 GMT+0000 (UTC) LOG HTTP server running on port 8080
root@demo-preview5:/opt/data/greyhound# cat greyhound.config.json 
{
    "queryLimits": { "chunksPerQuery": 16, "chunkCacheSize": 24 },
    "paths": ["/opt/data", "~/greyhound", "http://greyhound.io"],
    "resourceTimeoutMinutes": 30,
    "http": {
        "port": 8080,
        "enableStaticServe": true,
        "headers": {
            "Cache-Control":                  "public, max-age=300",
            "Access-Control-Allow-Origin":    "*",
            "Access-Control-Allow-Methods":   "GET,PUT,POST,DELETE"
        }
    }
}
connormanning commented 8 years ago

Ah, sorry about that. In this wave of simplifications I've been doing I tweaked that configuration format, but I haven't yet pushed the new documentation. It should work if you remove the queryLimits key, and make it look like this:

{
    "chunksPerQuery": 16,
    "chunkCacheSize": 24,
    "paths": ...
}

Updated docs will be out soon.