mozilla / MozDef

DEPRECATED - MozDef: Mozilla Enterprise Defense Platform
Mozilla Public License 2.0

[Docker env] Alerts not being displayed - Cron silently failing in container #1315

Closed: darakian closed this issue 5 years ago

darakian commented 5 years ago

Hey All,

I've got a box on the current master. It's generating alerts fine, but the MozDef dashboard doesn't display them. I've verified that the alerts are created by looking at the ES cluster: the alerts index exists and contains the expected results. Any ideas on where I should be looking for the alert-display part of the equation?

This is in the Docker environment, and nothing pops out at me after looking at the logs for each container.
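
For reference, this is roughly the check I ran; the index name alerts and the hostname elasticsearch are just what my compose setup uses, so adjust as needed:

# confirm the alerts index exists and holds documents
curl 'http://elasticsearch:9200/alerts/_search?size=1&pretty'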

pwnbus commented 5 years ago

Hey, the alerts get propagated from Elasticsearch to MongoDB by a cron script (specifically https://github.com/mozilla/MozDef/blob/master/cron/syncAlertsToMongo.py), and they then show up in the MozDef dashboard.

This script "should" be running every minute as part of the mozdef_cron container. Do you get any errors if you do: docker exec -it mozdef_cron_1 bash su mozdef /opt/mozdef/envs/mozdef/cron/syncAlertsToMongo.sh

darakian commented 5 years ago

No errors but that populated the alerts. Here's the cron entry for that script from cron_entries.txt

* * * * * /opt/mozdef/envs/mozdef/cron/syncAlertsToMongo.sh

Edit: I am getting logs of this sort:

2019-06-04T22:26:10.143137785Z bash: connect: Connection refused
2019-06-04T22:26:10.143157915Z bash: /dev/tcp/elasticsearch/9200: Connection refused

But they're intermittent, and Elasticsearch responds fine when I query it by hand:

[mozdef@9daa2147d3f7 cron]$ curl http://elasticsearch:9200
{
  "name" : "ifaqEDJ",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "NW_kvNejSYi3nUIqoD1dSg",
  "version" : {
    "number" : "5.6.14",
    "build_hash" : "f310fe9",
    "build_date" : "2018-12-05T21:20:16.416Z",
    "build_snapshot" : false,
    "lucene_version" : "6.6.1"
  },
  "tagline" : "You Know, for Search"
}

I'm also getting what look like thread panics in the ES container, but I think it's recovering. The timestamps do seem to correlate with the cron logs:

[2019-06-04T22:26:12,391][WARN ][r.suppressed             ] path: /_stats, params: {}
org.elasticsearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];
    at org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedException(ClusterBlocks.java:165) ~[elasticsearch-5.6.14.jar:5.6.14]
    at org.elasticsearch.action.admin.indices.stats.TransportIndicesStatsAction.checkGlobalBlock(TransportIndicesStatsAction.java:70) ~[elasticsearch-5.6.14.jar:5.6.14]
    at org.elasticsearch.action.admin.indices.stats.TransportIndicesStatsAction.checkGlobalBlock(TransportIndicesStatsAction.java:47) ~[elasticsearch-5.6.14.jar:5.6.14]
    at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$AsyncAction.<init>(TransportBroadcastByNodeAction.java:256) ~[elasticsearch-5.6.14.jar:5.6.14]
    at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction.doExecute(TransportBroadcastByNodeAction.java:234) ~[elasticsearch-5.6.14.jar:5.6.14]
    at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction.doExecute(TransportBroadcastByNodeAction.java:79) ~[elasticsearch-5.6.14.jar:5.6.14]
    at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:170) ~[elasticsearch-5.6.14.jar:5.6.14]
    at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:142) ~[elasticsearch-5.6.14.jar:5.6.14]
    at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:84) ~[elasticsearch-5.6.14.jar:5.6.14]
    at org.elasticsearch.client.node.NodeClient.executeLocally(NodeClient.java:83) ~[elasticsearch-5.6.14.jar:5.6.14]
    at org.elasticsearch.client.node.NodeClient.doExecute(NodeClient.java:72) ~[elasticsearch-5.6.14.jar:5.6.14]
    at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:408) ~[elasticsearch-5.6.14.jar:5.6.14]
    at org.elasticsearch.client.support.AbstractClient$IndicesAdmin.execute(AbstractClient.java:1256) ~[elasticsearch-5.6.14.jar:5.6.14]
    at org.elasticsearch.client.support.AbstractClient$IndicesAdmin.stats(AbstractClient.java:1577) ~[elasticsearch-5.6.14.jar:5.6.14]
    at org.elasticsearch.rest.action.admin.indices.RestIndicesStatsAction.lambda$prepareRequest$17(RestIndicesStatsAction.java:145) ~[elasticsearch-5.6.14.jar:5.6.14]
    at org.elasticsearch.rest.BaseRestHandler.handleRequest(BaseRestHandler.java:80) [elasticsearch-5.6.14.jar:5.6.14]
    at org.elasticsearch.rest.RestController.dispatchRequest(RestController.java:262) [elasticsearch-5.6.14.jar:5.6.14]
    at org.elasticsearch.rest.RestController.dispatchRequest(RestController.java:200) [elasticsearch-5.6.14.jar:5.6.14]
    at org.elasticsearch.http.netty4.Netty4HttpServerTransport.dispatchRequest(Netty4HttpServerTransport.java:505) [transport-netty4-5.6.14.jar:5.6.14]
    at org.elasticsearch.http.netty4.Netty4HttpRequestHandler.channelRead0(Netty4HttpRequestHandler.java:80) [transport-netty4-5.6.14.jar:5.6.14]
    at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
    at org.elasticsearch.http.netty4.pipelining.HttpPipeliningHandler.channelRead(HttpPipeliningHandler.java:68) [transport-netty4-5.6.14.jar:5.6.14]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
    at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102) [netty-codec-4.1.13.Final.jar:4.1.13.Final]
    at io.netty.handler.codec.MessageToMessageCodec.channelRead(MessageToMessageCodec.java:111) [netty-codec-4.1.13.Final.jar:4.1.13.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
    at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102) [netty-codec-4.1.13.Final.jar:4.1.13.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
    at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102) [netty-codec-4.1.13.Final.jar:4.1.13.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:310) [netty-codec-4.1.13.Final.jar:4.1.13.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:284) [netty-codec-4.1.13.Final.jar:4.1.13.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
    at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1334) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:926) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:134) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:644) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:544) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:498) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:458) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) [netty-common-4.1.13.Final.jar:4.1.13.Final]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_212]
[2019-06-04T22:26:12,487][INFO ][o.e.g.GatewayService     ] [ifaqEDJ] recovered [5] indices into cluster_state
darakian commented 5 years ago

Running the Python script /opt/mozdef/envs/mozdef/cron/syncAlertsToMongo.py directly and doing some printf-style debugging, I'm coming to the conclusion that cron is to blame.

Edit: What exactly is bash -c 'while ! timeout 1 bash -c "echo > /dev/tcp/elasticsearch/9200";do sleep 1;done && crond -n' trying to do? Does that roughly translate to English as "keep trying to echo nothing to /dev/tcp/elasticsearch/9200 with a one-second timeout, sleeping a second between attempts, and once that succeeds start crond in the foreground"?
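
For my own sanity, here's that command broken out with comments (same command, just re-wrapped):

# Probe TCP port 9200 on the "elasticsearch" host once a second; the redirect to
# /dev/tcp/elasticsearch/9200 only succeeds once something is listening there.
# As soon as a probe succeeds, the loop exits and crond is started in the foreground (-n).
bash -c 'while ! timeout 1 bash -c "echo > /dev/tcp/elasticsearch/9200";do sleep 1;done && crond -n'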

Edit 2: No idea what the issue is. I tried backing out gene's changes from #1158, but no dice. My hotfix for the moment is idiotic, but it works:

#!/bin/bash
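# Stand-in for the per-minute crontab: run each minute-cadence script in a loop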
while true
do
    sleep 60
    /opt/mozdef/envs/mozdef/cron/healthAndStatus.sh
    /opt/mozdef/envs/mozdef/cron/healthToMongo.sh
    /opt/mozdef/envs/mozdef/cron/collectAttackers.sh
    /opt/mozdef/envs/mozdef/cron/syncAlertsToMongo.sh
    /opt/mozdef/envs/mozdef/cron/eventStats.sh
done

Plus

#!/bin/bash
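# Same idea for the hourly jobs: sleep an hour, then run each maintenance script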
while true
do
    sleep 3600
    /opt/mozdef/envs/mozdef/cron/esMaint.sh
    /opt/mozdef/envs/mozdef/cron/pruneES.sh
    /opt/mozdef/envs/mozdef/cron/update_geolite_db.sh
done

@pwnbus let me know if you have any ideas and I'll give them a spin.

darakian commented 5 years ago

Update on this. The issue seems to stem from systemd being unable to run, and crond relies on systemd in this OS:

[mozdef@e9bb5d12deae /]$ systemctl
Failed to get D-Bus connection: Operation not permitted
[mozdef@e9bb5d12deae /]$ exit
exit
[root@e9bb5d12deae /]# systemctl
Failed to get D-Bus connection: Operation not permitted

Docker seems to block access to systemd by default. https://serverfault.com/questions/824975/failed-to-get-d-bus-connection-operation-not-permitted

I'm looking into a fix for this. So far, adding privileged mode, the cgroup volume, and the SYS_ADMIN capability to the cron service in the docker-compose.yml file has not allowed systemd to run.
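
For the record, this is how I'm checking whether crond is even alive in the container (container name as in the earlier comment; assumes pgrep is available in the image):

# list any running crond processes inside the cron container
docker exec mozdef_cron_1 pgrep -a crond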

pwnbus commented 5 years ago

The logs you're seeing:

2019-06-04T22:26:10.143137785Z bash: connect: Connection refused
2019-06-04T22:26:10.143157915Z bash: /dev/tcp/elasticsearch/9200: Connection refused

are from the Docker container's command value, which basically means "sleep until Elasticsearch is available", and stem directly from this line: https://github.com/mozilla/MozDef/blob/master/docker/compose/docker-compose.yml#L125

These errors can be ignored.

pwnbus commented 5 years ago

We shouldn't need systemd to work here, as the cron scripts are in fact executing, just once every minute, so I'm unsure what you're trying to accomplish here.

darakian commented 5 years ago

I figured as much after digging around. At this point my thought process is that a new deploy had an updated version of Docker, which blocks systemd (and thus cron) from within containers. This seems to be a prevalent issue; there's even a systemd replacement script for it: https://github.com/gdraheim/docker-systemctl-replacement

Edit: A running assumption here is that crond is using systemd. I can't get any cron tasks running, and the lack of systemd is what led me here.

darakian commented 5 years ago

Ok, scratch the systemd thought. I've got cron running in the Docker container on a CentOS 7 machine; the issue seems to be on CentOS 6 / Amazon Linux 1 only. I think the path of least resistance is to update my base OS. Sorry for the noise, and thanks for letting me think out loud.