mlsecproject / combine

Tool to gather Threat Intelligence indicators from publicly available sources
https://www.mlsecproject.org/
GNU General Public License v3.0

Handling of "orphan" indicators #61

Open alexcpsec opened 10 years ago

alexcpsec commented 10 years ago

Today, indicators that fail both our "IPv4" and "FQDN" validation are left in the output without a type. An example:

$ cat harvest.csv | grep -v FQDN | grep -v IPv4
"entity","type","direction","source","notes","date"
"2001:41d0:8:dcd4::1","","inbound","http://www.blocklist.de/lists/apache.txt","","2014-09-04"
"2002:5f18:8f82::5f18:8f82","","inbound","http://www.blocklist.de/lists/apache.txt","","2014-09-04"
"2002:c3d3:9a9f::c3d3:9a9f","","inbound","http://www.blocklist.de/lists/apache.txt","","2014-09-04"
"2a00:1210:fffe:145::1","","inbound","http://www.blocklist.de/lists/apache.txt","","2014-09-04"
"2a00:1210:fffe:72::1","","inbound","http://www.blocklist.de/lists/apache.txt","","2014-09-04"
"2a01:238:20a:202:1000::25","","inbound","http://www.blocklist.de/lists/apache.txt","","2014-09-04"
"2a01:540:2:bd5d:d849:1e69:7736:be41","","inbound","http://www.blocklist.de/lists/apache.txt","","2014-09-04"
"2a03:7380:140:3:a90f:3bd1:d8d9:3485","","inbound","http://www.blocklist.de/lists/apache.txt","","2014-09-04"
"2a03:7380:140:3:b86c:62e8:3e0e:a0fb","","inbound","http://www.blocklist.de/lists/apache.txt","","2014-09-04"
"2a03:7380:2380:0:501b:91a5:76ff:8fa8","","inbound","http://www.blocklist.de/lists/apache.txt","","2014-09-04"
"2a03:7380:2380:0:95db:5adb:685d:a0f0","","inbound","http://www.blocklist.de/lists/apache.txt","","2014-09-04"
"2001:41d0:1:c9b2::1","","inbound","http://www.blocklist.de/lists/bots.txt","","2014-09-04"
"2a01:430:17:1::ffff:376","","inbound","http://www.blocklist.de/lists/bots.txt","","2014-09-04"
"Export","","inbound","http://virbl.org/download/virbl.dnsbl.bit.nl.txt","","2014-09-04"
"ckaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa","","outbound","http://www.nothink.org/blacklist/blacklist_malware_dns.txt","","2014-09-04"

We are not interested in IPv6 (for now), and the rest look like parsing errors.

I believe we should filter out the indicators that do not match a specific type.
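
A minimal sketch of that filtering, assuming Python 3 and rows shaped like the harvest.csv lines above (entity, type, direction, source, notes, date). The names `FQDN_RE`, `indicator_type`, and `drop_orphans` are hypothetical, and the FQDN pattern is a deliberately loose stand-in, not the validation combine actually uses:

```python
import ipaddress
import re

# Hypothetical, loose FQDN pattern used only for illustration.
FQDN_RE = re.compile(
    r"^(?=.{1,253}$)(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}$",
    re.IGNORECASE,
)


def indicator_type(entity):
    """Return "IPv4", "FQDN", or None when the entity matches neither."""
    try:
        ipaddress.IPv4Address(entity)
        return "IPv4"
    except ValueError:
        pass
    if FQDN_RE.match(entity):
        return "FQDN"
    return None


def drop_orphans(rows):
    """Fill in the type column and drop rows whose entity has no known type."""
    typed = []
    for row in rows:
        kind = indicator_type(row[0])
        if kind is not None:
            typed.append((row[0], kind) + tuple(row[2:]))
    return typed
```

With this, the IPv6 addresses, "Export", and the "ckaaa..." string from the listing above would all be dropped, since none of them parses as an IPv4 address or contains a dot-separated domain name.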

krmaxwell commented 10 years ago

IPv6, definitely we can just tag and ignore for now.

The Export indicator from http://virbl.org/download/virbl.dnsbl.bit.nl.txt is actually a bug.

Interestingly, http://www.nothink.org/blacklist/blacklist_malware_dns.txt really does list ckaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa. We can obviously filter it out, but it's notable that they let some bad data through.

alexcpsec commented 10 years ago

I was thinking of just filtering out everything that the IPv4 and FQDN checks do not recognize.

krmaxwell commented 10 years ago

For the bad data, sure, we just filter it out. IPv6 is something we should add as a future enhancement because that's eventually going to be relevant, particularly as a research question.

alexcpsec commented 10 years ago

Sure, but then it becomes a handler here when we are ready :) : https://github.com/mlsecproject/combine/blob/master/thresher.py#L9-L19

krmaxwell commented 10 years ago

Exactly. We add the proper regex now in thresher, but winnower can filter it out (more specifically, only pass types it knows about). Or maybe just have IPv6 output as an option in combine.cfg?

alexcpsec commented 10 years ago

Well, if you have a good regex for IPv6 validation, we could just add that right away.

I think the "right" answer is for combine.cfg to have a "list of indicator types I want output" setting in the winnower section, which defaults to ("IPv4", "FQDN"). Ideally you should be able to override that (or select only a few others) from the command line.
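
Roughly what I have in mind, as a sketch only. It assumes Python 3, and the `[Winnower]` section name, the `output_types` option, and the `--types` flag are all made up for illustration; none of them exist in combine today:

```python
import argparse
import configparser

DEFAULT_TYPES = ("IPv4", "FQDN")


def wanted_types(config_text, argv=None):
    """Resolve the output types: command line beats config, config beats default."""
    config = configparser.ConfigParser()
    config.read_string(config_text)
    # Hypothetical option; falls back to the default when section or key is absent.
    raw = config.get("Winnower", "output_types", fallback=",".join(DEFAULT_TYPES))

    parser = argparse.ArgumentParser()
    parser.add_argument("--types", default=raw)  # hypothetical flag
    # A real entry point would pass sys.argv[1:] here.
    args = parser.parse_args(argv or [])
    return tuple(t.strip() for t in args.types.split(",") if t.strip())


def winnow(rows, types):
    """Pass only rows whose type column (index 1) is in the wanted list."""
    return [row for row in rows if row[1] in types]
```

Because untyped rows have an empty type column, they never match the list, so the same filter also drops the orphans.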

krmaxwell commented 10 years ago

I think that's the right way to go. And I'll just use something from http://stackoverflow.com/questions/53497/regular-expression-that-matches-valid-ipv6-addresses ;)
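
As an alternative to a regex (so not what is proposed above), Python 3's standard `ipaddress` module can do the validation. A minimal sketch, assuming Python 3 is available; the helper name `is_ipv6` is hypothetical:

```python
import ipaddress


def is_ipv6(entity):
    """Return True when the string parses as an IPv6 address."""
    try:
        ipaddress.IPv6Address(entity)
        return True
    except ValueError:
        # AddressValueError is a ValueError subclass, so this covers bad input.
        return False
```

This accepts the compressed `::` forms seen in the blocklist.de feeds and rejects strings such as "Export" or plain IPv4 addresses.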

krmaxwell commented 9 years ago

Curious - does anybody have a use case for consuming IPv6 indicators right now? I see a lot more of these in the feeds, though I haven't investigated them yet.

alexcpsec commented 9 years ago

I'd just drop them for now. That was my original suggestion.

krmaxwell commented 9 years ago

That is in fact what we do. Just thinking about when we should start doing something with them.