Closed alexcpsec closed 9 years ago
(not ignoring - this one requires :thought_balloon: )
Per @alexcpsec - turn inbound_urls.txt and outbound_urls.txt into proper config files, mapping each config name string to its URL.
I've played a little with the config files and put together some proof-of-concept code in a local branch to address this issue.
Basically I've added the feeds to the config file, like:
[feeds.outbound]
feed_o_label1 = feed_url1
...
...
feed_o_labelN = feed_urlN
[feeds.inbound]
feed_i_label1 = feed_url1
...
...
feed_i_labelN = feed_urlN
Then reaper.py reads the feeds from the config file (sections feeds.outbound and feeds.inbound) and stores the harvested results by label (e.g. feed_o_label1).
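A minimal sketch of that reading step, assuming Python 3's standard configparser. The function names load_feeds and reap, the example.com URLs, and the injected fetch callable are all hypothetical and are not taken from the branch; the point is only that results end up keyed by label rather than by URL.

```python
import configparser

# Hypothetical config text; the example.com URLs are placeholders.
SAMPLE_CONFIG = """
[feeds.outbound]
feed_o_label1 = http://example.com/outbound1.txt

[feeds.inbound]
feed_i_label1 = http://example.com/inbound1.txt
"""


def load_feeds(config_text):
    """Return {direction: {label: url}} for feeds.inbound and feeds.outbound."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    feeds = {}
    for direction in ("inbound", "outbound"):
        section = "feeds." + direction
        # A missing section simply yields no feeds for that direction.
        feeds[direction] = dict(parser[section]) if parser.has_section(section) else {}
    return feeds


def reap(feeds, fetch):
    """Fetch every feed and store the harvested result under its label."""
    harvest = {}
    for direction, labeled in feeds.items():
        harvest[direction] = {label: fetch(url) for label, url in labeled.items()}
    return harvest
```

Passing fetch in as a parameter keeps the sketch independent of whatever HTTP client reaper.py actually uses.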
Next I've improved thresher.py as well, so that it reads the associated parser function from the config file.
For example, in the config the user can now define the preferred parser function like so:
[feeds.parsers]
feed_whatever = whatever_parser
whatever_parser() is then used to parse the results labeled as feed_whatever.
This behaviour should be a good starting point for implementing a plugin system in which the parsers are read from other modules.
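One way the label-to-parser lookup could work, sketched under assumptions: the PARSERS registry, the parser_for helper, and the body of whatever_parser below are hypothetical illustrations, not the code in the branch. An explicit registry is used here instead of looking names up in module globals, since a plugin system could later populate the same dictionary from other modules.

```python
import configparser

SAMPLE_CONFIG = """
[feeds.parsers]
feed_whatever = whatever_parser
"""


def whatever_parser(raw):
    """Hypothetical parser: one indicator per non-empty, non-comment line."""
    lines = (line.strip() for line in raw.splitlines())
    return [line for line in lines if line and not line.startswith("#")]


# Registry of parser functions by name (an assumption of this sketch).
PARSERS = {"whatever_parser": whatever_parser}


def parser_for(config, label):
    """Resolve the parser function configured for a feed label."""
    name = config.get("feeds.parsers", label, fallback=None)
    if name is None:
        raise KeyError("no parser configured for feed %r" % label)
    if name not in PARSERS:
        raise KeyError("unknown parser %r for feed %r" % (name, label))
    return PARSERS[name]


config = configparser.ConfigParser()
config.read_string(SAMPLE_CONFIG)
parse = parser_for(config, "feed_whatever")
indicators = parse("# comment\n10.0.0.1\n\n10.0.0.2\n")
```

Failing loudly on an unknown parser name keeps a typo in the config from silently dropping a feed.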
Please let me know if you like this approach. You can find the code at https://github.com/gbrindisi/combine/tree/labeled-feeds
Hopefully I'll be able to tidy up the code a bit more tomorrow.
I was thinking about this today while I was working to merge your stuff and I share your thoughts on this.
I'll have a look at your stuff tonight and comment on some other suggestions.
(Sent by email in reply to Gianluca Brindisi's comment of Sun, Oct 12, 2014 at 11:28 AM: https://github.com/mlsecproject/combine/issues/63#issuecomment-58814879)
I like the idea of paving the way for plugins. The feed_i_whatever syntax feels a little ugly to me for some reason. Need to brain on this a bit.
Just to put some perspective, this is the test configuration I've used (most of the entries are commented out):
[feeds.outbound]
#malwaregroup = http://www.malwaregroup.com/ipaddresses
#malc0de = http://malc0de.com/bl/IP_Blacklist.txt
#zeustracker = https://zeustracker.abuse.ch/blocklist.php?download=ipblocklist
#spyeyetracker = https://spyeyetracker.abuse.ch/blocklist.php?download=ipblocklist
#palevotracker = https://palevotracker.abuse.ch/blocklists.php?download=ipblocklist
alienvault = http://reputation.alienvault.com/reputation.data
#nothink-malware-dns = http://www.nothink.org/blacklist/blacklist_malware_dns.txt
#nothink-malware-http = http://www.nothink.org/blacklist/blacklist_malware_http.txt
#nothink-malware-irc = http://www.nothink.org/blacklist/blacklist_malware_irc.txt
[feeds.inbound]
#projecthoneypot = http://www.projecthoneypot.org/list_of_ips.php?rss=1
[feeds.parsers]
alienvault = process_alienvault
Also, I've used the feeds.XXX naming scheme for the config sections to see whether it was feasible to abstract away the standard inbound/outbound categorization. It's just an idea though.
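To illustrate the feeds.XXX idea, here is a hedged sketch that treats every section with the feeds. prefix as a feed category instead of hard-coding inbound and outbound. The function feed_categories and its reserved parameter are hypothetical; feeds.parsers has to be excluded explicitly because it shares the prefix. The config is a trimmed copy of the test configuration above.

```python
import configparser

TEST_CONFIG = """
[feeds.outbound]
#malc0de = http://malc0de.com/bl/IP_Blacklist.txt
alienvault = http://reputation.alienvault.com/reputation.data

[feeds.inbound]
#projecthoneypot = http://www.projecthoneypot.org/list_of_ips.php?rss=1

[feeds.parsers]
alienvault = process_alienvault
"""


def feed_categories(config_text, prefix="feeds.", reserved=("feeds.parsers",)):
    """Return {category: {label: url}} for every feeds.XXX section."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    categories = {}
    for section in parser.sections():
        # Skip unrelated sections and the parser mapping, which is not a feed list.
        if section.startswith(prefix) and section not in reserved:
            categories[section[len(prefix):]] = dict(parser[section])
    return categories


categories = feed_categories(TEST_CONFIG)
```

Lines starting with # are treated as comments by configparser, so the commented-out feeds are skipped and feeds.inbound comes back as an empty category.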
I like having that confined to the categories / sections much better than having the names. But I am reminded that there are two hard things in computer science: cache invalidation, naming things, and off-by-one errors.
I've made a pull request to discuss it better: https://github.com/mlsecproject/combine/pull/86
Today, the "source" field holds the URL from which the indicator was gathered.
According to the docs (and in my opinion :P) it should be an identifying string that describes that source, and that string should be documented on the Wiki. It bothers me because I cannot match these sources up with the data we provided for the tiq-test samples, so it is an enhancement and a bug at the same time...
Perhaps the thresher_map should be the place for that, or somewhere equivalent in the plugin system from #23. Is there a short-term solution to this that does not require waiting for the plugin refactoring?
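As one possible stopgap, sketched purely as an assumption (the SOURCE_NAMES table and source_label helper are hypothetical and do not reflect how thresher_map is actually structured), a small lookup could translate the feed URL into an identifying string when the "source" field is written:

```python
# Hypothetical URL-to-name table; the single entry reuses the alienvault
# label and URL from the test configuration earlier in the thread.
SOURCE_NAMES = {
    "http://reputation.alienvault.com/reputation.data": "alienvault",
}


def source_label(url, names=SOURCE_NAMES):
    """Return the identifying string for a feed URL, or the URL if unknown."""
    return names.get(url, url)
```

Falling back to the URL keeps unmapped feeds behaving as they do today.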