
Data Feeds Manager
==================


.. image:: ./analysis.png

.. image:: ./explore.png

=============
License
=============

Data Feeds Manager is a service which crawls feeds, extracts core text content, generates text training sets for machine learning and manages score selection based on predictions.

Copyright (C) 2016 Alexandre CABROL PERALES from Sopra Steria Group.

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.

=============
Description
=============

Data Feeds Manager aims to manage feeds based on the data received from them.

This service crawls feeds to assess their content and ranks them according to the topics predicted by DeepDetect_.
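As a rough illustration of ranking feeds by predicted topic, the sketch below averages the per-item probability of one topic for each feed and sorts the feeds by that average. The data shape (``feed`` and ``topics`` keys), the ``rank_feeds`` name and the use of a plain mean are assumptions made for this example, not DFM's actual scoring logic.

```python
from collections import defaultdict
from statistics import mean


def rank_feeds(predictions, topic):
    """Return (feed, mean probability) pairs for `topic`, best feed first.

    `predictions` is a hypothetical list of crawled items, each shaped as
    {"feed": str, "topics": {topic_name: probability}}.
    """
    scores = defaultdict(list)
    for item in predictions:
        # An item with no prediction for the topic counts as probability 0.
        scores[item["feed"]].append(item["topics"].get(topic, 0.0))
    ranked = [(feed, mean(probs)) for feed, probs in scores.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Made-up sample data, for illustration only.
sample = [
    {"feed": "feed-a", "topics": {"security": 0.9, "sports": 0.1}},
    {"feed": "feed-a", "topics": {"security": 0.7}},
    {"feed": "feed-b", "topics": {"sports": 0.8}},
    {"feed": "feed-b", "topics": {"security": 0.2, "sports": 0.6}},
]
ranking = rank_feeds(sample, "security")
for feed, score in ranking:
    print(f"{feed}: {score:.2f}")
```

A mean is only one possible aggregate; a maximum or a count of items above a threshold would rank feeds differently.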

This service lets you generate machine learning models from news for DeepDetect_.

This service uses ElasticSearch for data storage, DeepDetect for content tagging and Kibana for data visualization. TinyTinyRSS is also used as an RSS feed aggregator, but it could be replaced by any RSS feed manager service which provides an aggregated RSS feed.
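To make the tagging and storage flow more concrete, here is a minimal sketch that builds a request body for DeepDetect's ``/predict`` endpoint and turns a response into a document that could be indexed in ElasticSearch. The service name ``news_topics``, the helper names and the document fields (``url``, ``text``, ``topics``) are hypothetical, and the response layout follows the general shape of the DeepDetect API rather than anything confirmed for DFM. No network call is made.

```python
import json


def build_predict_request(service, text, best=3):
    """Build a JSON-serializable body for a DeepDetect POST /predict call."""
    return {
        "service": service,                        # name of an existing service
        "parameters": {"output": {"best": best}},  # keep the top `best` classes
        "data": [text],
    }


def to_document(url, text, response):
    """Combine cleaned text and predicted topics into one storable document."""
    predictions = response["body"]["predictions"]
    # Assumption: some DeepDetect versions return a single object here
    # instead of a list, so both shapes are accepted.
    if isinstance(predictions, dict):
        predictions = [predictions]
    classes = predictions[0]["classes"]
    return {
        "url": url,
        "text": text,
        "topics": {c["cat"]: c["prob"] for c in classes},
    }


# Hypothetical values, for illustration only.
request = build_predict_request("news_topics", "Patch released for a critical flaw")
response = {
    "status": {"code": 200, "msg": "OK"},
    "body": {
        "predictions": [
            {
                "uri": "0",
                "classes": [
                    {"cat": "security", "prob": 0.92},
                    {"cat": "sports", "prob": 0.05},
                ],
            }
        ]
    },
}
doc = to_document("https://example.org/article", request["data"][0], response)
print(json.dumps(doc, indent=2))
```

Keeping the topic probabilities next to the cleaned text in the same document is one way to let Kibana filter and aggregate by predicted topic.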

Currently RSS feeds and Twitter are supported; support for Reddit and Dolphin_ is planned for the future.

=============
Definition
=============

=============
Requirements
=============

The reference platform is Ubuntu Server 16.04 LTS.

According to ElasticSearch's hardware guidance on memory: "Less than 8 GB tends to be counterproductive (you end up needing many, many small machines), and greater than 64 GB has problems."

DeepDetect can use an NVIDIA CUDA GPU or a standard x86_64 CPU. The current DFM install does not set up GPU support for DeepDetect. See more here: https://github.com/beniz/deepdetect#caffe-dependencies

DFM will crawl a large amount of data from the web if you have multiple RSS feeds or Twitter searches. Good bandwidth with unlimited traffic is recommended (fiber, ...).

Minimal hardware might be:

Recommended hardware might be:

.. _ElasticSearch: https://www.elastic.co/downloads/elasticsearch
.. _Kibana: https://www.elastic.co/downloads/kibana
.. _DeepDetect: https://github.com/beniz/deepdetect
.. _TinyTinyRSS: https://tt-rss.org/gitlab/fox/tt-rss
.. _Dolphin: https://www.boonex.com/downloads
.. _Twitter: https://twitter.com
.. _Reddit: https://www.reddit.com/

=============
Install
=============

This installation has been tested with Ubuntu 16.04.1 LTS.
The installation folder is /opt/dfm.
Git must be installed (``apt-get install git``).
Run the following commands in a terminal::

    cd /opt
    git clone https://github.com/soprasteria/cybersecurity-dfm.git dfm
    cd dfm
    ./install_ubuntu.sh

The install_ubuntu.sh script installs all dependencies, builds them when required, and creates the dfm account used to run the daemons. Four daemons with web protocols are set up in supervisor:

When installation is done.

To set up the dashboards:

=================
Other information
=================

=============
Todo List
=============
