osservatoriosicurezza / Perimetro-Cibernetico-Italiano

Pipeline 0 of the project, with the technical definition of the Perimetro Cibernetico Italiano (Italian Cyber Perimeter)

Dataset generation PoC and next steps (?) #3

Open fnzv opened 4 years ago

fnzv commented 4 years ago

Hi there! Since we were stuck on the IPv4 discussion last time, I wanted to contribute a small PoC to kick off a small MVP.

To generate Italian IPv4 lists automatically, I relied on the RIPE APIs, filtering by country (yes, we all know these are not all, or only, Italian networks, just "self-declared" IT ASNs).
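For illustration, here is a minimal sketch of the country filter. It assumes the RIPEstat `country-resource-list` data call, since the issue does not say which RIPE endpoint the PoC actually uses, and the function names are hypothetical. IPv4 entries are handled both as CIDR prefixes and as `start-end` ranges, then collapsed into a deduplicated list.

```python
import ipaddress
import json
import urllib.request

# Assumption: RIPEstat "country-resource-list" data call.
# The PoC may well use a different RIPE endpoint.
RIPESTAT_URL = "https://stat.ripe.net/data/country-resource-list/data.json?resource={cc}"


def fetch_country_resources(country_code="IT", timeout=30):
    """Download the raw JSON payload for one country code."""
    url = RIPESTAT_URL.format(cc=country_code)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def to_networks(entry):
    """Turn 'a.b.c.d/n' or 'start-end' into a list of IPv4 networks."""
    if "-" in entry:
        start, end = (ipaddress.IPv4Address(p.strip()) for p in entry.split("-", 1))
        return list(ipaddress.summarize_address_range(start, end))
    return [ipaddress.ip_network(entry.strip(), strict=False)]


def extract_ipv4_prefixes(payload):
    """Return sorted, collapsed CIDR strings from a payload shaped like
    {"data": {"resources": {"ipv4": [...]}}} (assumed response layout)."""
    entries = payload.get("data", {}).get("resources", {}).get("ipv4", [])
    networks = []
    for entry in entries:
        networks.extend(to_networks(entry))
    return [str(net) for net in ipaddress.collapse_addresses(networks)]
```

Calling `extract_ipv4_prefixes(fetch_country_resources("IT"))` would then give the self-declared IT prefixes, one CIDR string per item.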

Once the dataset is parsed from multiple sources, we should look at how to integrate and enrich it with network scans and scraping.

I experimented with the pipelines, and this is the resulting dataset: repo

The data is refreshed every night: the scheduled CI/CD pipeline scrapes the raw JSON data from the RIPE APIs, then the same pipeline pushes the parsed data into the same folder (we could also generate separate repos for the datasets, and so on).
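A rough sketch of what the "parse and push into the same folder" step could look like on the Python side. The file names (`ripe_it_raw.json`, `it_ipv4.txt`) and the `write_dataset` helper are hypothetical, not taken from the repo. The function reports whether the parsed list changed, so a nightly job could skip the commit when nothing is new.

```python
import json
from pathlib import Path


def write_dataset(raw_payload, prefixes, out_dir="dataset"):
    """Write the raw JSON and the parsed prefix list side by side.

    Returns True when the parsed list differs from the file already on
    disk (or no file existed yet), False when it is unchanged.
    File names are illustrative assumptions.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Raw scrape, kept for reference next to the parsed output.
    raw_path = out / "ripe_it_raw.json"
    raw_path.write_text(
        json.dumps(raw_payload, indent=2, sort_keys=True) + "\n", encoding="utf-8"
    )

    # Parsed output: one prefix per line, the format most scanners accept.
    list_path = out / "it_ipv4.txt"
    new_text = "".join(f"{prefix}\n" for prefix in prefixes)
    changed = (
        not list_path.exists()
        or list_path.read_text(encoding="utf-8") != new_text
    )
    list_path.write_text(new_text, encoding="utf-8")
    return changed
```

The scheduled pipeline would call this after the scrape and only run `git commit` and `git push` when it returns `True`.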

Once the data is generated, it is publicly available for anyone to use with their favorite scanners (nmap, ZMap, masscan, etc.).

If we want to generate a public raw dataset of scanned IT assets (Shodan-like, but enriched), the only thing we need to do is set up a probe and feed it the daily-updated input datasets (e.g. using masscan with the raw IP list plus some scraper, then storing the data in a public repo).
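As a sketch of the probe side, the helper below only builds a masscan command line from the daily IP list; it does not run anything. The function name, default ports, rate, and file names are placeholder assumptions. The flags used (`-iL`, `-p`, `--rate`, `-oJ`) are standard masscan options.

```python
def build_masscan_command(target_file, ports=(80, 443), rate=1000,
                          output_file="scan.json"):
    """Build an argv list for masscan from a file of targets.

    target_file: text file with one IP or CIDR prefix per line.
    ports:       ports to probe (placeholder defaults).
    rate:        packets per second (placeholder default, keep it modest).
    output_file: where masscan writes its JSON results.
    """
    if not ports:
        raise ValueError("at least one port is required")
    if rate <= 0:
        raise ValueError("rate must be positive")
    port_arg = ",".join(str(p) for p in ports)
    return [
        "masscan",
        "-iL", str(target_file),   # read targets from the generated list
        f"-p{port_arg}",           # ports to scan
        "--rate", str(rate),       # transmit rate in packets per second
        "-oJ", str(output_file),   # JSON output for later enrichment
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on the probe, and the JSON output can then be enriched and pushed to the public repo.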

NOTE:

Sorry if I made any mistakes; I wrote this in a hurry.