neil-unomaha / CIF_CYBR_8950

MIT License


CIF Test Environment #20

Closed: neil-unomaha closed this issue 4 years ago

neil-unomaha commented 4 years ago

CIF Installation and Commands Notes

Installation

The installation notes for the Docker installation strategy worked very well for me. Here are a few pointers I gathered along the way.

First, I had to create an account on Maxmind.

## MAXMIND Credentials
username: nthorne@unomaha.edu
Account_user ID: *****
License Key: *****

Install on Ubuntu 16.04 Server (note: Ubuntu 16.04 Desktop doesn't work)
Install the Docker image via docker pull csirtgadgets/verbose-robot
As indicated in the instructions, either export your Maxmind credentials, or (as I did) put them in your .bashrc file and source it

Run your docker container:

sudo docker run -e CIF_TOKEN="${CIF_TOKEN}" -e MAXMIND_USER_ID="${MAXMIND_USER_ID}" -e MAXMIND_LICENSE_KEY="${MAXMIND_LICENSE_KEY}" -it -p 5000:5000 -d --name verbose-robot csirtgadgets/verbose-robot:latest

Remember that CIF 4 is running in a docker container. You still need to shell into the container in order to install additional software.
- Shell into your running container via:
```
sudo docker exec -it verbose-robot /bin/bash
```
- Now that you are at a command prompt inside the container, install the CIF client as indicated here: pip install 'cifsdk>=4.0.0a0'

Now that you have the Python CIF SDK installed, you should be good to go! Be sure that all your CIF commands are run within the docker container.

Usage / Commands

The main command is cif. The client accepts a number of options and arguments that determine the response. Good examples here.
The process of ingesting/sharing threats within a network is unclear to me from the documentation.
Appending to CIF threats is also unclear to me from the documentation.
The provided examples don't work because they are missing required options (apparently you must now provide --tags):
```
cif --itype ipv4 --tags phishing --format table
```

The --feed option as specified here is unrecognized. Apparently that is how you pull data from feeds in CIF?

neil-unomaha commented 4 years ago

Roadblock #1:

Getting CIF4 Environment Setup and Familiarizing ourselves with How it Works

Setup Steps Taken

Set up an account on Maxmind because it is a dependency for CIF4
Installed VMware
Installed Ubuntu 16.04 Server on VMware
- (Note: Cannot run on 16.04 Desktop)
Inside 16.04 Server: installed Docker
With Docker installed on the 16.04 Server: pulled the CIF image from Docker Hub
- Follow documentation installation steps under docker strategy
Run the container
- Command to run the container with necessary options specified in documentation
Open up a shell on the running container to query the running CIF server
- Command from documentation: sudo docker exec -it verbose-robot /bin/bash
- Installing the CIF CLI/client inside the container appears to be unnecessary because the image already contains it, but I ran the command anyway just in case: pip install 'cifsdk>=4.0.0a0'
Run queries against the CIF server

Issues Encountered

The documentation has gaps in its explanations (it assumes the reader already has significant familiarity with the topic and related technology).
Most code examples in the documentation do not work as written; they almost always require additional options.
It appears we can successfully query CIF right now, but our test environment has no existing feeds, so queries return nothing.
- Example: cif --itype ipv4 --tags malware

Questions to be Answered

- Is CIF currently pulling feeds in real time, or is pulling from feeds disabled by default? If it is disabled, how do we enable it?

It appears that the rules directory is where the feeds are specified.

- How do we add additional feeds?

Currently Known CIF Resources/Tutorials/Documentation

CIF4 Server Code
CIF4 CLI Client in Python
CIF4 CLI Client in Go
CIF4 Wiki
CIF3 Wiki
CIF Maintainer's Blog
- Wes Young (appears to be the creator behind CIF4)
- Posts are not tutorials; they are mostly journal entries/reflections on topics such as threat intelligence and writing open source code.
- Payment is required to get answers to questions about CIF
Presentation on CIF
- Video discussing version 2 of CIF; it provides a good overview of CIF, but beyond the introduction, the processes/procedures it describes are outdated.

Forecasted Obstacles

Our middleware will need to run in an environment where it has the necessary privileges to talk to the Palo Alto API and to access the CIF CLI.
- We will need to work closely with Brian in order to make sure that this environment exists, or we will need to work together in order to create this environment.

neil-unomaha commented 4 years ago

Setup CIF 4 Test Environment

The following example shows how to set up a CIF test environment with Ubuntu 16.04 Server running in a virtual machine.

Installation

Install Ubuntu 16.04 server

First you will need to download an Ubuntu 16.04 server image and create a virtual machine.

Once you start up your virtual machine and log in, there are a couple more steps for our guide. We wanted a GUI, so we installed the ubuntu-desktop package:

sudo apt-get update
sudo apt-get install ubuntu-desktop

Install Docker

Next you need to install docker:

sudo apt install docker.io

Install CIF4 Docker Image

Pull the CIF 4 image from Docker Hub:

sudo docker pull csirtgadgets/verbose-robot

Setup Prior to Running the Docker Container

Before running the docker container, you need to create some environment variables.

The first is CIF_TOKEN, which will contain a randomly generated string. For security, this string becomes the bearer token passed in the request header of all your GET and POST requests. You can generate a random string with the following command on Ubuntu:

head -n 25000 /dev/urandom | openssl dgst -sha256 | awk -F ' ' '{print $2}'

An example output string is the following:

525ff70def1b2b4eff3119451eabfa0ce3fa6316efb55fda075db08ac4a2feda
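For reference, here is a Python sketch of what the pipeline above produces: random bytes hashed with SHA-256, giving a 64-character hex string. The function name is hypothetical and nothing in CIF requires this exact method; Python's `secrets.token_hex(32)` would give an equally random 64-character string directly.

```python
import hashlib
import os


def make_cif_token() -> str:
    """Return a random 64-character hex string suitable for CIF_TOKEN.

    Mirrors `head -n 25000 /dev/urandom | openssl dgst -sha256`: hash a chunk
    of random bytes with SHA-256 and keep only the hex digest.
    """
    random_bytes = os.urandom(4096)  # the amount is arbitrary; any non-trivial chunk works
    return hashlib.sha256(random_bytes).hexdigest()


if __name__ == "__main__":
    print(make_cif_token())
```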

The other two required environment variables are MAXMIND_USER_ID and MAXMIND_LICENSE_KEY. CIF depends on Maxmind, as mentioned here. Head over to Maxmind, create a free account, and within the settings you can find your account ID and license key.

[Screenshot: maxmind_key]

Here is an example command to set up these environment variables. Note that you'll want to swap in your own values for MAXMIND_USER_ID and MAXMIND_LICENSE_KEY.

export CIF_TOKEN=`head -n 25000 /dev/urandom | openssl dgst -sha256 | awk -F ' ' '{print $2}'`
export MAXMIND_USER_ID=201001
export MAXMIND_LICENSE_KEY=3r8ESHRiFIsF

Run CIF Container

With the environment variables set up, you can now run your CIF Docker image:

sudo docker run -e CIF_TOKEN="${CIF_TOKEN}" -e MAXMIND_USER_ID="${MAXMIND_USER_ID}" -e MAXMIND_LICENSE_KEY="${MAXMIND_LICENSE_KEY}" -it -p 5000:5000 -d --name verbose-robot csirtgadgets/verbose-robot:latest

We pass the three environment variables specified above into the running container with the -e flag
We set up port forwarding on port 5000 with the -p flag
We run the container in detached (background) mode with -d
For ease of referencing the container in the future, we named it verbose-robot with --name
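If you script this step, one option is to check that the three variables are set before calling Docker, since an empty CIF_TOKEN or Maxmind value is easy to miss. The sketch below is hypothetical (the helper name is ours, not part of CIF) and only builds the same argument list as the command above; it does not run Docker.

```python
import os
import shlex

REQUIRED_VARS = ("CIF_TOKEN", "MAXMIND_USER_ID", "MAXMIND_LICENSE_KEY")


def docker_run_args(env=None):
    """Build the `docker run` argv shown above, failing early if a variable is unset."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))

    args = ["sudo", "docker", "run"]
    for name in REQUIRED_VARS:
        args += ["-e", f"{name}={env[name]}"]  # pass each variable into the container
    args += [
        "-it",                      # interactive TTY
        "-p", "5000:5000",          # forward host port 5000 to the container
        "-d",                       # detached (background) mode
        "--name", "verbose-robot",  # stable name for `docker exec` later
        "csirtgadgets/verbose-robot:latest",
    ]
    return args


def as_shell_command(args):
    """Quote the argv so it can be pasted into a shell."""
    return " ".join(shlex.quote(a) for a in args)
```

For example, `print(as_shell_command(docker_run_args()))` prints the full command, or raises if a variable is missing.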

To confirm our docker container is running, we can run sudo docker ps

Execute CIF Commands

We can interact with CIF in two ways: the command prompt or Swagger.

Command Prompt

To do this, we need to open a bash shell in our running container. We can do that with the following:

sudo docker exec -it verbose-robot /bin/bash

Now that we are inside the container, we can execute the cif command with various options in order to query the CIF database. Here are some example commands:

cif --itype ipv4 --tags scanner
cif --itype url --tags phishing
cif --itype url --tags malware
cif --itype ipv4 --tags botnet

[Screenshot: example_output]

Note that, by default, CIF pulls feeds from the providers you specified every three minutes.

Swagger

On the VM running CIF, you can visit http://localhost:5000, which displays a Swagger GUI for the REST API.
[Screenshot: swagger]

It is important to note that the lock symbol next to each endpoint indicates that a token must be passed in with each request. This is the string we created and stored in the CIF_TOKEN environment variable earlier. Click the Authorize button and add the token.

Once you add the token, you should be able to interact with the API in the GUI. Click the Try it out button, which toggles the endpoint into edit mode, then click Execute. [Screenshot: execute_swagger]

You can then scroll down to see the response. [Screenshot: swagger_response]

Create Endpoints

The CIF file that specifies endpoints is app.py. We think that, in this Docker container, the file is located here:

/usr/local/lib/python3.6/site-packages/verbose_robot-4.0.1-py3.6.egg/cif/httpd/app.py

Adding an endpoint should be as simple as the following:

@app.route('/')
def hello_world():
    return 'Hello, World!'

Restart the server and try it out. Initial attempts did not work.

One possible explanation is that CIF uses Flask-RESTPlus, so the route might actually need to be defined like this:

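# assumes `api` is the flask_restplus Api (or a Namespace) object already defined in app.py
from flask_restplus import Resource

@api.route('/hello')
class HelloWorld(Resource):
    def get(self):
        return {'hello': 'world'}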

We will have to try this.

neil-unomaha commented 4 years ago

Create the Palo endpoint

We still need to figure out how to pass the token in as a request parameter instead, or how to remove the token requirement altogether.

# /usr/local/lib/python3.6/site-packages/verbose_robot-4.0.1-py3.6.egg/cif/httpd/app.py
# around line 39, add the following:
from .palo import api as palo_api

[Screenshot: app_py_1]

# /usr/local/lib/python3.6/site-packages/verbose_robot-4.0.1-py3.6.egg/cif/httpd/app.py
# around line 84, add the following:
palo_api,

[Screenshot: app_py_2]

# Create the following file:
# /usr/local/lib/python3.6/site-packages/verbose_robot-4.0.1-py3.6.egg/cif/httpd/palo.py

[Screenshot: palo_py]

# All of the running CIF services are managed by supervisord.
# You will likely need to restart them so that the changes are read in.
# To do that, run the following commands, which send SIGHUP to the
# supervisord process. On SIGHUP, supervisord stops all of its programs,
# reloads its configuration, and starts them again.
PID=`ps aux | grep supervisord | grep -v grep | awk -F ' ' '{print $2}'`
kill -HUP $PID
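The PID lookup above can be sketched in Python to make each step explicit: keep the `ps aux` lines that mention supervisord, drop the grep process itself, and take the second column. The function names are hypothetical; the parsing is tested on sample text rather than a live process list.

```python
import os
import signal


def pids_for(ps_output: str, name: str) -> list:
    """Return PIDs from `ps aux` output whose line mentions `name`.

    Mirrors `ps aux | grep NAME | grep -v grep | awk '{print $2}'`.
    """
    pids = []
    for line in ps_output.splitlines():
        if name in line and "grep" not in line:
            pids.append(int(line.split()[1]))  # column 2 of `ps aux` is the PID
    return pids


def reload_supervisord(ps_output: str) -> None:
    """Send SIGHUP to each supervisord process, like `kill -HUP $PID`."""
    for pid in pids_for(ps_output, "supervisord"):
        os.kill(pid, signal.SIGHUP)
```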

You can now make requests to the endpoint, but you are currently still required to pass in the token. [Screenshot: palo_request]

skyemakable commented 4 years ago

To make the commands easier to copy/paste for testing (the previous comment had a line break in it):

PID=`ps aux | grep supervisord | grep -v grep | awk -F ' ' '{print $2}'`
kill -HUP $PID

curl -X GET "http://localhost:5000/palo/" -H "accept: application/json" -H "Authorization: 46508ee7d447ef4ed9666f3cc4716f0ea246fa2fb5a1254036a384d7897d"

neil-unomaha commented 4 years ago

To remove the requirement of a token being passed in the header, it should be as simple as the documentation shows: https://flask-restplus.readthedocs.io/en/stable/swagger.html#documenting-authorizations

There must be some additional step elsewhere, because it still isn't working for me...

neil-unomaha commented 4 years ago

Well, a step closer. This at least works, though it completely short-circuits the before_request function.

# palo.py

[Screenshot: step1]

# app.py
# Notice the return statement right at the beginning.  
# That is the only way I found to make it work

[Screenshot: step2]

# request without authorization header

[Screenshot: step3]

In the request.endpoint check, I tried adding /palo, palo, and palo/, and just to be sure I also tried palo/pa and /palo/pa. None of those worked.

neil-unomaha commented 4 years ago

When querying by multiple tags, CIF is smart enough not to duplicate the same IP address. The following outputs the IPv4 address as well as the tags:

cif --limit 150000 --itype ipv4 --tags scanner,bruteforce --f csv | awk -F ',' '{print $4 " " $10}'
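One caveat with `awk -F ','`: if a CSV field is quoted and contains commas (for example a tags field like "scanner,bruteforce"), awk splits it apart. Whether CIF quotes multi-tag fields is an assumption we have not verified. The sketch below uses Python's csv module, which keeps quoted fields intact; the function name is hypothetical, and the 1-based column numbers 4 and 10 simply follow the awk command. A header row, if present, comes through as an ordinary pair.

```python
import csv
import io


def indicator_and_tags(csv_text: str, indicator_col: int = 4, tags_col: int = 10):
    """Yield (indicator, tags) pairs from `cif ... -f csv` output.

    Column numbers are 1-based to match awk's $4 and $10. Rows that are
    too short to contain both columns are skipped.
    """
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) >= max(indicator_col, tags_col):
            yield row[indicator_col - 1], row[tags_col - 1]
```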

neil-unomaha commented 4 years ago

Implemented logging. It turns out that the endpoint name for our custom resource is palo_palo, so we weren't putting the proper endpoint in the whitelist.

I figured this out by doing what Doctor Hale first suggested: getting logging squared away. The easiest approach was to create my own log file and write request.endpoint to it. That is what showed me that it was palo_palo.
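For reference, a stdlib sketch of how we understand Flask-RESTPlus to name endpoints by default: the Resource class name converted to snake_case and, for a non-default namespace, prefixed with the namespace name. That rule gives palo_palo for class Palo in Namespace('palo'). The helper names and the TOKEN_EXEMPT_ENDPOINTS whitelist are hypothetical; logging request.endpoint, as described above, remains the reliable check.

```python
import re

# Patterns mirroring our understanding of flask_restplus's camel-to-snake conversion
FIRST_CAP_RE = re.compile(r"(.)([A-Z][a-z]+)")
ALL_CAP_RE = re.compile(r"([a-z0-9])([A-Z])")


def camel_to_snake(name: str) -> str:
    """'HelloWorld' -> 'hello_world', 'Palo' -> 'palo'."""
    first = FIRST_CAP_RE.sub(r"\1_\2", name)
    return ALL_CAP_RE.sub(r"\1_\2", first).lower()


def default_endpoint(namespace: str, resource_class: str) -> str:
    """Endpoint name for a Resource registered on a non-default Namespace."""
    return f"{namespace}_{camel_to_snake(resource_class)}"


# Hypothetical whitelist a before_request hook could consult
TOKEN_EXEMPT_ENDPOINTS = {default_endpoint("palo", "Palo")}


def requires_token(endpoint: str) -> bool:
    """True unless request.endpoint is whitelisted (compare names, not URL paths)."""
    return endpoint not in TOKEN_EXEMPT_ENDPOINTS
```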

neil-unomaha commented 4 years ago

There are three different timestamps saved per indicator; reported_at makes the most sense to sort by. And yes, the results do need to be sorted manually: they come back in varying order, and there does not appear to be a sort option in CIF.

cif --limit 150000 --itype ipv4 --tags botnet,phishing,malware,scanner,bruteforce,darknet --f csv --columns reported_at,indicator

Output for each line looks like the following:

2020-03-25T02:40:42.012034Z,12.34.56.78

neil-unomaha commented 4 years ago

You can sort by the timestamp with the following:

sort -t, -k 1.1,1.26 <file>
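The same ordering can be sketched in Python by parsing reported_at explicitly. For fixed-width UTC timestamps like 2020-03-25T02:40:42.012034Z, plain string order matches chronological order, which is why the `sort` key above works. The function name is hypothetical, and the sketch assumes the header row (if any) has already been removed.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"  # e.g. 2020-03-25T02:40:42.012034Z


def sort_by_reported_at(lines):
    """Sort 'reported_at,indicator' lines oldest first (no header row expected)."""
    def key(line):
        reported_at = line.split(",", 1)[0]  # text before the first comma
        return datetime.strptime(reported_at, TIMESTAMP_FORMAT)
    return sorted(lines, key=key)
```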

neil-unomaha commented 4 years ago

CIF 5 was released 14 hours ago. I downloaded Ubuntu 18.04 and attempted to deploy it via the instructions here. Unfortunately, I ran into multiple errors, so it is not as easy as the directions make it sound.

Error when following "Up and Running" directions

# got to this step
docker-compose pull
ERROR: Couldn't connect to Docker daemon at http+docker://localunixsocket - is it running?

Error when following "Building Locally" directions

# got to this step
make docker-tag

(cd docker && bash tag.sh)
tag.sh: line 5: cif-router: command not found
Makefile:32: recipe for target 'docker-tag' failed
make: [docker-tag] Error 127 (ignored)

neil-unomaha commented 4 years ago

This script is located at /home/cif/palo_indicators/update_palo_indicators.sh. It is meant to be executed every 10 minutes via a cron job. The IP indicators are stored in files of at most 5,000 indicators each. NU's limit is 150,000 IP addresses, hence 30 files (30 * 5,000 = 150,000). The files are located at /home/cif/palo_indicators/ips_*.txt

#!/bin/bash

# EXPLAINING `cif` command options
# --limit 150000 -> limit the returned indicators (IP addresses in this case) to 150,000
# --itype ipv4 -> return only ipv4 indicators
# --tags botnet,phishing,malware,scanner,bruteforce,darknet -> return indicators with any of the specified tags
# -f csv -> return output in csv format
# --columns reported_at,indicator -> per returned indicator: only return the reported_at timestamp and indicator
# > /home/cif/palo_indicators/all_ip_indicators.txt -> output to file in indicated path

# EXPLAINING `sort` command
# sort by the reported_at timestamp token

# EXPLAINING `sed` command
# example output per line at this point:
#    2020-04-05T14:20:18.365410Z,12.34.56.78
# Palo Alto ingestible format is one IP address per line
# Therefore, must get rid of everything per line except for IP address
# This sed command removes everything up to and including the first comma
# Thus, leaving only the IP address per line

/usr/local/bin/cif --limit 150000 --itype ipv4 --tags botnet,phishing,malware,scanner,bruteforce,darknet --f csv --columns reported_at,indicator | sort -t, -k 1.1,1.26 | sed 's/^[^,]*,//g'  > /home/cif/palo_indicators/all_ip_indicators.txt

# Paging feature: allow a maximum of 5000 IPs per file
# 5000 * 30 = 150,000
for num in {1..30}
do
    endLine=$(($num * 5000))
    startLine=$(($endLine - 4999))
    endSedLine=$(($endLine + 1))
    # e.g. page 1 -> "1,5000p;5001q": print lines 1-5000, then quit at line 5001
    pagingSedOpts="${startLine},${endLine}p;${endSedLine}q"

    sed -n "$pagingSedOpts" /home/cif/palo_indicators/all_ip_indicators.txt > /home/cif/palo_indicators/ips_$num.txt
done

# Must change file ownership to cif user or else cif api cannot access files
chown cif:cif /home/cif/palo_indicators/ips_*
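The paging loop above can be sketched in Python to check the arithmetic: page `num` holds lines (num - 1) * 5000 + 1 through num * 5000, which is exactly the range the sed expression prints, and pages past the end of the data come out empty, just as the loop writes empty ips_N.txt files. The function names are hypothetical.

```python
PAGE_SIZE = 5000   # IPs per file
MAX_PAGES = 30     # 30 * 5000 = 150,000, NU's limit


def sed_range(num: int) -> str:
    """The `sed -n` expression the bash loop builds for page `num` (1-based)."""
    end_line = num * PAGE_SIZE
    start_line = end_line - (PAGE_SIZE - 1)
    return f"{start_line},{end_line}p;{end_line + 1}q"


def paginate(ips):
    """Split the sorted IP list into MAX_PAGES lists of at most PAGE_SIZE each."""
    return [ips[(num - 1) * PAGE_SIZE : num * PAGE_SIZE] for num in range(1, MAX_PAGES + 1)]
```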

Here is the cronjob that executes the script

*/10 * * * * /bin/bash /home/cif/palo_indicators/update_palo_indicators.sh

The script is being executed by the root user.
It is properly creating all the files.
The issue is that the cif command is not returning any indicators, so all_ip_indicators.txt and the ips_* files are all empty.

[Screenshot: all_files]

skyemakable commented 4 years ago

Just trial and error to see about making a curl request from the palo endpoint and receiving a csv file back. On localhost there is the indicators endpoint for indicator-related operations, where I can make a curl request to get a csv file with logs relevant to IPv4 addresses and a series of tags. The request looks like this:

So I looked for how to make a curl request from the palo.py file. I found a useful module named shlex, which, together with subprocess, lets me run a curl command from palo.py.

In the screenshot, I use -o to save the file to /home/cif/palo_indicators/testfile.csv and send a GET request to the indicators endpoint to get the csv file. A quick cat of testfile.csv shows it was written.

Within the Swagger GUI, I was able to run the palo.py endpoint successfully, and testfile.csv was in the indicated folder.


We probably need to do more testing to get it to use $CIF_TOKEN rather than adding the token manually. From there, we could look into extracting the indicators and timestamps from the csv into a txt file.
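As an alternative to shelling out to curl through shlex and subprocess, the standard library's urllib can build the same request and read the token from the CIF_TOKEN environment variable instead of hard-coding it. This is a sketch: the function name is hypothetical, and it sends the raw token in the Authorization header because that is what the curl commands in this thread do (we have not confirmed whether CIF also accepts a "Bearer " prefix).

```python
import json
import os
import urllib.parse
import urllib.request

CIF_URL = "http://localhost:5000/indicators/"
TAGS = "botnet,phishing,malware,scanner,bruteforce,darknet"


def build_indicator_request(token=None):
    """Build a GET request for /indicators/ equivalent to the curl command in palo.py.

    The token defaults to the CIF_TOKEN environment variable.
    """
    token = token if token is not None else os.environ.get("CIF_TOKEN", "")
    # urlencode turns the commas in TAGS into %2C, matching the curl URL
    query = urllib.parse.urlencode({"fmt": "json", "tags": TAGS, "itype": "ipv4"})
    return urllib.request.Request(
        CIF_URL + "?" + query,
        headers={"accept": "application/json", "Authorization": token},
        method="GET",
    )


# Usage (requires the CIF container to be running):
# with urllib.request.urlopen(build_indicator_request()) as resp:
#     indicators = json.load(resp)
```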

neil-unomaha commented 4 years ago

Collaborated: @skyemakable @TalonF

import json

# page number to return (1-based); placeholder value here, the endpoint version receives it as page_num
param = 1
PAGE_SIZE = 5000

with open('palo_all_indicators.json', 'r') as input_file:
    json_decode = json.load(input_file)

all_indicators_dirty = []
for item in json_decode:
    my_dict = {}
    my_dict['id'] = item.get('id')
    my_dict['indicator'] = item.get('indicator')
    all_indicators_dirty.append(my_dict)

# sort by CIF's indicator id, then keep only the indicator values
all_indicators_dirty.sort(key=lambda x: x["id"])
all_indicators_clean = [obj["indicator"] for obj in all_indicators_dirty]

length_of_indicators = len(all_indicators_clean)

# initialize index count based on paging: page 1 starts at 0, page 2 at 5000, ...
index_count = (param * PAGE_SIZE) - PAGE_SIZE
with open('palo_paged_indicators.txt', 'w') as output_file:
    for num in range(PAGE_SIZE):
        if index_count > length_of_indicators - 1:
            break
        output_file.write(all_indicators_clean[index_count])
        output_file.write("\n")
        index_count += 1

neil-unomaha commented 4 years ago


import json, shlex, subprocess
from flask_restplus import Namespace, Resource
from .constants import HTTPD_TOKEN, ROUTER_ADDR
from flask import send_file

# Absolute paths: curl saves into this directory, so the paging step must read from it too
INDICATOR_DIR = '/home/cif/palo_indicators'
ALL_INDICATORS_JSON = INDICATOR_DIR + '/palo_all_indicators.json'
PAGED_INDICATORS_TXT = INDICATOR_DIR + '/palo_paged_indicators.txt'
PAGE_SIZE = 5000

api = Namespace('palo', description='Palo API')

@api.route('/<string:page_num>')
@api.response(401, 'Unauthorized')
@api.response(200, 'OK')
class Palo(Resource):
    @api.doc(security=[])
    def get(self, page_num):
        # Only accept a positive integer page number ("1", "2", ...)
        if not page_num.isdecimal() or int(page_num) < 1:
            return "Error: invalid page number"

        cmd = '''curl -o /home/cif/palo_indicators/palo_all_indicators.json -X GET "http://localhost:5000/indicators/?fmt=json&tags=botnet%2Cphishing%2Cmalware%2Cscanner%2Cbruteforce%2Cdarknet&itype=ipv4" -H "accept: application/json" -H "Authorization: 46508ee7d447ef4ed9666f3cc4716f0ea246fa2fb5a1254036a384d7897dbaee"'''
        args = shlex.split(cmd)
        process = subprocess.Popen(args, shell=False, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        stdout, stderr = process.communicate()

        # Must be called through self; a bare __init_page_output(...) raises NameError
        self.__init_page_output(page_num)

        return send_file(PAGED_INDICATORS_TXT)

    def __init_page_output(self, page_num):
        with open(ALL_INDICATORS_JSON, 'r') as input_file:
            json_decode = json.load(input_file)

        all_indicators_dirty = []
        for item in json_decode:
            my_dict = {}
            my_dict['id'] = item.get('id')
            my_dict['indicator'] = item.get('indicator')
            all_indicators_dirty.append(my_dict)

        all_indicators_dirty.sort(key=lambda x: x["id"])
        all_indicators_clean = [obj["indicator"] for obj in all_indicators_dirty]

        # Page 1 -> indicators 0..4999, page 2 -> 5000..9999, ...
        start = (int(page_num) - 1) * PAGE_SIZE
        # The with-block closes the file once, after all lines are written
        with open(PAGED_INDICATORS_TXT, 'w') as output_file:
            for indicator in all_indicators_clean[start:start + PAGE_SIZE]:
                output_file.write(indicator)
                output_file.write("\n")

skyemakable commented 4 years ago

#from cifsdk.client.http import HTTP as Client
#from cifsdk.constants import ROUTER_ADDR, VALID_FILTERS
from .indicators import *

import os, time, json, logging, re, copy, traceback
import arrow
import requests
import zmq
from pprint import pprint

from flask import request, session, current_app, send_file
from flask_restplus import Namespace, Resource, fields
from cif.constants import FEEDS_LIMIT, FEEDS_WHITELIST_LIMIT, \
    HTTPD_FEED_WHITELIST_CONFIDENCE, FEEDS_WHITELIST_DAYS
from cifsdk.constants import ROUTER_ADDR, VALID_FILTERS
from cifsdk.client.zmq import ZMQ as Client
from cifsdk.exceptions import AuthError, TimeoutError, InvalidSearch, \
    SubmissionFailed, CIFBusy

from csirtg_indicator.feed import aggregate
from csirtg_indicator.feed import process as feed
from csirtg_indicator.feed.fqdn import process as feed_fqdn
from csirtg_indicator.feed.ipv4 import process as feed_ipv4
from csirtg_indicator.feed.ipv6 import process as feed_ipv6

api = Namespace('palo', description='Palo API')

@api.route('/<string:page_num>')
@api.response(401, 'Unauthorized')
@api.response(200, 'OK')
class Palo(Resource):
    @api.doc(security=[])
    def get(self, page_num):
        # filters definition
        # <fill in the blank> - format an object defined as follows:
        # filters ['parameter'] = <parameter_value> # where parameter_value is what you are passing in from curl, p$
        # in the indicator code
        # fmt=json&tags=botnet%2Cphishing%2Cmalware%2Cscanner%2Cbruteforce%2Cdarknet&itype=ipv4"
        filters = {
            #'indicators': 'example.com',
            'tags': 'botnet,phishing,malware,scanner,bruteforce,darknet',
            'itype': 'ipv4'
        }

        # Debug log; the with-block closes the file even if the search raises
        with open("/home/cif/palo_debug.txt", "a") as f:
            f.write("This is the router address: ")
            f.write(str(ROUTER_ADDR))

            # get information from the database using the same structure used in indicators
            #cli = Client('https://localhost:5000',token=os.getenv('CIF_TOKEN'), verify_ssl=False)
            #f.write("\nThis is what CLI is: ")
            #f.write(str(cli))

            # result from the database is returned as an object here
            with Client(ROUTER_ADDR, os.getenv('CIF_TOKEN')) as client:
                results = client.indicators_search(filters)
                f.write(str(results))

        return results

Jacksonurrutia commented 4 years ago

There's a big issue with using StringIO instead of writing to a file. We're currently returning the file when the REST call ends [Screenshot: return], but we close the file earlier [Screenshot: close]. This is an issue with StringIO because closing it discards its contents from memory, so they are no longer accessible. We could skip closing it and just return it, but that's never a good idea.
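Two ways around this, sketched with the standard library only (the function names are hypothetical): copy the text out with getvalue() before closing, or build a fresh BytesIO and leave closing to whatever sends the response. Our understanding is that Flask's send_file accepts a file-like object and closes it once the response is finished, possibly requiring an explicit mimetype such as text/plain for in-memory files; that is worth confirming against the Flask version in the container.

```python
import io


def build_page(ips):
    """Write IPs to an in-memory buffer and return the text before closing it."""
    buffer = io.StringIO()
    try:
        for ip in ips:
            buffer.write(ip + "\n")
        return buffer.getvalue()  # read while the buffer is still open
    finally:
        buffer.close()  # safe now: the text has already been copied out


def build_page_file(ips):
    """Return an open BytesIO (positioned at the start) for a caller like send_file.

    The buffer is deliberately left open; whoever sends the response closes it.
    """
    return io.BytesIO("".join(ip + "\n" for ip in ips).encode("ascii"))
```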