w3i1ong / binsim


BinSim: Binary Code Similarity Detection with Neural Networks

This repository is the official implementation of the paper "RCFG2Vec: Considering Long-Distance Dependency for Binary Code Similarity Detection".

This repository was forked from a private repository. Before uploading to GitHub, I removed some private scripts, which may lead to errors during execution. If you encounter any errors, please open an issue on GitHub, and I will address it as soon as possible.

Our code is organized as a Python package to facilitate fair comparison between models. It currently implements the following neural network models for binary code similarity detection:

  1. Gemini [paper] [code]
  2. SAFE [paper] [code]
  3. GraphEmbed [paper] [code]
  4. jTrans [paper] [code]
  5. alpha-diff [paper] [code]
  6. RCFG2Vec [paper] [code]
  7. Asteria [paper] [code]

Installation

0. System Requirements

We have tested the code on Ubuntu 22.04 LTS with Python 3.10.

Note: We have encountered several problems when installing the Python binding of RocksDB on other systems. Compiling RocksDB from source may solve them.

We use BinaryNinja and IDA Pro to disassemble the binary code and extract the necessary information, so before running the code you must install both and have valid licenses. For BinaryNinja, you also need to install its Python binding.
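Before running anything, a quick sanity check can confirm that the disassemblers are reachable. The Python sketch below is a hypothetical helper, not part of BinSim: it checks whether the `binaryninja` module can be found and, as an assumption about your setup, whether an IDA Pro command-line binary (`idat64`, `idat`, or `ida64`, depending on the IDA version) is on `PATH`. It does not import Binary Ninja or validate any license.

```python
import importlib.util
import shutil


def check_disassemblers() -> dict:
    """Report whether the disassembler tooling looks available (heuristic only)."""
    return {
        # Binary Ninja ships its Python API as the `binaryninja` module;
        # find_spec only locates it, it does not import it or check the license.
        "binaryninja_python": importlib.util.find_spec("binaryninja") is not None,
        # Assumption: the IDA Pro command-line binary is on PATH under one of
        # these names; adjust for your IDA version and install location.
        "ida_on_path": any(shutil.which(n) for n in ("idat64", "idat", "ida64")),
    }


if __name__ == "__main__":
    for name, ok in check_disassemblers().items():
        print(f"{name}: {'OK' if ok else 'MISSING'}")
```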

1. Install Necessary Libraries

BinSim depends on RocksDB to store training samples, so install it first:

sudo apt install build-essential
sudo apt-get install libsnappy-dev zlib1g-dev libbz2-dev liblz4-dev libzstd-dev libgflags-dev
sudo apt install librocksdb-dev
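To verify that the system library installed above is visible, here is a minimal sketch assuming a Linux system where `ctypes.util.find_library` can locate shared libraries (it consults `ldconfig` and similar tools). The helper `find_rocksdb` is hypothetical and not part of BinSim; it only checks the C library, not any Python binding.

```python
import ctypes
import ctypes.util


def find_rocksdb():
    """Return the RocksDB shared-library name if it can be found and loaded, else None."""
    name = ctypes.util.find_library("rocksdb")  # e.g. "librocksdb.so.X" on Linux
    if name is None:
        return None
    try:
        # Loading confirms the library is usable, not just present on disk.
        ctypes.CDLL(name)
    except OSError:
        return None
    return name


if __name__ == "__main__":
    print(find_rocksdb() or "librocksdb not found")
```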

2. Install Necessary Python Packages

After installing the libraries above, install the required Python packages with the following command:

pip install -r requirements.txt

Note: The dgl package installed by the above command supports only the CPU. To install the GPU version, use the command provided on the official DGL website.
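One way to tell which DGL build you ended up with is to try moving a tiny graph to a CUDA device. The sketch below is a hypothetical helper, not part of BinSim; it assumes PyTorch is installed alongside DGL, and it relies on the expectation that a CPU-only DGL build raises an error when a graph is moved to CUDA, so any exception is treated as "no GPU support".

```python
def dgl_supports_cuda() -> bool:
    """Return True if DGL can place a graph on a CUDA device (heuristic check)."""
    try:
        import torch
        import dgl
    except Exception:
        # Missing (or broken) torch/dgl installs cannot use the GPU either.
        return False
    if not torch.cuda.is_available():
        return False
    # A tiny 3-node graph with edges 0->1 and 1->2.
    g = dgl.graph((torch.tensor([0, 1]), torch.tensor([1, 2])))
    try:
        g.to(torch.device("cuda"))
    except Exception:
        # Expected on CPU-only DGL builds.
        return False
    return True


if __name__ == "__main__":
    print("DGL CUDA support:", dgl_supports_cuda())
```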

3. Install BinSim

pip install .
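To confirm that the installation succeeded, the sketch below queries the installed distribution's version and whether the package can be imported. Both the distribution name and the import name `binsim` are assumptions; check the repository's `setup.py` or `pyproject.toml` if they differ.

```python
import importlib.metadata
import importlib.util


def binsim_install_status(dist: str = "binsim", module: str = "binsim") -> dict:
    """Report the installed version (or None) and whether the module is importable.

    The default names are assumptions, not confirmed by the project.
    """
    try:
        version = importlib.metadata.version(dist)
    except importlib.metadata.PackageNotFoundError:
        version = None
    return {"version": version, "importable": importlib.util.find_spec(module) is not None}


if __name__ == "__main__":
    print(binsim_install_status())
```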

Note: We have implemented an experimental PyTorch operator for TreeLSTM and DAGGRU that can significantly speed up training. To use it, make sure CUDA is available and nvcc is installed.
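Before building the experimental operator, it can help to check both prerequisites at once. This is a hypothetical sketch, not how BinSim's build decides whether to compile the operator: it looks for `nvcc` on `PATH`, extracts the "release" line from `nvcc --version`, and asks PyTorch whether a CUDA device is usable.

```python
import shutil
import subprocess


def cuda_build_env() -> dict:
    """Collect what a CUDA extension build would need (a heuristic sketch)."""
    info = {"nvcc": shutil.which("nvcc"), "nvcc_release": None, "torch_cuda": False}
    if info["nvcc"]:
        result = subprocess.run([info["nvcc"], "--version"], capture_output=True, text=True)
        # `nvcc --version` prints a line like "Cuda compilation tools, release 12.1, ...".
        for line in result.stdout.splitlines():
            if "release" in line:
                info["nvcc_release"] = line.strip()
                break
    try:
        import torch  # PyTorch is needed for the custom operator anyway.
        info["torch_cuda"] = torch.cuda.is_available()
    except ImportError:
        pass
    return info


if __name__ == "__main__":
    for key, value in cuda_build_env().items():
        print(f"{key}: {value}")
```

If `nvcc` is `None` or `torch_cuda` is `False`, the operator is unlikely to build or run, and training would fall back to whatever path BinSim uses without it.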

Reproducing Experiments

We provide a guideline for reproducing the experiments in our paper. You can find it here.