joxeankoret / pigaios

A tool for matching and diffing source codes directly against binaries.
GNU General Public License v3.0

Port to Python 3.10 and IDA 7.7 #36

Closed: plowsec closed this 2 years ago

plowsec commented 2 years ago

Hey @joxeankoret

A friend told me about pigaios after I considered implementing something similar. Even though the last commit was 4 years ago, I was surprised to see that pigaios almost worked out of the box!

I managed to install the old Clang 5.0 on Ubuntu 18.04, so the "-export" features worked right away. The IDA part needed an update, though, so I got my hands dirty and tried to keep the code as close as possible to what you did originally.

Two potential issues, though:

I had to update this snippet, but I'm not sure what you were trying to do in the first place, so I couldn't test it.

    - ti = GetTinfo(f)
    - if ti:
    -   prototype2 = idc_print_type(ti[0],ti[1], func_name, PRTYPE_1LINE)
    + rv = get_local_tinfo(f)
    + prototype2 = ""
    + if rv is not None:
    +   (typei, fields) = rv
    +   if typei:
    +     prototype2 = idc_print_type(typei,fields, func_name, PRTYPE_1LINE)
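The guard in the new lines can be sketched outside IDA as follows. This is only an illustration: `prototype_for` is a hypothetical helper, and the lookup and printer are passed in as parameters so the logic can run against stand-ins instead of the real IDA functions. It may also be worth double-checking the lookup itself: `get_local_tinfo` appears to take a local-type ordinal rather than a function address, whereas `get_tinfo(ea)` looks like the closer replacement for the old `GetTinfo`.

```python
def prototype_for(ea, name, lookup, print_type, flags=0):
    """Return a one-line prototype for `ea`, or "" when no type info exists.

    `lookup` and `print_type` are stand-ins for the IDA calls used in the
    diff (hypothetical parameters, not part of pigaios).
    """
    rv = lookup(ea)
    if rv is None:
        # The lookup failed: keep the empty default.
        return ""
    typei, fields = rv
    if not typei:
        # A tuple came back, but the serialized type is empty.
        return ""
    return print_type(typei, fields, name, flags) or ""


# Fake lookup and printer, used only to exercise the three code paths.
def fake_lookup(ea):
    table = {0x1000: (b"\x01", b""), 0x2000: (b"", b"")}
    return table.get(ea)


def fake_print_type(typei, fields, name, flags):
    return "int %s(void)" % name


print(repr(prototype_for(0x1000, "main", fake_lookup, fake_print_type)))
print(repr(prototype_for(0x2000, "sub_2000", fake_lookup, fake_print_type)))
print(repr(prototype_for(0x3000, "sub_3000", fake_lookup, fake_print_type)))
```

Inside IDA, the same helper would presumably be called with the real lookup and `idc_print_type`, passing `PRTYPE_1LINE` as `flags`.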

The pickle file was created with a dependency on an old sklearn version, so I recreated it by re-training the model with:

    $ cp ../datasets/dataset.csv.bz2 .
    $ bzip2 -d dataset.csv.bz2
    $ ls -lsah
    total 105M
     512 drwxrwxr-x 1 501 dialout  256 Jul 11 17:16 .
    1.0K drwxrwxr-x 1 501 dialout  672 Jul 11 15:20 ..
    1.2M -rw-r--r-- 1 501 dialout 1.2M Jul 11 11:23 clf.pkl
    104M -rw-r--r-- 1 501 dialout 104M Jul 11 17:15 dataset.csv
       0 -rw-r--r-- 1 501 dialout    0 Jul 11 11:23 __init__.py
    7.5K -rwxr-xr-x 1 501 dialout 7.2K Jul 11 11:23 pigaios_create_dataset.py
     13K -rwxr-xr-x 1 501 dialout  13K Jul 11 17:06 pigaios_ml.py
     512 drwxr-xr-x 1 501 dialout  128 Jul 11 17:08 __pycache__

    $ python3 ./pigaios_ml.py -multi -t
    [Mon Jul 11 17:42:24 2022] Using the Pigaios Multi Classifier
    [Mon Jul 11 17:42:24 2022] Loading data...
    [Mon Jul 11 17:42:35 2022] Fitting data with CPigaiosMultiClassifier(None)...
    Fitting DecisionTreeClassifier()
    [Mon Jul 11 17:42:53 2022] Predicting...
    [Mon Jul 11 17:44:04 2022] Correctly predicted 13983 out of 19075 (false negatives 5092 -> 26.694626%, false positives 441 -> 0.044100%)
    [Mon Jul 11 17:44:04 2022] Total right matches 1013542 -> 99.457057%
    [Mon Jul 11 17:44:04 2022] Saving model...
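As a sanity check on the figures in that log, the percentages can be recomputed by hand. The sketch below is not taken from `pigaios_ml.py`; in particular, the count of 1,000,000 negative samples is an assumption inferred from "441 -> 0.044100%", and the variable names are illustrative.

```python
# Figures copied from the training log above.
positives = 19075        # known matches in the evaluated set
true_positives = 13983   # "Correctly predicted 13983 out of 19075"
false_negatives = 5092
false_positives = 441

# Assumption: 1,000,000 negative samples, inferred from 441 -> 0.044100%.
negatives = 1_000_000


def percent(part, whole):
    """Express part/whole as a percentage."""
    return 100.0 * part / whole


fn_rate = percent(false_negatives, positives)
fp_rate = percent(false_positives, negatives)

# Right answers = matches that were found + non-matches that were rejected.
right = true_positives + (negatives - false_positives)
accuracy = percent(right, positives + negatives)

print("false negatives: %d -> %f%%" % (false_negatives, fn_rate))
print("false positives: %d -> %f%%" % (false_positives, fp_rate))
print("total right matches: %d -> %f%%" % (right, accuracy))
```

Under that assumption the totals line up with the log: 13983 + (1,000,000 - 441) = 1013542 right matches. The high overall figure is dominated by the negatives, though, so the roughly 26.7% false-negative rate on real matches is the more telling number.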

I followed the instructions in https://github.com/joxeankoret/pigaios/issues/19, but I expected the output to mention the other classifiers as well, not just the Decision Tree one.

So, there you go. It's a really nice project btw!

joxeankoret commented 2 years ago

Sorry, I missed this message. I will review the PR and merge it. Thank you very-very much!

plowsec commented 2 years ago

Thx!