oalieno / asm2vec-pytorch

Unofficial implementation of asm2vec using pytorch ( with GPU acceleration )
MIT License

Error in the process of training the model #15

Status: Closed (AimiP02 closed this issue 1 year ago)

AimiP02 commented 1 year ago

Hello oalieno, I'm reproducing a paper that uses Asm2Vec. I used Ghidra to extract the function features, then normalized the assembly code the same way bin2asm.py does.

The original source code:

#include <stdio.h>

int main() {
    printf("Hello World!\n");
    return 0;
}

Before normalization (one of the extracted functions):

.name main
.offset 00101139
.file a.out
PUSH RBP
MOV RBP,RSP
LEA RAX,[0x102004]
MOV RDI,RAX
CALL 0x00101030
MOV EAX,0x0
POP RBP
RET

After normalization:

.name main
.offset 00101139
.file a.out
PUSH RBP
MOV RBP,RSP
LEA RAX,[CONST]
MOV RDI,RAX
CALL CONST
MOV EAX,CONST
POP RBP
RET
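
For reference, the normalization step shown above can be sketched roughly as follows. This is only an illustration of the before/after listings in this post, not the code from bin2asm.py: the helper names `normalize_line` and `normalize_listing` are hypothetical, and the sketch assumes that the only change needed is replacing hexadecimal literals with CONST while leaving the metadata lines (those starting with `.`) untouched.

```python
import re

# Hypothetical helpers for illustration only; not taken from bin2asm.py.
# Assumption: constants appear as hexadecimal literals such as 0x102004.
HEX_LITERAL = re.compile(r"\b0x[0-9a-fA-F]+\b")


def normalize_line(line: str) -> str:
    """Replace hex literals in one instruction line with the CONST token.

    Metadata lines (starting with '.') are returned unchanged.
    """
    if line.startswith("."):
        return line
    return HEX_LITERAL.sub("CONST", line)


def normalize_listing(text: str) -> str:
    """Apply normalize_line to every line of a function listing."""
    return "\n".join(normalize_line(line) for line in text.splitlines())


if __name__ == "__main__":
    listing = "\n".join([
        ".name main",
        ".offset 00101139",
        ".file a.out",
        "PUSH RBP",
        "MOV RBP,RSP",
        "LEA RAX,[0x102004]",
        "MOV RDI,RAX",
        "CALL 0x00101030",
        "MOV EAX,0x0",
        "POP RBP",
        "RET",
    ])
    print(normalize_listing(listing))
```

Run on the "before" listing, this yields the same instruction lines as the "after" listing. It does not cover anything else the original script may do to the output.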

I then tried to run python scripts/train.py -i asm/ -o model.pt, but it failed with the following error:

Traceback (most recent call last):
  File "scripts/train.py", line 52, in <module>
    cli()
  File "/home/bronya/.conda/envs/py3.7/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/bronya/.conda/envs/py3.7/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/bronya/.conda/envs/py3.7/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/bronya/.conda/envs/py3.7/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "scripts/train.py", line 48, in cli
    learning_rate=lr
  File "/home/bronya/.conda/envs/py3.7/lib/python3.7/site-packages/asm2vec-1.0.0-py3.7.egg/asm2vec/utils.py", line 74, in train
  File "/home/bronya/.conda/envs/py3.7/lib/python3.7/site-packages/asm2vec-1.0.0-py3.7.egg/asm2vec/utils.py", line 43, in preprocess
  File "/home/bronya/.conda/envs/py3.7/lib/python3.7/site-packages/asm2vec-1.0.0-py3.7.egg/asm2vec/datatype.py", line 115, in random_walk
  File "/home/bronya/.conda/envs/py3.7/lib/python3.7/site-packages/asm2vec-1.0.0-py3.7.egg/asm2vec/datatype.py", line 115, in <listcomp>
  File "/home/bronya/.conda/envs/py3.7/lib/python3.7/site-packages/asm2vec-1.0.0-py3.7.egg/asm2vec/datatype.py", line 117, in _random_walk
IndexError: list index out of range

Does this only work with function features extracted by radare2? How can I use this model with other decompilers such as Ghidra? Thanks very much.