oalieno / asm2vec-pytorch

Unofficial implementation of asm2vec using PyTorch (with GPU acceleration)
MIT License

Missing Functions? #9

Closed: MGYN closed this issue 3 years ago.

MGYN commented 3 years ago

I am trying to use the tool to get the embeddings of all functions in a binary file. The file has 10,000 functions, but the tool only extracts a few of them and misses about 9,000. I found that many functions in the symbol table are missing and many offsets differ from those in IDA Pro, so it seems the functions produced by bin2asm.py are not real functions.

The differences are as follows:

[Screenshots: function offsets compared with IDA Pro]

The functions from the symbol table that are missing, as shown in IDA:

[Screenshots: missing symbol-table functions in IDA]

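    import asm2vec.utils
    data = 'path/to/asm_dir'  # hypothetical path: the output directory of bin2asm.py
    functions, tokens_new = asm2vec.utils.load_data(data)
    for f in functions:
        print(f.meta)  # print each loaded function's metadata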

Why does this happen, and is there a solution? Besides that, my goal is to get the embeddings of all functions in a file as a single tensor or NumPy array without knowing the number of functions in advance, but it seems I can only fetch specific indices at a time. I don't know much about PyTorch, so what can I do? For example, I can get a few chosen indices as below, but I can't get every index from 0 to the number of functions.

    import torch
    # model is the trained asm2vec model; look up functions 0, 2 and 3 only
    v1 = model.to('cpu').embeddings_f(torch.tensor([0, 2, 3]))
    print(v1)
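The lookup above is not limited to hand-picked indices. A minimal sketch, assuming `model.embeddings_f` behaves like a standard `torch.nn.Embedding` (which the calls in this thread suggest): a small randomly initialised `nn.Embedding` stands in for the trained model, and the sizes `num_functions` and `dim` are hypothetical. Passing `torch.arange(n)` as the index tensor returns all `n` rows in order, and `n` can be read from `num_embeddings` instead of being known ahead of time.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for model.embeddings_f: one row per function.
num_functions, dim = 5, 4
embeddings_f = nn.Embedding(num_functions, dim)

# Look up specific rows, as in the snippet above.
some = embeddings_f(torch.tensor([0, 2, 3]))

# Look up every row: indices 0 .. num_embeddings - 1, in order.
all_ids = torch.arange(embeddings_f.num_embeddings)
with torch.no_grad():  # inference only; do not build a gradient graph
    all_vectors = embeddings_f(all_ids)

print(some.shape)         # torch.Size([3, 4])
print(all_vectors.shape)  # torch.Size([5, 4])
```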
MGYN commented 3 years ago

Sorry, I made a mistake about the missing functions. But I would still like to know how to get all the embeddings as a tensor or a NumPy array, rather than fetching specific indices one call at a time. Thanks!

oalieno commented 3 years ago

If you want to get all the embeddings, you can try this:

    # .weight is the whole embedding matrix, one row per function
    v1 = model.to('cpu').embeddings_f.weight.clone()
    print(v1)
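One caveat for the NumPy part of the question: `weight` is an `nn.Parameter` with `requires_grad=True`, and so is a plain `.clone()` of it, so calling `.numpy()` on either raises a `RuntimeError`. A minimal sketch, again using a hypothetical `nn.Embedding` in place of the trained `model.embeddings_f`, detaches the matrix from autograd before converting:

```python
import numpy as np
import torch
import torch.nn as nn

# Hypothetical stand-in for model.embeddings_f (5 functions, 4-dim vectors).
embeddings_f = nn.Embedding(5, 4)

# Full matrix, detached from autograd and copied so that later
# training steps on the model do not change it.
matrix = embeddings_f.weight.detach().cpu().clone()

# Convert to NumPy; row i is the embedding of function index i.
vectors = matrix.numpy()

print(vectors.shape)  # (5, 4)
print(vectors.dtype)  # float32
```

The `.clone()` after `.detach()` matters because `.detach()` alone shares storage with the live parameter, and `.numpy()` shares memory with the tensor it is called on.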