yonsei-sslab / XBA

A deep learning tool for generating platform-agnostic binary code embeddings
https://sites.google.com/view/xba-intro
Could you release the corresponding binary files that were used for generating /data/done? #1

Open Brubbish opened 1 year ago

Brubbish commented 1 year ago

Hi, I'm a bit confused about the content of the disasm.json files in the subfolders of /data/done/. I expected them to contain the strings, function names, and disassembled code of each basic block. However, some keys in these files have no corresponding value, and some values appear to be source file paths, which should come from the debug section (though they are not used in gcn-relation.json).

Take /data/done/curl/disasm.json as an example. It contains:

"6": [], "7": ["jmp", "ptr8", "SSL_CTX_set_srp_username"], "8": ["SSL_CTX_set_srp_username"], "9": [],
....
"77467": ["../lib/warnless.c"], "77468": ["((((("], "77469": ["memdebug.c"]

I wonder what information you extracted and what the data structure of the file is. It would also be nice if you could release the binary files, so that I can solve other problems myself if I run into them.
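To make the question concrete, here is a minimal sketch of how the entries can be tallied. It assumes disasm.json is a flat JSON object mapping string IDs to lists of tokens, as the excerpt above suggests. The function name `summarize_disasm` and the three categories are hypothetical and only for illustration; the "source path" check is a rough heuristic based on the file extension, not something taken from the XBA code.

```python
import json
from collections import Counter


def summarize_disasm(entries):
    """Tally entries of a disasm.json-style mapping (assumed: id -> list of tokens)."""
    counts = Counter()
    for tokens in entries.values():
        if not tokens:
            # Keys with no corresponding value, e.g. "6": []
            counts["empty"] += 1
        elif len(tokens) == 1 and tokens[0].endswith((".c", ".h")):
            # Heuristic: a single token that looks like a C source or header path
            counts["source_path"] += 1
        else:
            # Everything else: mnemonics, operands, function names, other strings
            counts["other"] += 1
    return counts


# The entries quoted above, used as sample input.
# For the real file, one would instead load it, for example:
#   with open("data/done/curl/disasm.json") as f:
#       entries = json.load(f)
sample = json.loads(
    '{"6": [], "7": ["jmp", "ptr8", "SSL_CTX_set_srp_username"], '
    '"8": ["SSL_CTX_set_srp_username"], "9": [], '
    '"77467": ["../lib/warnless.c"], "77468": ["((((("], '
    '"77469": ["memdebug.c"]}'
)
summary = summarize_disasm(sample)
print(dict(summary))
```

A tally like this only shows how many entries fall into each shape. It does not say which binary section each entry came from, which is what the original binaries would clarify.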

Also, the README says that disasm.json is used for generating the BOW, but build_vocab.py uses disasm_innereye.json to build the vocabulary. Did I get something wrong?

cxxz16 commented 1 year ago

Yes, I hope so too. With only the dataset provided, the results cannot be reproduced from scratch.

Brubbish commented 11 months ago

@cxxz16 I have reimplemented the data-extraction code. Our group (old g1) has been quite busy recently, so I may release it after the validation test, or we could work on it together if you are interested.