Thesis - Githubissues

mewmew commented 4 years ago

Really cool project Sandeep!

For reference, have you come across this paper? I think they are trying to solve a similar problem as you, so may give some inspiration and insights.

Edit2: I just noticed you were already aware of this paper (per prelim-presentation) :)

S. Kim, M. Faerevaag, M. Jung, S. Jung, D. Oh, J. Lee, and S. K. Cha, Testing intermediate representations for binary analysis, in Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering, pp. 353–364, IEEE Press, 2017.

Edit: You may also be interested in this paper (which happens to be written by my supervisor Roberto Guanciale, @guancio):

R. Metere, A. Lindner, and R. Guanciale, Sound transpilation from binary to machine-independent code, in Brazilian Symposium on Formal Methods, pp. 197–214, Springer, 2017.

sdasgup3 commented 4 years ago

@mewmew Thanks for sharing such pretty good reads. I was well aware of the first paper. The second one is pretty enlightening.

mewmew commented 4 years ago

@mewmew Thanks for sharing such pretty good reads. I was well aware of the first paper. The second one is pretty enlightening.

More than happy to.

From the README:

Immediate Goal

Extending the methodology to other lifters like fcd/rev.ng

How far along is this work? I'd love to stress test the generality of the approach taken by single_instruction_translation_validation as I am currently implementing a binary lifter from x86 to LLVM IR.

sdasgup3 commented 4 years ago

@mewmew Awesome!! I have not started it yet, but aiming for it. I have a decent idea of where exactly the tool needs to change to accommodate other lifters (fcd or your-lifter). Happy to discuss more (sdasgup3@illinois.edu).

The single instruction validation uses the individual liftings generated by lifter and validates X86 Vs LLVM IR by comparing the corresponding symbolic summaries. Having said that, it would be better to have a look at the LLVM IR that you generate, to begin with. That is mainly because the LLVM semantics that we are currently borrowing has some limitations on FP/vector/intrinsic support. For the general-purpose instruction, your lifter may not generate them anyways (except llvm-popcnt to compute parity flag, which the LLVM semantics do have the support for).

mewmew commented 4 years ago

@mewmew Awesome!! I have not started it yet, but aiming for it. I have a decent idea of where exactly the tool needs to change to accommodate other lifters (fcd or your-lifter). Happy to discuss more (sdasgup3@illinois.edu).

The single instruction validation uses the individual liftings generated by lifter and validates X86 Vs LLVM IR by comparing the corresponding symbolic summaries. Having said that, it would be better to have a look at the LLVM IR that you generate, to begin with. That is mainly because the LLVM semantics that we are currently borrowing has some limitations on FP/vector/intrinsic support. For the general-purpose instruction, your lifter may not generate them anyways (except llvm-popcnt to compute parity flag, which the LLVM semantics do have the support for).

Wonderful! I'd be happy to discuss this in more detail. I'll throw you an e-mail and we can continue discussions there.

Cheers, Robin

sdasgup3 / validating-binary-decompilation

Thesis #22

Immediate Goal