Closed mewmew closed 5 years ago
Rebased and ready for review.
Do we have a better way to handle dependent PRs?
Rebase after https://github.com/lapsang-boys/pippi/pull/17 and then I'll review
done
Edit: officially done as per caffd2e943491c48e5dc911ca1589aec0a794da8
This PR is now functionally complete.
However, its performance is really poor after switching to x/text/encoding. We should check whether there are any easy wins to gain back this performance.
Edit: to make this explicit, the performance for anything but toy inputs is poor to the point of being unusable.
Now this PR is feature complete and also performant enough to be used for larger programs.
The pi-strings-go command currently implements support for the following string encodings:
@karlek, review at your own leisure :)
It would be interesting to add a probability of being a string based on where the string was found. If it was found inside an image, it probably isn't a string (unless it's malware, then it can be anything).
Definitely! I think we can do a lot with probability, e.g. for longer strings, histogram of characters, n-grams and then filter/hide strings that are most likely binary data, and highlight the "real" strings.
Edit: this could also be based on language detection. If Pippi finds a lot of German words in a given binary, then assume German and update the frequencies of the histograms accordingly.
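To make the histogram idea a bit more concrete, here is a minimal sketch, not what pi-strings-go actually does. The names `textScore` and `rankCandidates` are hypothetical, and the score is only the fraction of letters, digits and spaces in a candidate, a crude stand-in for a real character histogram or n-gram model.

```go
package main

import (
	"fmt"
	"sort"
	"unicode"
)

// textScore returns the fraction of runes in s that are letters, digits or
// spaces. Hypothetical helper: a real implementation would compare against
// a per-language character histogram or n-gram model instead.
func textScore(s string) float64 {
	total, textLike := 0, 0
	for _, r := range s {
		total++
		if unicode.IsLetter(r) || unicode.IsDigit(r) || r == ' ' {
			textLike++
		}
	}
	if total == 0 {
		return 0
	}
	return float64(textLike) / float64(total)
}

// rankCandidates returns a copy of candidates sorted by descending
// textScore, so the most text-like strings come first.
func rankCandidates(candidates []string) []string {
	ranked := append([]string(nil), candidates...)
	sort.SliceStable(ranked, func(i, j int) bool {
		return textScore(ranked[i]) > textScore(ranked[j])
	})
	return ranked
}

func main() {
	candidates := []string{"@#$%^&*(", "hello world", "x9$k~|q!"}
	for _, s := range rankCandidates(candidates) {
		fmt.Printf("%.2f %q\n", textScore(s), s)
	}
}
```

Strings below some threshold could then be hidden by default, and the language detection idea would amount to swapping in a different scoring table per detected language.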
Exactly! Improve the user experience, by sorting it more cleverly. I'd love that!
Updated #19 based on this PR.
The intention with pi-strings-go is not to replace the Rust version of pi-strings but to provide multiple implementations of the same Protobuf API, such that different implementations may be evaluated and compared for performance, feature sets, etc.
Once we get a large enough set of non-trivial components, it will be fun to evaluate Rust vs. Go in terms of performance for more intricate reversing sessions where the garbage collector may kick into high gear.
~Note: this PR depends on #17 (and #20 for test cases).~