[x] I have reviewed the OFRAK contributor guide and attest that this pull request is in accordance with it.
One sentence summary of this PR (This should go in the CHANGELOG!)
Unpack the ASCII strings in program sections as resources.
Link to Related Issue(s)
We already have a string analyzer and a string modifier, but strings are not represented as resources, so it is hard to do analysis and queries around them.
Please describe the changes in your request.
This PR adds an AsciiString view and an AsciiStringUnpacker component targeting program sections. Other targets could be added where sensible, but I thought restricting it to sections of executables that get loaded into memory was a reasonably useful constraint.
I used a regular expression to find strings rather than the strings utility. Two reasons for this:
1. Using strings required flushing data to disk, and I found this caused Linux resource issues (too many files) when running a big unpack_recursively.
2. It avoids a dependency on strings, which may not be installed on all systems (important as we step up Windows support). The existing StringsAnalyzer could also be changed to use this regex to remove the dependency from the code base entirely; it does not currently declare its dependency on strings.
The actual process of searching for strings is pretty simple to express as a regex, so I don't think we lose confidence by not using strings.
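For reviewers who want a feel for the approach, here is a minimal sketch of a regex-based string search over raw bytes. It is not the code in this PR: the function name `find_ascii_strings`, the character class, and the default `min_length` of 4 are assumptions chosen for illustration.

```python
import re
from typing import Iterator, Tuple


def find_ascii_strings(data: bytes, min_length: int = 4) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, string) for each run of printable ASCII in data.

    Hypothetical helper for illustration only; the PR's actual pattern
    and minimum length may differ.
    """
    # Printable ASCII (space through tilde) plus tab, at least min_length bytes long.
    pattern = re.compile(rb"[\x20-\x7e\t]{%d,}" % min_length)
    for match in pattern.finditer(data):
        yield match.start(), match.group()


if __name__ == "__main__":
    blob = b"\x00\x01hello\x00ab\x00world!!\xff"
    for offset, string in find_ascii_strings(blob):
        print(hex(offset), string)
```

Because the search runs entirely on an in-memory `bytes` object, nothing has to be written to disk and no external binary is needed.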
Anyone you think should look at this, specifically?
@marczalik may be able to help add tests for this. We should make sure that the expected strings are unpacked from a binary. The string length restrictions should also be exercised by this test, i.e. a short string in a non-code section should be found, a short string in a code section should not be found, and a long string in either one should be found.
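As a starting point, the length cases described above could be sketched roughly as follows. This is a standalone illustration rather than an OFRAK test: `unpack_strings` is a hypothetical stand-in for the unpacker, and the thresholds `MIN_LENGTH_CODE` and `MIN_LENGTH_NON_CODE` are assumed values, not the ones used in this PR.

```python
import re
from typing import List

# Assumed thresholds for illustration; the PR's real values may differ.
MIN_LENGTH_CODE = 8
MIN_LENGTH_NON_CODE = 2


def unpack_strings(data: bytes, is_code: bool) -> List[bytes]:
    """Hypothetical stand-in for the unpacker: return printable ASCII runs."""
    min_length = MIN_LENGTH_CODE if is_code else MIN_LENGTH_NON_CODE
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [match.group() for match in pattern.finditer(data)]


def test_string_length_restrictions() -> None:
    short, long_ = b"hi", b"a much longer string"
    blob = b"\x00" + short + b"\x00\x01" + long_ + b"\x00"

    # A short string is found in a non-code section but not in a code section.
    assert short in unpack_strings(blob, is_code=False)
    assert short not in unpack_strings(blob, is_code=True)

    # A long string is found in either kind of section.
    assert long_ in unpack_strings(blob, is_code=False)
    assert long_ in unpack_strings(blob, is_code=True)
```

A real test would instead unpack a known binary and query the resulting string resources, but the four assertions map directly onto the cases listed above.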