google / binexport

Export disassemblies into Protocol Buffers
Apache License 2.0
1.03k stars 197 forks source link

BinExport wrongly identifies some functions as IMPORTED #90

Open patacca opened 2 years ago

patacca commented 2 years ago

Consider the executable ntoskrnl.exe[1] (official version of ntoskrnl.exe) After exporting it with BinExport on IDA you will find that the function FsRtlMdlReadCompleteDevEx at address 0x14032E010 is erroneously identified as IMPORTED while in reality the function code is present in the binary (IDA shows it without problem).

After looking a little bit into the sources it seems that the problem lies in this check:

      if (function.GetBasicBlocks().empty()) {
        function.SetType(Function::TYPE_IMPORTED);
      }

It seems that BinExport doesn't extract the basic blocks for that particular function.

[1] Here you can find the executable as well with the pdb file associated

Myles1 commented 1 year ago

Any updates on this? I'm running into the same issue. A function that's correctly disassembled by IDA is showing up as IMPORTED, with no instructions, in the binexport file.

I'm working on an arm64 binary and, as far as I can tell, this happens when there is a JUMPOUT from the current function, or sometimes when the current function is just very large.

Myles1 commented 1 year ago

@cblichmann Do you have any suggestions/ideas for fixing this?

cblichmann commented 1 year ago

Oh wow, I should've noticed this much earlier. @patacca: You're right, we export this as "Imported", which is a bug. However, after a bit of debugging and not finding anything odd, I simply logged everything to stdout with

./ida64 -OBinExportAlsoLogToStdErr:TRUE -OBinExportAutoAction:BinExportBinary ~/ntoskrnl-5012647.exe.i64

This resulted in (among others):

I0726 14:14:00.924358  256587 flow_graph.cc:498] Discarding excessively large function 000000014032E010: 3202 basic blocks, 5000 edges, 14559 instructions (Limit is 5000, 5000, 20000)

So in a sense this is working as intended. IDA itself also by default refuses to show the graph view for it (> 1000 basic blocks). This is exactly hitting the kMaxFunctionEdges limit of 5000 edges (defined in flow_graph.h).

@Myles1: Can you check with a command-line similar to the above whether you also hit one of these limits? Also, how often does this occur for typical binaries you analyze?

We should probably increase the limits somewhat and also set the function type to Function::TYPE_INVALID instead.

patacca commented 1 year ago

Why do we want to enforce the limits by hardcoding the values? Isn't it better to let the users adjust those values to their specific needs?

cblichmann commented 1 year ago

Yes, we can definitely do that. It'd be awkward to use from a command-line and we'd also need to make this a BinDiff setting. I'll look into that.

cblichmann commented 1 year ago

Note: You can try this out with the artifacts that are now being built for each commit: https://github.com/google/binexport/actions/runs/5678688428

Myles1 commented 1 year ago

This happens very frequently in the binaries that I'm analyzing. Is there a workaround for keeping the excessively large cfg? I need to keep the cfg, no matter how large it might be.

cblichmann commented 1 year ago

As a temporary workaround, you can increase the limits in flow_graph.h and recompile. This should indeed be configurable, but I'm trying to figure out the best way to do this. Contenders are:

As an aside, IIRC, the Ghidra exporter extension has no hard-coded limits.

patacca commented 1 year ago

I'm voting for the IDA command line option, leaving the current default values when not specified. It keeps it consistent with the other options already present (like OBinExportAlsoLogToStdErr or OBinExportAutoAction) making it easier for people to find them. Just giving my personal unwanted opinion 😄

Myles1 commented 1 year ago

Shouldn't we set the default to a large number and allow configuration for lowering it, as opposed to the other way around? That would make the defaults as sane as possible for new people. This issue took a while for me to track down and I hope others don't have to do the same.

Edit: I'm also unable to recompile because I have IDA Home, not IDA Pro. Would it be possible to make a new release with updated values?

cblichmann commented 1 year ago

@Myles1 I just submitted such a change for review. As for new binaries, you can always find them under the Actions tab under cmake-build. E.g. this is a previous one: https://github.com/google/binexport/actions/runs/5689470178.