Application Verifier stops in libpcap (wpcap.dll) included with Npcap 1.79

fyodor commented 2 weeks ago

Hi Libpcap team! @guyharris suggested we create this issue corresponding to our npcap#742. We received a customer report that Application Verifier has two stops in wpcap.dll (which is all Libpcap 1.10.4 code) included in Npcap 1.79. We're working to reproduce and debug and will update this with any further information we find. And of course we appreciate anything you're able to determine. Here's the customer report:

Recently we tried to run our application (x64) with Application Verifier. With the "Networking" checks enabled, 'wpcap.dll' causes two verifier stops (https://learn.microsoft.com/en-us/windows-hardware/drivers/devtest/application-verifier-stop-codes-networking). We would like to enable these checks during our automated test runs. It's currently not possible due to these verifier stops.

The first one is calling 'pcap_findalldevs_ex' causing "A Winsock API was called before a successful WSAStartup() or after a balancing successful WSACleanup() call was made". Application Verifier seems to be tracking these calls on a per module basis. Even if 'main()' had called 'WSAStartup' before calling 'pcap_findalldevs_ex', the stop message is still issued.

The second one happens after 'main()' had exited. When 'wpcap.dll' is unloading the Verifier issues "Illegal networking API called from DllMain".

guyharris commented 2 weeks ago

The first one is calling 'pcap_findalldevs_ex' causing "A Winsock API was called before a successful WSAStartup() or after a balancing successful WSACleanup() call was made".

Does their sample code call pcap_findalldevs_ex() because it's making a remote-capture call to find a list of devices on a remote machine, or does it just call it, rather than pcap_findalldevs(), just for the lulz? If the former, the remote-capture routine for pcap_findalldevs_ex(), calls rpcap_setup_session(), which calls rpcap_remoteact_getsock(), which calls getaddrinfo() which, I infer from the example code on Microsoft's getaddrinfo() page, requires that a WSAStartup() be done first. sock_initaddress() doesn't call anything that does a WSAStartup(), so that could be the problem.

Even if 'main()' had called 'WSAStartup' before calling 'pcap_findalldevs_ex', the stop message is still issued.

OK, that's weird; that may by somebody else spontaneously calling WSACleanup() that's not undoing an earlier WSAStartup() that they've done. I did fix a bug wherein the remote capture code, if it calls WSAStartup() and the call to WSAStartup() fails, calls WSACleanup(); that's fixed in the main branch in 87d6a6ae4a143b11293837b4d8adaaf10401d4da and in the 10.4 branch in 5b5705b5d6a0cf902f74a0ccd9886a6e08b3e923. But that's a problem only if a later WSAStartup() fails.

The second one happens after 'main()' had exited. When 'wpcap.dll' is unloading the Verifier issues "Illegal networking API called from DllMain".

That's also weird, given that the libpcap DllMain() does

BOOL WINAPI DllMain(
  HANDLE hinstDLL _U_,
  DWORD dwReason _U_,
  LPVOID lpvReserved _U_
)
{
    return (TRUE);
}

That's it. (All checks of the libpcap code were done from the https://github.com/nmap/libpcap repository, which is Npcap's fork, which I think you use to add bug fixes that aren't yet in an official libpcap release.)

guyharris commented 2 weeks ago

This whole WSAStartup()/WSACleanup() thing sounds like an ugly leftover from the Win16 days; UN*Xes never had that mess.

WSAStartup() and WSACleanup() maintain a counter, so that WSAStartup() doesn't do any work unless WinSock wasn't already initialized and WSACleanup() doesn't do any work unless this is the "last" call. Unfortunately, the work that WSAStartup() does isn't necessarily cheap, so doing WSAStartup() and WSACleanup() calls only "when necessary" may be expensive.

We'd have to:

call WSAStartup() and WSACleanup() at the beginning and end of pcap_compile();
call WSAStartup() and WSACleanup() at the beginning and end of pcap_findalldevs_ex() - or, at least, in the rpcap code for it;
call WSAStartup() when opening a remote capture device and WSACleanup() when closing a remote capture device;

which might not be too bad.

guyharris commented 2 weeks ago

Or, alternatively, call WSAStartup() the first time it's needed and don't bother calling WSACleanup(), as the resources that would be freed by "proper" cleanup are probably negligible.

dmiller-nmap commented 2 weeks ago

My research shows that the problem is calling WSACleanup() in an atexit() handler, which is run "in DllMain" for some reason. I tried replacing it with the Microsoft-specific _onexit(), but that has the same problem.

guyharris commented 2 weeks ago

My research shows that the problem is calling WSACleanup() in an atexit() handler, which is run "in DllMain" for some reason.

I think that the "in DllMain" part may be Application Verifier being confused, but, at least as I read this Microsoft Learn page about C run-time library behavior, routines added by atexit() calls in a DLL are run when the DLL is unloaded, not when the main program exits (if the routine is in the DLL, it can't be run after the DLL is unloaded).

If Application Verifier doesn't recognize that they're not literally being called from DllMain(), either this is a false positive or running anything in that context, even if it isn't literally called from DllMain(), is problematic.

If it does recognize that, it may just be reporting it in a misleading fashion.

In any case, this may mean that we can't rely on atexit() (or _onexit()) to undo any WinSock stuff in a DLL.

That may, in turn, mean we should just let WinSock do the reference counting, as per one of my earlier comments, or give up on cleaning up WinSock, as per another of my earlier comments.

However, you noted in this comment on the corresponding Npcap issue that Packet.dll stopped using WinSock routines because "We did have one issue in Packet.dll due to not calling WSACleanup().", so the second option might not work.

I hate having WinSock start up and shut down every time somebody calls pcap_findalldevs_ex() on a remote host, or compiles a filter (because that's needed for host name to address resolution), if nobody has started up WinSock for another reason, but we may be stuck with that.

dmiller-nmap commented 2 weeks ago

I ought to correct that comment: Packet.dll's problem was not calling WSAStartup() before using getaddrinfo(). I mixed up the issues in my mind before writing that out.

the-tcpdump-group / libpcap

Application Verifier stops in libpcap (wpcap.dll) included with Npcap 1.79 #1333