zeek / spicy-analyzers

Growing collection of Spicy-based protocol and file analyzers for Zeek

Unify analyzers for OpenVPN #72

Open bbannier opened 3 years ago

bbannier commented 3 years ago

Currently we have a number of OpenVPN analyzers. These analyzers seem largely identical, with the only major differences being the kind of traffic they analyze (TCP vs. UDP) and their authentication format.

While we cannot currently use the same analyzer for TCP and UDP (I filed zeek/zeek#3359), we should clean up the analyzers so that the authentication format does not require separate analyzers (#71 adds additional formats). This should make Zeek logs more stable (new parsing capabilities would not introduce previously unknown Zeek protocol analyzers) and easier to consume (users would not need extra logic to collect different OpenVPN analyzers into one analysis stream).

A first step would probably involve moving the Zeek-side DPD signatures into the OpenVPN grammar and then branching on them on the Spicy side.

keithjjones commented 3 years ago

The problem I had with the different auth types is that there isn't a field in the header telling you what auth type the message uses. The only way I know to do it is to try all of them: the right one will detect, while the other ones won't parse correctly. As an example of OpenVPN parsing issues, Wireshark doesn't support anything but SHA1 (the default for OpenVPN) and will show you a malformed packet for the SHA256 test pcap in this repo:

[Screenshot: Wireshark showing a malformed packet for the SHA256 test pcap]

That hash should be bigger for this trace (32 bytes). To make matters worse, there's nothing in the header that tells you whether HMAC is used at all, so you can't tell an HMAC message apart from a non-HMAC OpenVPN message. In OpenVPN's world, the server and client have to be set up exactly the same way for it to work, and I guess they don't need a field to tell these things apart. You might then think you can key on header length, but that varies too, with the ack array (the packet ID array in the screenshot). So that was the reasoning behind the design choices: I couldn't figure out a better way around all these issues. Looking forward to learning how they can be condensed. Thanks!
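
To illustrate the "try all of them" approach inside a single parser, here is a minimal Python sketch. It is not the code from this repo. It assumes the commonly documented control-packet layout (opcode/key-id byte, 8-byte session ID, optional HMAC followed by a 4-byte replay packet ID and 4-byte timestamp, a 1-byte ack count, the ack array, an 8-byte remote session ID when the ack array is non-empty, and a 4-byte message packet ID). It only handles a payload-free client hard reset over UDP. The names `CANDIDATE_HMAC_SIZES`, `try_layout`, and `guess_auth` are hypothetical.

```python
import struct

# Assumed candidate HMAC sizes in bytes: no HMAC, SHA1, SHA256, SHA512.
CANDIDATE_HMAC_SIZES = {"none": 0, "sha1": 20, "sha256": 32, "sha512": 64}

# Opcode of a client hard reset (v2); assumed to carry no payload.
P_CONTROL_HARD_RESET_CLIENT_V2 = 7


def try_layout(pkt: bytes, hmac_size: int) -> bool:
    """Return True if pkt parses cleanly as a hard reset with this HMAC size."""
    if len(pkt) < 1 + 8:
        return False
    opcode = pkt[0] >> 3  # high 5 bits: opcode, low 3 bits: key id
    if opcode != P_CONTROL_HARD_RESET_CLIENT_V2:
        return False
    off = 1 + 8  # opcode byte + session id
    if hmac_size:
        off += hmac_size + 4 + 4  # HMAC + replay packet id + timestamp
    if off + 1 > len(pkt):
        return False
    ack_count = pkt[off]  # the variable-length part of the header
    off += 1 + 4 * ack_count
    if ack_count:
        off += 8  # remote session id follows a non-empty ack array
    if off + 4 != len(pkt):  # the message packet id must end the packet
        return False
    (message_id,) = struct.unpack_from("!I", pkt, off)
    return message_id == 0  # assumed: first message of a session has id 0


def guess_auth(pkt: bytes) -> list:
    """Return every candidate layout under which pkt parses consistently."""
    return [name for name, size in CANDIDATE_HMAC_SIZES.items()
            if try_layout(pkt, size)]
```

`guess_auth` returns a list rather than a single name because more than one layout can, in principle, parse cleanly. The variable ack array is what makes header length alone unreliable, so a unified analyzer would still need a tie-breaking rule.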