secdev / scapy

Scapy: the Python-based interactive packet manipulation program & library.
https://scapy.net
GNU General Public License v2.0
10.75k stars 2.03k forks source link

PacketField removes payload / dissects as raw #950

Closed mlgiraud closed 6 years ago

mlgiraud commented 6 years ago

I'm currently implementing the OPC UA Binary protocol and due to its structure decided to model e.g. ByteStrings as Packets (a ByteString has a length field and a variable length data field that consists of UaBytes). This works so far but my problem now is that the PacketField doesn't work like I originally expected it to. For example

class IntPacket(Packet):
    fields_desc = [IntField("myValue", 0)]

class TestPacket(Packet):
    name = "TestPacket"
    fields_desc = [IntField("length", 0),
                   PacketField("intPkt", IntPacket(), IntPacket),
                   IntField("alwaysHere", 0)]

test = TestPacket(b"\x00\x00\x00\x01\x11\x11\x11\x11\xff\xff\xff\xff")
test.show2()

produces

[ TestPacket ]

length = 1 \intPkt \ |###[ IntPacket ]### | myValue = 286331153 |###[ Raw ]### | load = b'\xff\xff\xff\xff' alwaysHere= 0

I tried overloading the guess_payload_class method like this:

def guess_payload_class(self, payload):
        return NoPayload

but that only removes the ###[ Raw ]### part and the alwaysHere field still isn't dissected correctly.

Is there some way to avoid this without having to write my own modified version of PacketField? Is there maybe a better way to handle the nested structure of the OPC UA protocol in scapy?

My plan currently was to just model all the base types (like UaInt32 (This needs a separate type. The normal IntField doesn't work, since UA is in little endian)) as fields and make all complex types that use these basic types Packets with a fields_desc and cascade everything.

Some additional info: I'd also like to use a structure like i described above, since the standard is quite big but it's possible to generate nearly all of it from an xml schema, once the basic structure is implemented. (around 40-50 built in types need to be coded by hand + the networking related packet structures (about 10), rest from xml)

Some more examples: If i use the above definitions (with the modified guess method) and create a packet with them like this:

test = TestPacket()
test.intPkt.myValue = 42
test.length = 1
test.alwaysHere = 31

test.show()
test.show2()

this produces the following output:

[ TestPacket ]

length = 1 \intPkt \ |###[ IntPacket ]### | myValue = 42 alwaysHere= 31

[ TestPacket ]

length = 1 \intPkt \ |###[ IntPacket ]### | myValue = 42 alwaysHere= 0

The first is what I'd expect the output to be. The second happens because the PacketField consumes all the bytes in the string before the dissect function on the alwaysHere field is called. The bytestring that is generated from the example is as expected. It's only the dissection that "fails" / works unexpectedly

A potential fix shouldn't be too difficult I guess, but before i start "fixing" I'd like somebody to confirm that this behavior is indeed not intended. If it is intended, i'd like to discuss a new field, that makes accumulation of fields possible like i described above (if it doesn't exist already, maybe I've overlooked it until now)

mlgiraud commented 6 years ago

Ok i found that returning the Padding class in guess_payload gives me the result i want. Is this the way to go, or am i missing out on some cool way i havn't discovered yet?

gpotter2 commented 6 years ago

It's useally the way it's done in scapy :)

def guess_payload_class(self, payload):
        return conf.padding_layer
mlgiraud commented 6 years ago

I made a class that defines the layer like you wrote and just let every "Packet" inherit from it. Works like a charm.

I have another question though. UA Binary messages can be encrypted with asymmetric or with symmetric algorithms. Depending on that, the header is a different one: Structure of a UA Binary Message: MessageHeader <-- always the same for secureconversation messages SecurityHeader <-- this can change SequenceHeader <-- always the same Payload Signature/Padding

If one header is present the other cannot be present. I tried modeling this with two conditional fields that have the same name, but this doesn't really work, since one of them overwrites the decoded header of the other with None. Is there a way to have such an either or field, or should i just use two different field names?

PS: I'm doing this as part of my Bachelor's thesis but would be happy to contribute the protocol definitions and such if you're interested

gpotter2 commented 6 years ago

So you need to modify the SecurityHeader depending on the MessageHeader ? I haven't quite understood how your code is supposed to detect which SecurityHeader it needs to use.

A good option would be to create two layers: SecurityHeaderAsymetric and SecurityHeaderSymetric, where one of them would inherit from the other. Then you may use the dispatch_hook function, to select while building which of the two packets needs to be used. A good example may be found here:

https://github.com/secdev/scapy/blob/master/scapy/layers/l2.py#L163-L168

    @classmethod
    def dispatch_hook(cls, _pkt=None, *args, **kargs):
        if _pkt and len(_pkt) >= 14:
            if struct.unpack("!H", _pkt[12:14])[0] <= 1500:
                return Dot3
        return cls

As a side note, i would say that all contributions are welcomed ! It would be greate if you could then submit a PR to add your layer in the contrib folder !

mlgiraud commented 6 years ago

I tried implementing it with a dispatch_hook but ran into the problem that the dispatch hook only gets passed the currently remaining data as a string when it is called.

My current structure looks something like this:

class UaSecureConversationMessageHeader(UaTypePacket):
    fields_desc = [UaBytesField("MessageType", None, 3),
                   UaByteField("IsFinal", b'F'),
                   UaUInt32Field("MessageSize", None),
                   UaUInt32Field("SecureChannelId", 0)]
class UaSecureConversation(Packet):
    fields_desc = [PacketField("MessageHeader", UaSecureConversationMessageHeader(), UaSecureConversationMessageHeader),
                   PacketField("SecurityHeader", UaAsymmetricAlgorithmSecurityHeader(), UaAsymmetricAlgorithmSecurityHeader),
                   PacketField("SequenceHeader", UaSequenceHeader(), UaSequenceHeader),
                   PacketField("Payload", Raw(), Raw)]
    # Post build stuff omitted

UaAsymmetricAlgorithmSecurityHeader has the dispatch_hook function and it is called with all the data that comes after the MessageHeader. But since i need to check the first three bytes of the MessageHeader this doesn't really help. The Type of the SecurityHeader depends on the MessageType field in the MessageHeader.

Although this also solves a different other problem that i had, so that's one step forward

gpotter2 commented 6 years ago

You're doing it the wrong way.

You should not be using a such class, which as several packet fields.

Instead, you should be using the bind_layers command, to link the layers together when certain conditions are met ( have a look at layers in the layers/ folder)

That way you will be able to select the type of a layer depending on the previous one. Your post_build function will then be able to use, if needed, self.underlayer or self.payload to retrieve previous or next layer

guedou commented 6 years ago

I am closing this issue as @gpotter2 provided a correct answer.

mlgiraud commented 6 years ago

I considered using separate layers but that only really works at the top level of the message structure. It is possible to have infinite recursion in the message structure with for example the Variant datatype.

A Variant looks something like this (somewhat pseudocode):

class Variant(Packet):
    fields_desc = [EncodingMask,  # What type is encoded? Is it an array, is it multidimensional?
                          ArrayLength,
                          data,  # Can be a single value, or an array of values depending on EncodingMask
                          ArrayDimensions]

This seemed to be solveable with the current scapy api (at least for me) by modeling each complex data type (like ByteStrings, XMLElements) as a packet, and then using PacketFields. Note, that a Variant can contains an array of Variants that in turn can again contain any data type. In the above example binding a Variant "layer" to a variant when the appropriate id is set wouldn't work, because the data is in the middle. I could maybe make the ArrayDimensions a separate layer and use it like a padding but that doesn't represent the structure of the protocol well imho. Like i explained above OPC UA is a hierarchical protocol where it is a possibility to have infinite recursion.

In my earlier comment i defined the payload as raw, but that's just an abstraction. The payload actually consists of a so called NodeId and the actual message after that. The NodeId can reference any structure defined in the OPC UA standard (or even by users) except for the basic message structures like described in my previos comment (MessageHeader, SecurityHeader, SequenceHeader and Footer) So yes, i can model those as separate "layers" and bind them.

I could just do it like i want to since i only need it for my bachelors, but i'd like to integrate it into scapy later since such a protocol module afaik doesn't exist yet and could maybe be useful for a few people. Considering that i thought it would be best to get feedback to the structure early on, so i don't do double the work.

Do you think this approach is a good idea and would be intuitive to use for someone already familiar with scapy?

I can also throw together some examples and push my progress so far to my fork, so my train of thought becomes clearer.