ystero-dev / scalpel

Packet Dissection and sculpting in Rust
Other
3 stars 6 forks source link

Make PacketBuilder and Layer::encode_bytes #65

Closed aadilshabier closed 4 months ago

aadilshabier commented 5 months ago

Refer to #56 for discussion

This aims to bring packet sculpting to scalpel. This is done by extending Layer with Layer::encode_bytes(). It's usage can be seen in examples/packet_sculpting.rs

let eth_layer = Box::new(layers::ethernet::Ethernet::new());
let ipv4_layer = Box::new(layers::ipv4::IPv4::new());
let bytes = [0x12, 0x34, 0x56, 0x78, 0x90];

let builder = scalpel::builder::PacketBuilder::new()
    .stack(eth_layer)?
    .stack(ipv4_layer)?
    .stack_bytes(&bytes);

let (packet, result): (scalpel::Packet, Vec<Vec<u8>>) = builder.build().unwrap();

result is a Vec<Vec<u8>>, where each inner Vec<u8> contains the bytes of that layer in the Packet, i.e. result[0] contains the Ethernet packet.

Only Ethernet and IPV4 sculpting are currently implemented.

gabhijit commented 5 months ago
3. We may need to add a couple of methods in the `Layer` trait (eg. `demux` (takes `mut self` and `&str` and returns Self) (updating internal fields (original `self` may be generated using `default`). (Also may be `parse`  API, something that will `parse` `scapy` strings (we may visit this later.)

The demux described here is actually what is described in the stack function above.

aadilshabier commented 4 months ago

The demux described here is actually what is described in the stack function above.

I initially decided to return a Vec<Vec<u8>> instead of a Packet because, in the latter the usage would be something like this:

let builder = scalpel::builder::PacketBuilder::new()
    .stack(eth_layer)?
    .stack(ip_layer)?
    .stack_bytes(bytes);

let packet: Packet = builder.build().unwrap();
let bytes: Vec<Vec<u8>> = packet.to_bytes();

We would expect Packet::to_bytes() to be an immutable operation, and all the demuxing, field setting to be done either during Builder::stack() or Builder::build(), but this is not possible as things like packet size, checksum, etc cannot be calculated without first converting the layers to bytes in reverse order(higher layers first). This has 2 different problems:

  1. Packet created by PacketBuilder::build() is not complete.
  2. Packet::to_bytes() should be &mut self, and the Packet is only correct after to_build() has been called on it atleast once.
gabhijit commented 4 months ago

The demux described here is actually what is described in the stack function above.

I initially decided to return a Vec<Vec<u8>> instead of a Packet because, in the latter the usage would be something like this:

let builder = scalpel::builder::PacketBuilder::new()
    .stack(eth_layer)?
    .stack(ip_layer)?
    .stack_bytes(bytes);

let packet: Packet = builder.build().unwrap();
let bytes: Vec<Vec<u8>> = packet.to_bytes();

We would expect Packet::to_bytes() to be an immutable operation, and all the demuxing, field setting to be done either during Builder::stack() or Builder::build(), but this is not possible as things like packet size, checksum, etc cannot be calculated without first converting the layers to bytes in reverse order(lower layers first). This has 2 different problems:

1. `Packet` created by `PacketBuilder::build()` is not complete.

2. `Packet::to_bytes()` should be `&mut self`, and the `Packet` is only correct after `to_build()` has been called on it atleast once.

It's possible to implement demux logic in the stack layer. It's a bit tricky. This would also mean changing register_defaults per layer a bit. Let's say you are doing stack(eth) - This is fairly straight forward as it's the lowest layer, now when you do stack(ip) on that, this gets interesting. stack(ip) should do something like this pop last Layer - call that layer's demux or some function that will update the Ethertype. Since we already know we are stacking IP, it's possible.

Surely this is a bit tricky to handle but possible.

Thus when everything is stacked, we have really a ready packet with some additional metadata added, then to_bytes can use a &self without having to update packet internally, this should return a Vec<u8> and then finally fixup_csums on the packet (an internal function) that tkaes &self and this created vectors to fix the checksums (right now let's just worry about the UDP/TCP checksums. IP checksum can be handled during the stack layer itself.

stack is the function that does all the heavy lifting.

Does it make sense?

aadilshabier commented 4 months ago

It's possible to implement demux logic in the stack layer. It's a bit tricky. This would also mean changing register_defaults per layer a bit. ...

I incorporated this and what I said above and ended up with this approach, where PacketBuilder::build now returns a Result<(Packet, Vec<Vec<u8>>), Error>. To help with this, Layer implements

fn stack_and_encode(&mut self, next_layer: &[u8], info: &str) -> Result<Vec<u8>, Error>;

This takes care of the demuxing, encoding, setting checksums all in one function. One current problem with this approach is that converting an already formed Packet, with fields like packet length, checksum, protocol number, etc will be reset. Most of these can be fixed easily though, by minor changes.

aadilshabier commented 4 months ago

checksum part - for the IP header is fine, but UDP and TCP checksums are a bit involved (because they need some fields from the IP header.) Hence I believe - a final fixup_checksums would be a good idea!

Oh right. I wasn't aware that the TCP and UDP checksums also used fields from the previous headers. I've a few ideas in mind for the fixup_checksum, but they don't seem that adaptable. Are there any other checksum calculations with any quirks like this? Knowing the more common ones might help me make a generalized design.

gabhijit commented 4 months ago

checksum part - for the IP header is fine, but UDP and TCP checksums are a bit involved (because they need some fields from the IP header.) Hence I believe - a final fixup_checksums would be a good idea!

Oh right. I wasn't aware that the TCP and UDP checksums also used fields from the previous headers. I've a few ideas in mind for the fixup_checksum, but they don't seem that adaptable. Are there any other checksum calculations with any quirks like this? Knowing the more common ones might help me make a generalized design.

This link has got code for UDP checksum. My high level idea was as follows - keep certain fields in the Packet structure, that are used "outside" the layer, for example keeping the src/dst IP address in the packet field and other fields that help during csum and the fixum_csums will take a ref to the Packet structure.

aadilshabier commented 4 months ago

Currently I am a bit confused which conversations are resolved. Can you explicitly mark those? and some minor changes that I have suggested in the review?

I've marked all the relevant commits as resolved. Apologies for not doing it earlier.

I believe it's ready to be merged as a first step, but the PR is so big, it's difficult to say with confidence. So let's just minimize the changes using the suggestions I have given and take a look.

I've minimized most of the changes and made them as slim as possible, there's only one other change I want to make right now (inverse Hashmaps for sculpting), but I think those too can be implemented later.

gabhijit commented 4 months ago

I am going to open a PR to wrap sculpting inside a feature. Once ready we will not require that or make sculpting a default feature.