Incorrect 3.3.5 SMSG_ACCOUNT_DATA_TIMES parsing

kerhong commented 11 months ago

For 3.3.5 SMSG_ACCOUNT_DATA_TIMES does not survive parse->encode round trip.

SMSG_ACCOUNT_DATA_TIMES::read_inner reads an an extra u32 (data_decompressed_size) before reading data. https://github.com/gtker/wow_messages/blob/f05ce8ceb6dbba6e92b42bb0ae9ddf8444f0fbaf/wow_world_messages/src/world/wrath/smsg_account_data_times.rs#L43-L56

The .wowm and also SMSG_ACCOUNT_DATA_TIMES::write_into_vec do not have this u32 https://github.com/gtker/wow_messages/blob/f05ce8ceb6dbba6e92b42bb0ae9ddf8444f0fbaf/wow_message_parser/wowm/world/login_logout/smsg_account_data_times.wowm#L45-L55

https://github.com/gtker/wow_messages/blob/f05ce8ceb6dbba6e92b42bb0ae9ddf8444f0fbaf/wow_world_messages/src/world/wrath/smsg_account_data_times.rs#L135-L140

I think data_decompressed_size is incorrect, as I can't find it in other implementations and CacheMask::PerCharacterCache has 5 bits set, so data should have 5 elements (see comment for data in .wowm).

Here is a failing test case for this bug:

#[test]
fn reencode_smsg_account_data_times_char_mask() {
    use std::io::Cursor;
    use wow_world_messages::wrath::{
        expect_server_message, ServerMessage, SMSG_ACCOUNT_DATA_TIMES,
    };

    let original: Vec<u8> = vec![
        0x00, 0x1f, 0x09, 0x02, // packet header
        0xf3, 0xb3, 0x16, 0x65, // unix_time
        0x01, // unknown 1
        0xea, 0x00, 0x00, 0x00, // mask (5 bits set, so data should have 5 elements)
        0xe3, 0xaf, 0x16, 0x65, // data?, currently skipped as data_decompressed_size
        0x00, 0x00, 0x00, 0x00, // data
        0x00, 0x00, 0x00, 0x00, // data
        0x00, 0x00, 0x00, 0x00, // data
        0x4a, 0xa0, 0x15, 0x65, // data
    ];

    let msg = {
        let mut cursor = Cursor::new(&original);
        expect_server_message::<SMSG_ACCOUNT_DATA_TIMES, _>(&mut cursor).unwrap()
    };

    let reencoded = {
        let mut buf = Vec::new();
        msg.write_unencrypted_server(&mut buf).unwrap();
        buf
    };

    // err: original is 4 bytes longe than reencoded
    assert_eq!(original, reencoded);

    let reparsed = {
        let mut cursor = Cursor::new(&reencoded);
        expect_server_message::<SMSG_ACCOUNT_DATA_TIMES, _>(&mut cursor).unwrap()
    };

    // err: original.data.len() is 4 vs 3 for reparsed
    assert_eq!(msg, reparsed);
}

kerhong commented 11 months ago

Is it correct to assume that the eventual goal is that all packets should survive parse->encode->parse without changes (ignoring header encryption)?

gtker commented 11 months ago

Thank you for the very thorough bug report. I have pushed a fix to master that passes your test case.

In order to create unit tests for the individual messages you can write a test wowm object. It can be auto generated from a specific message through the (to_test_case_string)[https://github.com/gtker/wow_messages/blob/009948f4caa0fcac8fe85d2c7b19f51dedd7a72c/wow_world_messages/src/world/wrath/smsg_account_data_times.rs#L77] function. If you could obtain a valid message in this format a test would be auto generated that would prevent any regressions, as well as better coverage for downstream implementations in other languages.

Is it correct to assume that the eventual goal is that all packets should survive parse->encode->parse without changes (ignoring header encryption)?

Yes, absolutely. This bug was because I incorrectly added a compressed_data field to endless arrays (u8[-]) because these are usually compressed. I removed the field for non-compressed members.

Are you working on wrath-rs or your own server?

kerhong commented 11 months ago

@gtker I'm currently working on a toy project (not public yet) which for now is just a proxy for authserver + worldserver, later also a client implementation.

It's grown from just using wow_messages to try and generate code intermediate_representation.json myself (as I find wow_messages codegen to be very confusing). Is there a reason you didn't use quote?

Will try to write .wowm test cases for future issues I find

gtker commented 11 months ago

It's grown from just using wow_messages to try and generate code intermediate_representation.json

Are you using the intermediate_representation.json file? I've been assumed that I was the only one using it for wow_messages_python so I've been doing breaking changes rather freely.

The idea is for users of intermediate_representation.json to have to do as little actual calculation/thinking as possible. For example by inlining full struct types to avoid having to do any lookup.

Do you have any ideas for improvements?

(as I find wow_messages codegen to be very confusing).

Yeah, I understand completely. The layout and general idea of wow_messages has changed many times, and there are even sections where I'm not sure I even follow the logic any more because of how many times the requirements have changed.

Is there a reason you didn't use quote?

The line based output that I have makes it significantly easier to debug and track changes. This is also why I don't run rustfmt on the output before committing.

kerhong commented 11 months ago

Are you using the intermediate_representation.json file? I've been assumed that I was the only one using it for wow_messages_python so I've been doing breaking changes rather freely.

No issues as of yet, and I use both intermediate_representation.json and intermediate_representation_schema.json (which I transform to rust types using https://github.com/jsontypedef/json-typedef-codegen/), so I'd notice any changes.

JSON with a schema that is processed and simplified already is much easier to hack around than parsing wowm.

Do you have any ideas for improvements?

I am interested in this project, but currently only in the .wowm/intermediate representation part. I hope (probably presumptuously) that I can write a "better" rust implementation from the input data you maintain.

I will be opening PRs about contents of .wowm and wowm->IR translation, but probably not about rest of codegen. That being said- since I am working on generating code form same rules I am noticing possible issues about codegen.

Should I open all kinds of weird issues (and possible issues) I manage to find even if I have no intention on fixing them?

Some things I already have noted:

Fuzzing cannot be easily performed currently due to 2 issues:

There is no access to writing a header (write_unencrypted_server)

I can't find a way to read ServerOpcodeMessage and get "total bytes read" after it's read, how about exposing changing ServerOpcodeMessage::read_opcodes?

// current
//  - private
//  - caller can't see how r was changed
fn read_opcodes(opcode: u16, body_size: u32, mut r: &[u8]) -> Result<...>

// new
//  - public
//  - takes `&mut &[u8]` so caller can see how much was read
pub fn read_opcodes(opcode: u16, body_size: u32, r: &mut &[u8]) -> Result<...>

The generated code makes rust_analyzer work slowly, I think some generated code could be optimized (for parsing, not for execution), not sure where to even start
There are some types in .wowm (for example "spell") that don't seem to be documented and are just silently treated as u32, the intermediate representation doesn't even contain this type (it's just u32)
read_sized_c_string_to_vec is wrong (vec![0u8; user provided u32] is a crash exploit)
- option a: since max packet size is 0x7fffff, then that should be max valid size for read_sized_c_string_to_vec
- option b: since read_sized_c_string_to_vec is utf-8 valid (so can't have 0x00 in the middle), then u32 length can just be thrown out and it can be read as null terminated string
current bool implementation is not parse->encode->parse stable (as it reads any non-zero value as true)
- if wow actually allows bools outside of [0, 1], then bool as a type is wrong and should be u8
- if wow always sends and receives [0, 1], then bool should fail parsing on other values
- I suspect bool must be replaced by u8, as there could be servers out there that send other values for bool even if wow client itself doesn't
Some tag_all usage is confusing to me. For example cmsg_cancel_cast.wowm has #tag_all versions "1 2 3"; and then test marked as versions = "1.12";. Is the correct interpretation that tests are:
- 1 2 3 1.12 - this seems how it works currnetly for wowm->intermediate, can I suggest it should be an error if item explicitly has a version that tag_all already covers.
- 1.12

If you agree- I'll convert these (and more) into issues, just wanted to check in before I start spamming you.

gtker commented 11 months ago

I am interested in this project, but currently only in the .wowm/intermediate representation part. I hope (probably presumptuously) that I can write a "better" rust implementation from the input data you maintain.

You probably can, at least you can probably better optimize it for your use case.

Should I open all kinds of weird issues (and possible issues) I manage to find even if I have no intention on fixing them?

Yes, definitely.

There is no access to writing a header (write_unencrypted_server)

I deliberately wanted to hide as many unnecessary details as possible for users, which is why I also don't expose the opcode numbers.

Do you want to be able to only write a header and no body?

I can't find a way to read ServerOpcodeMessage and get "total bytes read" after it's read, how about exposing changing ServerOpcodeMessage::read_opcodes?

This is also not exposed by design. :)

I probably don't mind changing the function signature of read_opcodes, but I don't want it public since users really should only be able to read full messages, without having to care about the exact wire format.

The generated code makes rust_analyzer work slowly,

Oh yeah. I'm also not sure how to fix it. I'm thinking it might have something to do with the match statements that match 400+ cases that all do basically the same thing.

I may be able to do some optimizations, but no promises since no matter how you slice it there's 300k+ lines just in wow_world_messages alone.

There are some types in .wowm (for example "spell") that don't seem to be documented and are just silently treated as u32, the intermediate representation doesn't even contain this type (it's just u32)

Yeah, the intention was always there to have a dedicated Spell/Item type that wasn't just a u32/u16, but it got a little lost along the way.

read_sized_c_string_to_vec is wrong (vec![0u8; user provided u32] is a crash exploit)

That's definitely true. This is also something that I just haven't gotten around to because it's only used by clients. I would like to maintain the optimization of being told how long the string is, but there should definitely be a maximum on it.

current bool implementation is not parse->encode->parse stable (as it reads any non-zero value as true)

I think this is one of those things where we'll never get a "perfect" solution just different tradeoffs.

For wow_world_messages/wow_login_messages I want to make the library as ergonomic as possible for users, which is why I don't want to expose a u8 when the only possible options are negative/positive. This is complicated by the fact that C/C++ treat any positive integer as true and the WoW message format almost just being structs dumped on the wire knowing that the client has the same internal representation.

I think it's a good idea to have an issue for this, if only to document a decision. It might make sense to have bools with a value other than 0/1 be a panic just to figure out what the behavior actually is.

Some tag_all usage is confusing to me. [..]

This is just straight up an error on my part.

I don't have any real exact guidelines for how to do things, but I think I'm currently of the opinion that tests should be marked only with the exact version that they were tested with.

can I suggest it should be an error if item explicitly has a version that tag_all already covers.

Yes, from how I currently feel about it, I think it should be.

If you agree- I'll convert these (and more) into issues, just wanted to check in before I start spamming you.

Please do. I can't promise a speedy resolution since I'm working on/off on this, but having tracking issues for all these will definitely be good.

I also want to mention the awesome-wow-rust repo which lists most of the WoW Rust projects, as well as the discord for it. You should add your project to the repo when it's public.

gtker commented 11 months ago

There are some types in .wowm (for example "spell") that don't seem to be documented and are just silently treated as u32, the intermediate representation doesn't even contain this type (it's just u32)

This is now correctly passed to the IR, but not used in wow_messages yet.

gtker commented 11 months ago

Incorrect issue.

kerhong commented 11 months ago

The original issue was fixed.

I created issues from comments

80
81
82
83
84
85

I think this can be closed

gtker / wow_messages

Incorrect 3.3.5 SMSG_ACCOUNT_DATA_TIMES parsing #77

80

81

82

83

84

85