tomsoftware / Settlers.ts

A Web Settlers 4 Clone/Remake in TypeScript
https://settlers.hmilch.net/
MIT License
33 stars 13 forks source link

StreamWriter will return 0xFF (-1) instead of 0 #6

Closed MuffinMario closed 2 years ago

MuffinMario commented 2 years ago

Hey,

currently there is a bug that will result in incorrect unpacking of data. In very rare cases, you will receive a (distance/length) symbol, with the distance being higher than the current pointer of the decompressed data buffer. E.g. you are writing on copyPos = 5, receiving (6,3), it will go write the values of indices {-1,0,1}, or without a boundary check the values {deflatedBuf[-1],deflatedBuf[0],deflatedBuf[1]}

In Settlers, this will result in the values out of bounds trying to be read being 0, but due to the implementation of StreamWriter.getByte(x) with x < 0 || x > data.length returning -1 (0xFF), it will not do the same as expected. Weirdly enough, this appears to be intended behaviour to read beyond the beginning of the inflated buffer.

Sadly I do not have an example at the moment, but I just wanted to let you know that this may result in unintended behavior in the future :)

Also I'd like to say that your work is amazing! I would have probably given up if it wasn't for your RE'ing of settlers DEFLATE function. I have learnt so much about compression from trying to understand your implementation and it helped me a lot! I have at this point completely reversed the entire MAP/EDM format with all its segment (chunk) contents and I will soon upload a repository if you lack any information, feel free to use that repository in return of your great effort👍

Cheers

tomsoftware commented 2 years ago

Hi Muffin,

Thanks for this hint! Not quite sure if I understand the problem :-) But I will get back to this if we find a bad-example! After I had "reimplement" the un-compressing I was googling around and finally found another implementation: https://bitbucket.org/AngryKirC/noxedit2014/src/master/Shared/Kirc/NoxLzCompression.cs do you know if this had the same problem?

Cool - at the moment I do not have much time to move this project forward but understanding the chunks is on top of my todo list. The idea is to have savegames with a time delta of one tick and then understand the logic and processing of the original game engine. Also I hope to manage drawing objects to the map this summer!

Thanks & regards Thomas

MuffinMario commented 2 years ago

Hi,

this issue can only be seen on one single map in the maps that comes with the game. But it can also be replicated with custom maps. If you open XMD3_trojan4.map and unpack segment 4, you can see what I mean. You will see this error message pop up image which tells you that you are trying to go back a distance from the inflated buffer cursor lets you land on an index behind 0. I don't precisely remember the length of it, but I think it's trying to copy the values from -1 to 3 to index 14 to 18, which you can see here image

There are multiple maps using this special case where they're indexing beyond 0 to write n 0x00 values, not just -1 but also -2 and -3 as absolute offset are possible.

Although to be honest, I cannot confirm that you always only write zeroes in that case. It could just as well be something else. But from what I've seen comparing my inflated buffer with the one inside the memory of the game, they've always been supposed to be zeros.

From that link it appears that not only settlers used that precise implementation of Lz+Huffman encoding. While searching on the internet I could not find a library using those specific parameters though, so I wonder how they both coincidentally end up the same. Or perhaps I was just overlooking something. But no, I have not tried that link out, especially since I am not too keen with TypeScript :')

I'll release my repository in the form of a project written in C++ as some kind of interface to access the map as a DLL, just as the MapAccess that is already existing in this game, just.. more things you can do with that DLL of course. I was also imagining a rather big goal at the end of the line in which I'll have written my own editor. So I'll have to learn all about the graphics, rendering and animations process in S4 too :)

Cheers

MuffinMario commented 2 years ago

After I had "reimplement" the un-compressing I was googling around and finally found another implementation: https://bitbucket.org/AngryKirC/noxedit2014/src/master/Shared/Kirc/NoxLzCompression.cs do you know if this had the same problem

I just glanced over it and it seems to be language/compiler specific if that fixes the issue. on C# I believe this issue would not arise. Instead of looking at the written buffer, this implementation looks up distance % 0x10000 bytes of the last written bytes. So in our example, instead of writer.getByte(-1), it is accessing a "history" table of written bytes where we write our last written byte at writeIndex % 10000 in our example map, we would be doing historyTable[-1 % 0x10000] which is equivalent to historyTable[0xFFFF]. Assuming in C#, a newly created array is always filled with 0, this would result in the expected behavior. But it seems that the implementation of unpack seems to not copy the right values then for distances > 0x10000. Lucky for us, a distance above 0xFFFF is not possible, because our offset cannot go higher than that. I've compared your code to calculate the offset to copy with the one from NoxLz, and while they look different, do the exact same thing. Calculating the maximum offset that you can achieve, we receive offset = readBits(9+6) + (0x40 << 9) which in worst case can be offset = 0x7FFF + 0x8000 = 0xFFFF so the only issue that arises is, if you read any offset > the amount of bytes written yet.

A very simple straightforward fix would be to just return 0 if the index of StreamWriter.getByte(x) is out of bounds, alternatively you can do it like I've done it and just create a simple if case that captures that specific case https://github.com/MuffinMario/S4-Map-Reader/blob/master/S4EditorEDMCrypt/S4Inflate.h#L239

tomsoftware commented 2 years ago

Hi MuffinMario,

thanks for your work! :-)

Unfortunately I cannot find the "XMD3_trojan4.map". It would be great if I can reproduce the problem so I can create a test-case for it. I own a "Sieder 4" and "Siedler 4 Gold" but can't find the map you named or another having this issue. => but of course writing -1 will always be a bad idea :-) May you can send me an example by mail? Use "[my-github-name]@gmx.de"

btw: I had a look at your code! Great work! E.g. I like the "landscape-types-tree". There are also some unknown fields in your structures. Did you try to read the save-games? They do have the same format as .map you just need to jump over the "MZ-File-Header" and start at offset "6656" and maybe you need to change the Chunk-Type-IDs. Of course this values are sometimes just padding to get some 2^n length but I think some of them are used in the real game (and so written to the save game) and set to 0 in the map-files to be able to have the same struct.

PS: your help is also welcome here :-)

MuffinMario commented 2 years ago

Hi,

I had to use that landscape hierarchy for my other project to generate custom random maps, but I just lazily copy-pasted all of the SGround.h into that project :). I honestly don't like it all too well, but it does the job. I am currently not interested in working on save-game files, so it's a low point of interest for me. Here is the map XMD3_trojan4.zip (i cannot upload a .map file so I have to put it into a zip here..). Feel free to check this map's segment 4 yourself and see what I mean 👍

tomsoftware commented 2 years ago

@MuffinMario great! Hopefully it's fixed now :-)

Thank you again for reporting this bug and giving a sample (=> I add a unit-test using it) and all your work!