wowserhq / stormjs

StormLib for Javascript, powered by Emscripten
MIT License

Bug: MPQs > 2GB fail to load #28

Closed by fallenoak 5 years ago

fallenoak commented 5 years ago

This appears to be two issues:

Incorrect llseek behavior when seeking >= 2 ** 31

These lines in the emscripten FS layer do not play well with offsets beyond 2 ** 31 bytes.

var stream = SYSCALLS.getStreamFromFD(), offset_high = SYSCALLS.get(), offset_low = SYSCALLS.get(), result = SYSCALLS.get(), whence = SYSCALLS.get();
// NOTE: offset_high is unused - Emscripten's off_t is 32-bit
var offset = offset_low;
FS.llseek(stream, offset, whence);

offset_low is read out of HEAP32 as a signed 32-bit integer by SYSCALLS.get(). As a result, any offset above 2,147,483,647 bytes (2 ** 31 - 1) wraps to a negative value, even though seeking past that point is routine when working with large files.
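To make the wraparound concrete, here is a minimal sketch that can be run in Node. The typed arrays below are stand-ins for Emscripten's heap views, not the real runtime objects, and the 3 GiB offset is an arbitrary example value.

```javascript
// Stand-ins for Emscripten's heap views: signed and unsigned 32-bit
// windows over the same four bytes of memory.
const heap = new ArrayBuffer(4);
const HEAP32 = new Int32Array(heap);
const HEAPU32 = new Uint32Array(heap);

// Store a 3 GiB offset through the unsigned view (example value only).
const intended = 3 * 1024 * 1024 * 1024; // 3221225472
HEAPU32[0] = intended;

// A signed read of the same bits yields a negative number.
const asSigned = HEAP32[0];

// An unsigned right shift by zero reinterprets those bits as unsigned.
const recovered = asSigned >>> 0;

console.log(asSigned);  // -1073741824
console.log(recovered); // 3221225472
```

The bits in memory are the same in both cases; only the interpretation at read time differs.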

Runtime assert intersecting with incompatible types

This line in SBaseFileTable.cpp fails, despite the values matching:

assert(pHeader->BlockTableSize64 <= (pHeader->dwBlockTableSize * sizeof(TMPQBlock)));

Presumably, some combination of the types of pHeader->BlockTableSize64, pHeader->dwBlockTableSize, and sizeof(TMPQBlock) prevents the assert's comparison from evaluating to true.

Auron52 commented 5 years ago

@fallenoak Does this fix help at all with other files and devices larger than 2 ** 31 bytes? I am running into a similar problem with a device representing any file above that size.

fallenoak commented 5 years ago

@Auron52 The workaround I implemented should permit seeking past 2 ** 31 bytes for anything that uses the llseek syscall (i.e., ___syscall140 in Emscripten). It wasn't clear to me whether negative offsets are permitted, so the workaround only applies to llseek in SEEK_SET mode; it won't help anything that calls llseek in SEEK_CUR or SEEK_END mode.

https://github.com/wowserhq/stormjs/blob/9e539882d77574125e5b86933a3ee386b8f6f64b/src/binding/post.js
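As a rough illustration of the idea described above (not the actual contents of post.js), a SEEK_SET-only reinterpretation could look like the sketch below. The helper name resolveSeekOffset is hypothetical, and the whence constants are assumed to follow the usual Linux values.

```javascript
// whence values as commonly defined on Linux (assumed here).
const SEEK_SET = 0;
const SEEK_CUR = 1;
const SEEK_END = 2;

// Hypothetical helper: decide how to interpret the 32-bit offset_low value.
function resolveSeekOffset(offsetLow, whence) {
  if (whence === SEEK_SET) {
    // Absolute positions are never negative, so the bits can be
    // reinterpreted as an unsigned 32-bit integer.
    return offsetLow >>> 0;
  }
  // Relative seeks may legitimately be negative, so keep the signed value.
  return offsetLow | 0;
}

console.log(resolveSeekOffset(-1073741824, SEEK_SET)); // 3221225472
console.log(resolveSeekOffset(-16, SEEK_END));         // -16
```

Under these assumptions the reachable range only grows from 2 ** 31 to 2 ** 32 bytes, since offset_high is still ignored.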

I believe a proper fix would be updating Emscripten to treat off_t as a 64-bit type, but that probably requires a fair amount of work involving BigInt on the JS side.
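For a sense of what the BigInt side might involve, here is a minimal sketch that rebuilds a signed 64-bit offset from the two 32-bit halves the syscall receives. The helper name combineOffset is hypothetical and this is not Emscripten's implementation; it only shows the arithmetic.

```javascript
// Hypothetical helper: rebuild a signed 64-bit offset from two 32-bit halves.
function combineOffset(offsetHigh, offsetLow) {
  const high = BigInt(offsetHigh | 0);  // upper 32 bits, signed
  const low = BigInt(offsetLow >>> 0);  // lower 32 bits, unsigned
  // Shift the high half into place, merge, and clamp to a signed 64-bit range.
  return BigInt.asIntN(64, (high << 32n) | low);
}

console.log(combineOffset(1, 0));           // 4294967296n
console.log(combineOffset(0, -1073741824)); // 3221225472n
console.log(combineOffset(-1, -16));        // -16n
```

The result is a BigInt, so any code that expects a plain number would need a conversion such as Number(value), which is only exact while Number.isSafeInteger holds for the result.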

Auron52 commented 5 years ago

Thanks @fallenoak. Were you able to call llseek from C/C++ code? (or did you call it directly from JavaScript?) If so, how did you do it? Up to this point I have been using fseek, but I assume I need to change this.