Open micahsnyder opened 4 years ago
Fixed:
Updated the PR to get the unit test working. I thought they'd worked before, but perhaps I lost a code fix switching between Mac & Windows.
I've also:
inline
as neededFILE_OFFSET_BITS
to 64
as neededSTDC_HEADERS
and WORDS_BIGENDIAN
to the config.h file (accidentally missing before)Remaining issues:
The CMake tooling currently does not define _LARGEFILE_SOURCE
/ _LARGEFILE64_SOURCE
or _LARGE_FILES
. I'm uncertain if these need to be defined. I found that the openjpeg project has feature to define them if needed but I haven't determined if we should borrow it from them: https://github.com/uclouvain/openjpeg/blob/master/cmake/TestLargeFiles.cmake
While iconv detection will be easy, I don't yet know of a way to determine the value of ICONV_CONST
which will be required for cabextract.
I have the unit tests passing on Mac, but the cabd & chmd tests currently fail on Windows. Not sure why. Any insights would be helpful.
C:\Users\micah\workspace\libmspack\libmspack\build [cmake-tooling ≡ +2 ~0 -0 !]> ctest -C Release -VV --output-on-failure
UpdateCTestConfiguration from :C:/Users/micah/workspace/libmspack/libmspack/build/DartConfiguration.tcl
UpdateCTestConfiguration from :C:/Users/micah/workspace/libmspack/libmspack/build/DartConfiguration.tcl
Test project C:/Users/micah/workspace/libmspack/libmspack/build
Constructing a list of tests
Done constructing a list of tests
Updating test list for fixtures
Added 0 tests to meet fixture requirements
Checking test dependency graph...
Checking test dependency graph end
test 1
Start 1: cabd_test
1: Test command: C:\Users\micah\workspace\libmspack\libmspack\build\test\Release\cabd_test.exe
1: Test timeout computed to be: 10000000
1: <unknown>:44 FAILED cab = cabd->open(cabd, TESTFILE("normal_2files_1folder.cab"))
1/3 Test #1: cabd_test ........................***Failed 0.02 sec
<unknown>:44 FAILED cab = cabd->open(cabd, TESTFILE("normal_2files_1folder.cab"))
test 2
Start 2: chmd_test
2: Test command: C:\Users\micah\workspace\libmspack\libmspack\build\test\Release\chmd_test.exe
2: Test timeout computed to be: 10000000
2: C:/Users/micah/workspace/libmspack/libmspack/test/test_files/chmd/blank-filenames.chm: invalid section number '58'.
2: <unknown>:80 FAILED chm1 = chmd->open(chmd, files[i])
2/3 Test #2: chmd_test ........................***Failed 0.02 sec
C:/Users/micah/workspace/libmspack/libmspack/test/test_files/chmd/blank-filenames.chm: invalid section number '58'.
<unknown>:80 FAILED chm1 = chmd->open(chmd, files[i])
test 3
Start 3: kwajd_test
3: Test command: C:\Users\micah\workspace\libmspack\libmspack\build\test\Release\kwajd_test.exe
3: Test timeout computed to be: 10000000
3: ALL 102 TESTS PASSED.
3/3 Test #3: kwajd_test ....................... Passed 0.02 sec
33% tests passed, 2 tests failed out of 3
Total Test time (real) = 0.09 sec
The following tests FAILED:
1 - cabd_test (Failed)
2 - chmd_test (Failed)
Errors while running CTest
Other notes:
I'd like to add a Github workflow to build & test on a few different systems and architectures. Maybe after the above concerns are resolved and the cabextract tooling has been added.
Update:
The initial test cabd_test
failure on Windows was a silly copypaste issue. I've resolved this and the <unknown>
line number issue, but now see a different failure for the cabd_test
.
C:\Users\micah\workspace\libmspack\libmspack\build [cmake-tooling ≡ +2 ~0 -0 !]> ctest -C Debug -VV --output-on-failure
UpdateCTestConfiguration from :C:/Users/micah/workspace/libmspack/libmspack/build/DartConfiguration.tcl
UpdateCTestConfiguration from :C:/Users/micah/workspace/libmspack/libmspack/build/DartConfiguration.tcl
Test project C:/Users/micah/workspace/libmspack/libmspack/build
Constructing a list of tests
Done constructing a list of tests
Updating test list for fixtures
Added 0 tests to meet fixture requirements
Checking test dependency graph...
Checking test dependency graph end
test 1
Start 1: cabd_test
1: Test command: C:\Users\micah\workspace\libmspack\libmspack\build\test\Debug\cabd_test.exe
1: Test timeout computed to be: 10000000
1: C:/Users/micah/workspace/libmspack/libmspack/test/test_files/cabd/bad_nofolders.cab: no folders in cabinet.
1: C:/Users/micah/workspace/libmspack/libmspack/test/test_files/cabd/bad_nofiles.cab: no files in cabinet.
1: cabd_extract_test_01:347 FAILED err == MSPACK_ERR_DATAFORMAT || err == MSPACK_ERR_DECRUNCH
1/3 Test #1: cabd_test ........................***Failed 0.01 sec
C:/Users/micah/workspace/libmspack/libmspack/test/test_files/cabd/bad_nofolders.cab: no folders in cabinet.
C:/Users/micah/workspace/libmspack/libmspack/test/test_files/cabd/bad_nofiles.cab: no files in cabinet.
cabd_extract_test_01:347 FAILED err == MSPACK_ERR_DATAFORMAT || err == MSPACK_ERR_DECRUNCH
test 2
Start 2: chmd_test
2: Test command: C:\Users\micah\workspace\libmspack\libmspack\build\test\Debug\chmd_test.exe
2: Test timeout computed to be: 10000000
2: C:/Users/micah/workspace/libmspack/libmspack/test/test_files/chmd/blank-filenames.chm: invalid section number '58'.
2: C:/Users/micah/workspace/libmspack/libmspack/test/test_files/chmd/cve-2015-4468-namelen-bounds.chm: library not compiled to support large files.
2: chmd_search_test_01:80 FAILED chm1 = chmd->open(chmd, files[i])
2/3 Test #2: chmd_test ........................***Failed 0.01 sec
C:/Users/micah/workspace/libmspack/libmspack/test/test_files/chmd/blank-filenames.chm: invalid section number '58'.
C:/Users/micah/workspace/libmspack/libmspack/test/test_files/chmd/cve-2015-4468-namelen-bounds.chm: library not compiled to support large files.
chmd_search_test_01:80 FAILED chm1 = chmd->open(chmd, files[i])
test 3
Start 3: kwajd_test
3: Test command: C:\Users\micah\workspace\libmspack\libmspack\build\test\Debug\kwajd_test.exe
3: Test timeout computed to be: 10000000
3: ALL 102 TESTS PASSED.
3/3 Test #3: kwajd_test ....................... Passed 0.01 sec
33% tests passed, 2 tests failed out of 3
Total Test time (real) = 0.05 sec
The following tests FAILED:
1 - cabd_test (Failed)
2 - chmd_test (Failed)
Errors while running CTest
I've figured out why the Windows tests fail.
Two reasons:
You can't open "/dev/null"
in Windows. Instead, open "NUL"
(with 1 "L").
Defining _LARGEFILE_SOURCE=1
and _FILE_OFFSET_BITS=64
does not force off_t
to be an int64_t
.
This second issue manifests as failures for cabd where unsigned int
s are cast to off_t
and a large unsigned int becomes a negative value. The following patch resolves it easily enough:
diff --git a/libmspack/mspack/cabd.c b/libmspack/mspack/cabd.c
index 56549b8..75956da 100644
--- a/libmspack/mspack/cabd.c
+++ b/libmspack/mspack/cabd.c
@@ -1012,7 +1012,7 @@ static int cabd_extract(struct mscab_decompressor *base,
struct mscabd_folder_p *fol;
struct mspack_system *sys;
struct mspack_file *fh;
- off_t filelen;
+ unsigned int filelen;
if (!self) return MSPACK_ERR_ARGS;
if (!file) return self->error = MSPACK_ERR_ARGS;
@@ -1049,7 +1049,7 @@ static int cabd_extract(struct mscab_decompressor *base,
* In salvage mode, don't assume block sizes, just try decoding
*/
if (!self->salvage) {
- off_t maxlen = fol->base.num_blocks * CAB_BLOCKMAX;
+ unsigned int maxlen = fol->base.num_blocks * CAB_BLOCKMAX;
if ((file->offset + filelen) > maxlen) {
sys->message(NULL, "ERROR; file \"%s\" cannot be extracted, "
"cabinet set is incomplete", file->filename);
A larger problem is test failures with chmd caused by the same underlying issue, which I observed debugging a test where read_off64()
attempts to store a 64-bit value in a 32-bit off_t
:
For reference, this is in chmd_search_test_01
>chmd_open
>chmd_real_open
>chmd_read_headers
>read_off64
. On macOS, the same call sets *var
= 281474975015748
instead of -1694908
.
I can get test passes on Windows by converting all off_t
's to ptrdiff_t
's, which is certainly a more reliable type but no doubt breaks large-file support on 32-bit platforms where off_t
is forced to be an int64_t
.
A better solution I believe would be to swap out all off_t
usage for int64_t
, since mspack requires off_t
's to be 64-bits anyhow. The downside here is that int64_t
isn't defined on every system, and even the header where it would be found isn't the same on every system. The clamav project gets around this issue by generating and installing a types header. I'm not sure if this appeals to you... or if you have a better idea.
I'm going to push what I have now, and then take a break and wait for comments before I go any further.
Thanks very much for working on this! The support for building on Windows has been in need of modernisation for a while, and it's great that this works in harmony with the autotools rather than try to replace it. I'll take a look a the changes and review them.
- You can't open
"/dev/null"
in Windows. Instead, open"NUL"
(with 1 "L").
It seems reasonable to put in a Windows-specific filename here.
- Defining
_LARGEFILE_SOURCE=1
and_FILE_OFFSET_BITS=64
does not forceoff_t
to be anint64_t
.
Yes. This is an issue, and it's why you can see in commits 49d8a2addf384f54a1f225470405e4e5892ec8e5 and bb5e2b6649749c64a40c64eb334e01aa637a11f4 that I had to put the autotools SIZEOF_OFF_T test back in place. I thought I could get away from it, but I can't. Several platforms have 64-bit I/O without needing any preprocessor flags.
libmspack has two modes of operation:
The current tests in system.h are not ideal, only SIZEOF_OFF_T is really testing for 64-bit I/O, LARGEFILE*_SOURCE and _FILE_OFFSET_BITS are just meant for transition, FILESIZEBITS is not whether 64-bit I/O is supported but what is the maximum size of a file (this is really filesystem and path dependent, but in practical terms the limits.h value reveals if the system generally supports 64-bit I/O or not), and I could also have used SSIZE_MAX as another proxy (it's not whether fseek/fseeko supports 64-bit offsets, it's whether read/write support 64-bit sizes).
There is an argument for making the code explicitly use 64-bit datatypes, regardless of the size of off_t, but that has the tradeoff of ending support for 32-bit-only platforms (that have no 64-bit integers or in-compiler workarounds) and would require at least C99 to be absolutely certain of a 64-bit integer datatype.
From a build-system perspective, what should happen is:
libmspack has two modes of operation:
1. system.h defines LARGEFILE_SUPPORT **because** the underlying environment has 64-bit off_t and 64-bit file I/O (fseek/fseeko and ftell/ftello support 64-bit offsets). libmspack uses 64-bit offsets everywhere. 2. system.h does not define LARGEFILE_SUPPORT. It's assumed that only 32-bit file IO is supported. There is no expectation of any 64-bit datatypes. Explicit tests are added to catch any 32-bit or 64-bit values over 2GB, because in POSIX with non-64-bit file IO these offsets would not be supported.
So, to cover this better, I've removed the various largefile define checks that just guess if off_t is 64-bit or not, and now rely only on SIZEOF_OFF_T, so this should be simpler. If SIZEOF_OFF_T < 8, libmspack will assume off_t is at least 32-bit, and will apply overflow error checking if cab search offsets or CHM offsets go beyond a 2GB barrier. Build systems should apply any flags needed to enable 64-bit off_t (and fseeko/ftello) if they want 64-bit I/O support.
Sorry about the delay on this! I never saw a notification from github, and thought you hadn't looked at it yet. I'll try to review your comments asap and update accordingly.
After rebasing to include your changes and fix up a the mspack error message stuff, and add additional improvements to the autogen.sh script, I have found that 2/3 tests pass now. The chmd test still fails, because off_t is 32bits on Windows. Output below:
test 1
Start 1: cabd_test
1: Test command: C:\Users\micah\workspace\libmspack\libmspack\build\test\Debug\cabd_test.exe
1: C:/Users/micah/workspace/libmspack/libmspack/test/test_files/cabd/bad_nofiles.cab: no files in cabinet.
1: ERROR; file " ¶hello.jl♠�♣" cannot be extracted, cabinet set is incomplete
1: ERROR; file "�" cannot be extracted, cabinet set is incomplete
1: ERROR; file "���" cannot be extracted, cabinet set is incomplete
1: ERROR; file "2" cannot be extracted, cabinet set is incomplete
1: ALL 409 TESTS PASSED.
1/3 Test #1: cabd_test ........................ Passed 0.04 sec
test 2
Start 2: chmd_test
2: Test command: C:\Users\micah\workspace\libmspack\libmspack\build\test\Debug\chmd_test.exe
2: C:/Users/micah/workspace/libmspack/libmspack/test/test_files/chmd/cve-2015-4468-namelen-bounds.chm: library not compiled to support large files.
2: chmd_search_test_01:79 FAILED chm1 = chmd->open(chmd, files[i])
2/3 Test #2: chmd_test ........................***Failed 0.03 sec
test 3
Start 3: kwajd_test
3: Test command: C:\Users\micah\workspace\libmspack\libmspack\build\test\Debug\kwajd_test.exe
3: Test timeout computed to be: 10000000
3: ALL 102 TESTS PASSED.
3/3 Test #3: kwajd_test ....................... Passed 0.02 sec
Total Test time (real) = 0.11 sec
The following tests FAILED:
2 - chmd_test (Failed)
Errors while running CTest
After rebasing to include your changes and fix up a the mspack error message stuff, and add additional improvements to the autogen.sh script, I have found that 2/3 tests pass now. The chmd test still fails, because off_t is 32bits on Windows.
You should want 64-bit I/O on Windows, given it is capable of it. config.h needs to have something like this on Windows / Visual C:
#define off_t __int64
#define fseeko _fseeki64
#define ftello _ftelli64
#define SIZEOF_OFF_T 8
#define HAVE_FSEEKO
#define HAVE_FTELLO
However, this is just papering over a bug in the test suite which is visible when any system, not just Windows, runs with 32-bit I/O.
2: C:/Users/micah/workspace/libmspack/libmspack/test/test_files/chmd/cve-2015-4468-namelen-bounds.chm: library not compiled to support large files. 2: chmd_search_test_01:79 FAILED chm1 = chmd->open(chmd, files[i])
The purpose of chmd_search_test_01()
is to check that searching various corrupt files does not crash libmspack. It also asserts they can be opened, i.e. they're not so corrupt they're rejected early.
cve-2015-4468-namelen-bounds.chm
and cve-2015-4469-namelen-bounds.chm
are rejected by the extra checks in 32-bit I/O mode because they have any offsets over 2GB:
cve-2015-4468-namelen-bounds.chm
HS0 file length is 0xFFFFFFE62344cve-2015-4469-namelen-bounds.chm
offset to CS0 is 0x7000010CCcve-2015-4469-namelen-bounds.chm
HS0 file length is 0xDEFF00020000I've changed these offsets in the test files so both 32-bit I/O mode and 64-bit I/O mode get to the same code path and test the files properly. See commit 7e63519f2375c507cca8e41cdaa38f29d72ca84f
I've changed these offsets in the test files so both 32-bit I/O mode and 64-bit I/O mode get to the same code path and test the files properly. See commit 7e63519
I rebased the PR and fixed a minor issue w/ defining aliases for the static and shared libraries while I was at it. I'm now seeing a test passes on Windows!
I just tried playing around with your suggestion to #define off_t
to __int64
and to override fseeko
& ftello
with _fseeki64
and _ftelli64
. Unfortunately that results in many errors like this:
c:\program files (x86)\windows kits\10\include\10.0.18362.0\ucrt\sys\types.h(42): error C2628: '_off_t' followed by '
__int64' is illegal (did you forget a ';'?) [C:\Users\micasnyd\workspace\libmspack\libmspack\build\mspack\mspack_ObjLib
.vcxproj]
I'm not sure what to do about that. Any ideas?
Anyhow, With libmspack building and passing all unit tests (at least with 32bit I/O), I reckon it's time for me to implement the CMake tooling for cabextract. I'll hop on it.
Best, Micah
Okay! I have cabextract building on mac & Windows. The tests will run on posix systems, but not on Windows because they're all shell scripts.
That said, I haven't really tested out if cabextract.exe works properly on Windows. I did have to use a Win32 API since Windows doesn't have fnmatch.h
. It does built if you link it with a static mspack library (mspack_static.lib
) built with the libmspack CMake tooling first. Instructions added to the cabextract README. It won't build with the "bundled" mspack because the soft links don't work on Windows.
The tests all pass except the encoding
test. I haven't spent the time to figure out why it fails:
1/10 Test #1: bugs ............................. Passed 0.04 sec
test 2
Start 2: case-ascii
2: Test command: /bin/bash "/Users/micasnyd/workspace/libmspack-micah/cabextract/test/case-ascii.test"
2: Test timeout computed to be: 10000000
2/10 Test #2: case-ascii ....................... Passed 0.09 sec
test 3
Start 3: dirwalk-vulns
3: Test command: /bin/bash "/Users/micasnyd/workspace/libmspack-micah/cabextract/test/dirwalk-vulns.test"
3: Test timeout computed to be: 10000000
3/10 Test #3: dirwalk-vulns .................... Passed 0.06 sec
test 4
Start 4: encoding
4: Test command: /bin/bash "/Users/micasnyd/workspace/libmspack-micah/cabextract/test/encoding.test"
4: Test timeout computed to be: 10000000
4/10 Test #4: encoding .........................***Failed 0.03 sec
test 5
Start 5: large-files
5: Test command: /bin/bash "/Users/micasnyd/workspace/libmspack-micah/cabextract/test/large-files.test"
5: Test timeout computed to be: 10000000
5/10 Test #5: large-files ...................... Passed 27.21 sec
test 6
Start 6: mixed
6: Test command: /bin/bash "/Users/micasnyd/workspace/libmspack-micah/cabextract/test/mixed.test"
6: Test timeout computed to be: 10000000
6/10 Test #6: mixed ............................ Passed 0.07 sec
test 7
Start 7: search
7: Test command: /bin/bash "/Users/micasnyd/workspace/libmspack-micah/cabextract/test/search.test"
7: Test timeout computed to be: 10000000
7/10 Test #7: search ........................... Passed 0.04 sec
test 8
Start 8: simple
8: Test command: /bin/bash "/Users/micasnyd/workspace/libmspack-micah/cabextract/test/simple.test"
8: Test timeout computed to be: 10000000
8/10 Test #8: simple ........................... Passed 0.05 sec
test 9
Start 9: split
9: Test command: /bin/bash "/Users/micasnyd/workspace/libmspack-micah/cabextract/test/split.test"
9: Test timeout computed to be: 10000000
9/10 Test #9: split ............................ Passed 0.11 sec
test 10
Start 10: utf8-stresstest
10: Test command: /bin/bash "/Users/micasnyd/workspace/libmspack-micah/cabextract/test/utf8-stresstest.test"
10: Test timeout computed to be: 10000000
10/10 Test #10: utf8-stresstest .................. Passed 0.04 sec
90% tests passed, 1 tests failed out of 10
Total Test time (real) = 27.77 sec
The following tests FAILED:
4 - encoding (Failed)
Errors while running CTest
Please try it out and let me know what you think.
@kyz Have you had a chance to give this a try?
I've been meaning to figure out the reason for the encoding test failure. Will try to figure that out soon.
@kyz Ok I fixed the encoding test. It was an iconv detection issue. Speaking of which, I noticed that iconv detection will fail for autotools if gettext-devel isn't installed, but I didn't know if I should try to fix it (or how).
I also rebased this with master and since then the Windows tests for libmspack are passing. 🎉
Anyways, I also just added github actions build tests for Mac, Linux, and Windows for cabextract and libmspack. For them to run on this PR, I think you'll have to approve them. I tested them on my fork and they seem to run nicely: https://github.com/micahsnyder/libmspack/actions/workflows/cmake-cabextract.yml
I was also able to use vcpkg to bring in libiconv support with cabextract on Windows! You can see libiconv detection working on Windows here: https://github.com/micahsnyder/libmspack/runs/2484761769?check_suite_focus=true
Anyways, I think this work is basically done? Personally I'd advocate for removing the autotools build system at this point. If you like that, I can help.
Please give this a try and let me know what you think.
I strongly advise against removing the autotools stuff. I had to create or restore autotools projects to be able to properly cross compile a few apps.
With autotools I can easily cross compile cabextract for 20 linux processor architectures and win32/win64, it needs a couple of tweaks for libiconv and only include the local getopt when the system getopt is missing, for better compatibility with MinGW.
Setting CMake mininum version to 3.20 and remove all the extra cmake files is a good idea.
CMake for modern systems, autotools for older systems and for cross compiling like there's no tomorrow.
Cross-compiling should be possible with CMake as well, though I don't have any experience with it: https://cmake.org/cmake/help/latest/manual/cmake-toolchains.7.html
Setting CMake mininum version to 3.20 and remove all the extra cmake files is a good idea.
Can you elaborate?
I think it got easier in recent versions, but for many years only the old autotools could handle cross-compiling properly. My scripts use autotools, this is crucial for mingw at least, Windows binaries magically appear.
And autotools works with barebones toolchains, with basically gcc and nothing else, simple C apps like this one compile in a lot of architectures without issues. If anybody is interested I can provide static binaries for releases.
I mean all those cmake modules or something that got added to cmake in recent years, I see a lot of them in this PR. But it's just an idea.
Anyway if autotools gets removed I'll restore it for my cross-builds. CMake is good for testing stuff in native installations in Travis CI and the like.
Anyway I think this PR is good and it was a lot of work, I hope it gets merged.
Porting autotools configs to cmake is a bit problematic, it requires knowledge of both build systems.
Testing unshield in several OS's until Travis CI cancelled my ability to trigger builds, I have some ideas. A better md5.c that is endian-independent, a header-only getopt.h and fnmatch.h only used if system headers are not found, but this should be merged first.
autotools: something like this, configure.ac
should check for getopt.h
cabextract.c should have something like this:
#ifdef HAVE_GETOPT_H
# include <getopt.h>
#else
# include "../compat/getopt.h"
#endif
MSVC and ancient Solaris versions lack getopt.h, only MSVC is worth considering.
Now with CMake:
CMakeLists.txt: check_include_files(getopt.h HAVE_GETOPT_H)
cmake.config.in: #cmakedefine HAVE_GETOPT_H @HAVE_GETOPT_H@
and so on
But here's the thing, one may have many ideas and the ability to perform big changes, but that's not enough for stuff to be merged. Once the momentum is lost, everything is back to normal and the code lays dormant. I'm seeing the same history in basically all repos.
Update: This PR is complete.
Best, Micah
Adds CMake tooling for libmspack to the project. This attempts to replicate every feature available in the autotools tooling and includes support for building on Windows.
Adds a mspack-version.h header file, generated from mspack-version.h.in. mspack-version.h is to be installed alongside mspack.h so that application developers can use the version numbers and strings in the event that the API is modified and the application must maintain backwards compatibility.
Adds error handling to autogen.sh.
Reformats libmspack's README as Markdown.
Adds mspack_error_msg() and MSPACK_ERROR() macro to mspack.h API. The new API is a cross between the cab_error() function previously found in cabextract and error_msg() found in test/error.h. This removes the need the examples to depend on test/error.h, a header not exported by any build target, and removes the code duplication with cabextract.