tukaani-project / xz

XZ Utils
https://tukaani.org/xz/
Other
532 stars 95 forks source link

Make xz buildable with MSVC and add 64-bit filesize support #60

Closed kiyolee closed 11 months ago

kiyolee commented 1 year ago

To support files larger than 4 GiB on Windows, all calls of stat() and lseek() have to be redirected to 64-bit filesize capable calls.

Pull request checklist

Please check if your PR fulfills the following requirements:

Note: Compile warnings are inevitable for MSVC when some variables can be either 64-bit (target x64) or 32-bit (target Win32) at compile time. Those warnings need extensive changes to clean up.

Pull request type

Please check the type of change your PR introduces: - [ ] Bugfix - [ ] Feature - [ ] Code style update (formatting, renaming, typo fix) - [ ] Refactoring (no functional changes, no api changes) - [ ] Build related changes - [ ] Documentation content changes - [X] Other (please describe): Make xz buildable with MSVC. Add 64-bit filesize support on Windows. ## What is the current behavior?

xz does not build with MSVC. xz would refuse to handle files larger than 4 GiB.

Related Issue URL:

What is the new behavior?

xz can be built with MSVC. xz can handle files larger than 4 GiB on Windows.

Does this introduce a breaking change?

Other information

I have added extensive MSVC build support in my own repo https://github.com/kiyolee/xz-win-build. In addition to building xz, I have added support to build all tests as well. This PR only covers code changes that were done while setting up my own repo.

JiaT75 commented 12 months ago

Hello!

Thank your for the PR. I have been wanting to add MSVC support to xz but have not had the time yet. Unfortunately this cannot be accepted in its current state because many things are preventing this from building with MSVC.

First, there are no build system changes. We are moving away from supporting the Visual Studio Solution Files starting with the upcoming 5.6.0 release planned for the end of this year. Instead we would like our Windows users to use CMake instead to generate the Visual Studio files.

Second, there are functions that I don't think Visual Studio default C libraries support. Specifically the functions in mytime.c would still need to be ported for this to compile.

I did not try to build this yet since the build system changes were not made. I did not review closely yet the changes that were made to alias functions and structs in file_io.* so I cannot comment one way or another on those.

I don't mean to discourage your efforts on this. I do want xz to build with MSVC eventually but it should be done small stages. The first stage I would start with is by adding CMake support for getopt_long() replacement. The Autotools build has support for this and the replacement files are in /lib. The next stage could be porting the file_io functions, perhaps what you have already works for that. Then maybe the mytime.c functions. Eventually, we can remove the "NOT MSVC" check for CMake building the xz target in CMakeLists.txt when we are confident things are working well.

kiyolee commented 11 months ago

I happen to have a fix for mytime.c too in my own repo. Already forgot about that. see: https://github.com/kiyolee/xz-win-build/blob/main/src/xz/mytime.c Would you want a PR for that? I totally understand the preference of cmake. My way of building things with VS is rather personal and I wouldn't submit that at all as PR. The purpose of my PR is only for when you get to build things on Windows, the code is readily buildable, through whatever build system that you prefer.

kiyolee commented 11 months ago

Added all the changes that I have made to build xz/xzdec with MSVC. Mostly for your reference. All code are in public domain (following the original license), please feel free to take anything if they fit.

kiyolee commented 11 months ago

Added a commit to fix build break in suffix.c when NOT using MSVC.

Larhzu commented 11 months ago

Thanks for the updates!

Based on this PR I created a branch xz_for_msvc. I put some of the commits in your name even though they were modified a little. If you prefer otherwise please let me know.

Unless I missed something, xz_for_msvc should have everything from this PR except __declspec(noreturn) and the VS2013 fallback for snprintf.

I feel the noreturn could be handled in a more generic way. noreturn from <stdnoreturn.h> could be ideal. According to docs, it seems to be supported by VS2015 too. It's currently not used in XZ Utils, only the GNU C __attribute__((__noreturn__)) is, but this could be changed.

Unless there is a good reason, I feel VS2013 support shouldn't be added to the command line tools to keep the MSVC patches as simple as easily possible. By the time this code is in a stable XZ Utils release, VS2013 will only have 5-7 months of support remaining (if April 2024 is the true end date for VS2013 support).

The changes to file_io.* I made quite differently and it's quite possible that my approach cannot work. It would be great if you could test it and tell if it works or can be made to work. Otherwise I will adapt your version from this PR.

DOS/DJGPP build checks for special filenames like prn in file_io.c which could happen in weird cases like xz -S_xz -d prn_xz. I wonder if something like that should be done on Windows builds too (not just MSVC).

kiyolee commented 11 months ago

All sounds good and reasonable. You branch does build and work. I guess I can close this PR. P.S. I think you are right that Windows build would need to handle those legacy DOS special filenames like con, prn, com1, etc.

Larhzu commented 11 months ago

Thanks for testing! There are more commits in xz_for_msvc now, including CMake support. It would be awesome if you could test it with MSVC again. If you don't have time etc. then feel free to say so or ignore this. :-)

I think xzdec should build now with VS2013. xz is set to require VS2015 (_MSC_VER >= 1900, that is, MSVC_VERSION >= 1900 in CMake). I don't know if a more recent VS version should be recommended in the docs, like, if there are compatibility fixes that matter.

_Noreturn needs /std:c11 or /std:c17. CMake likely doesn't set it because CMakeLists.txt only requires a C99 compiler. There is __declspec(noreturn) too for this case.

I wonder if C11/C17 mode would be preferred for other reasons, for example, if standards conformance would be stricter and thus risk of weird bugs would be lower. Microsoft docs say that C11/C17 needs an updated Windows SDK and UCRT though. I don't have much clue about these. Would using C11/C17 mode affect how old Windows versions can run the resulting binaries?

About con and friends. At least with MinGW-w64 builds it seems to be a problem (possibly a security issue). xz -d -S_xz con_xz decompresses to console even though open is used with O_EXCL. I'm not sure how to fix. I would expect Windows to have an API to check for problematic filenames instead of apps needing to roll their own checking code. The code used with DJGPP isn't compatible with anything else.

kiyolee commented 11 months ago

Except a small bug in "getopt.in.h" for MSVC (https://github.com/tukaani-project/xz/pull/63), cmake build for MSVC works for all VS2015, VS2017, VS2019 and VS2022.

kiyolee commented 11 months ago

As per C11/C17 requirement, as you are already requiring VS2015 or later, that pretty much likes requiring Windows 10 or later. However, compiling using a specific Windows SDK version does not necessarily limits the Windows version that the output binaries can run on. That's more depending on what Windows APIs you have used. If say no Windows 10 or later only APIs is used, the binaries could likely work on Windows 8 or before. That's what _WIN32_WINNT can help. Define _WIN32_WINNT to the minimum Windows version you want to support, whatever version Windows SDK you use should expose only APIs available for that targeted Windows version. But I think requiring a specific Windows SDK could be annoying for users only having older VS. There could be reasons that newer version of Windows SDK cannot be installed. If I am correct, the current cmake build would just use the latest Windows SDK available and that is handy. Especially if you limit _WIN32_WINNT to say Windows 8, that means any Windows 10 SDK should work. I have not tried though, cmake might have intelligence to sort things out when you require C11/C17. cmake could just complain if the required SDK does not exist.

kiyolee commented 11 months ago

https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-getfiletype A quick search and I find this Windows API that may help to detect special named files on Windows. Note: MinGW build can target either msys2 or native-windows. For msys2, special filenames may be less of a problem (inherited cygwin capability). For native-windows, special filenames are indeed problematic.

Larhzu commented 11 months ago

Thanks again for testing!

I included the unistd.h fix from PR 63 in the xz_for_msvc branch.

With CMake 3.27 and its new default policy CMP0149 the xz_for_msvc branch uses the latest Windows SDK by default.

CMakeLists.txt currently requires a C99 compiler:

set(CMAKE_C_STANDARD 99)
set(CMAKE_C_STANDARD_REQUIRED ON)

If the first line was set to C11 and the second line was omitted then CMake would attempt to find a C11 compiler but would accept older standard too if C11 isn't available. So that would be a way to get C11 mode when using new enough MSVC. But maybe it's not nice if it limits SDK choices.

Since it works now, maybe it's fine to leave it as it is.

About he commit to tuklib_physmem.c that avoids building the pre-W2k code: I suspect that this

#if defined(_WIN32_WINNT_WIN2K) && _WIN32_WINNT >= _WIN32_WINNT_WIN2K

isn't correct. Now the old code will never be built.

_WIN32_WINNT is about exposing newer features from the API headers, it doesn't mean that the program will automatically require that version of Windows. Earlier the builds used #define _WIN32_WINNT 0x0500 (which is _WIN32_WINNT_WIN2K) to make MEMORYSTATUSEX visible in the API headers. Those binaries could still run even on Win95 if msvcrt.dll was available because GlobalMemoryStatusEx was loaded dynamically.

Maybe at this point it could be best to just omit pre-W2K support from that file. Even when it was written, it was just a fun distraction to check if Windows build of xz could easily run even on Win95 and it did.

The win95 threading option, despite its name, exist for WinXP support. Those APIs just happen to be in Win95 already. The threading APIs from WinVista are closer to pthreads than the older APIs but, as far as I know, there shouldn't be any significant difference in practice in case of liblzma since it needs only a small subset of features. Requiring WinVista would simplify things though but on the other hand the support for the ancient things already exists and works fine.

GetTickCount64 in mytime.c needs WinVista so MSVC builds of the xz command line tool will need at least WinVista.

GetFileType needs a HANDLE so one would first need CreateFile and so on. It's unfortunate if _stat64 doesn't return any info in st_mode or st_dev or other member. I think I won't work on this problem now. If I have understood correctly, it helps slightly that the problem can only occur if using --suffix as the default suffixes have a dot and thus if the input file is valid then the output is too since both con and con.xz are invalid names for regular files.

kiyolee commented 11 months ago

https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/get-osfhandle?view=msvc-170 In case that can be helpful, you can get the HANDLE behind a file descriptor with function _get_osfhandle().

Larhzu commented 11 months ago

Thanks! Now I realized that I had misidentified the problem. S_ISREG is enough but it has to be used with _fstat64. With _stat64, con is a regular file. So the method used for DJGPP is at the wrong location for Windows.

I pushed a commit to xz_for_msvc which should fix it. I tested it with MinGW-w64.

There is another special case in the DJGPP-specific code but I think it's not needed on Windows. It's possible that the output filename is the same as the input filename. On DOS with only 8.3 names it can happen if an overlong name is given on the command line. But it can happen on modern Windows too if 8.3 names are enabled. For example:

echo foo | xz > foobar~1zoo
xz --suffix=zoo --decompress --force foobar~1zoo

It should fail because it cannot remove foobar~1 because the file is already open. It's the same file as foobar~1zoo due to 8.3 names.

Larhzu commented 11 months ago

I merged xz_for_msvc to master except the tuklib_physmem W2k commits.

Larhzu commented 11 months ago

xz_w2k includes simple commits that remove pre-W2k support. Very likely it will be merged but I didn't want to put it to master directly.

The getopt.in.h question will be reconsidered. I don't know yet if it will be changed.

I think everything from this PR and discussion has been handled now. Thanks a lot!