rikyoz / bit7z

A C++ static library offering a clean and simple interface to the 7-zip shared libraries.
https://rikyoz.github.io/bit7z
Mozilla Public License 2.0
634 stars 116 forks source link

[Question] BIT7Z_USE_STD_BYTE, BIT7Z_USE_NATIVE_STRING impact? #149

Closed levicki closed 1 year ago

levicki commented 1 year ago

@rikyoz

This didn't exist when I first started using bit7z, can you clarify whether there is any performance impact of enabling it?

Likewise, what is the impact of native string thing?

Finally, BIT7Z_VS_LIBNAME_OUTDIR_STYLE is also less than clear what it actually does when enabled, and what is the resulting hieracrhy and naming if it is left unchecked.

Please clarify.

rikyoz commented 1 year ago

This didn't exist when I first started using bit7z, can you clarify whether there is any performance impact of enabling it?

Sure! Sorry for the missing documentation; I plan to add more info in the wiki after the release of the stable v4.

Enabling BIT7Z_USE_STD_BYTE makes bit7z use C++17's std::byte as its "byte" type for the buffers. If the compiler doesn't support std::byte, bit7z will provide a simple implementation (essentially, it's just a strong type alias for unsigned char). By default, it is disabled, and byt7z's byte type is just a non-strong type alias for unsigned char. I don't think it impacts the code's performance; it's just a flag available for anyone using std::byte for their buffers instead of plain unsigned char. If you're not extracting or compressing to/from buffers, you can leave it disabled.

Likewise, what is the impact of native string thing?

I introduced BIT7Z_USE_NATIVE_STRING when I added the support to Linux and macOS. To make the cross-platform usage of bit7z easier, I decided that, by default, bit7z will use the std::string type for the paths it takes as input, and it will consider the strings to be encoded using UTF-8 (essentially, following the UTF-8 Everywhere manifesto.

Now, 7-zip by default uses wide strings in its API (and hence why bit7z used only std::wstrings before the v4), so bit7z needs to convert from UTF-8 std::strings to UTF-16 wide strings before passing them to 7-zip. This conversion is needed on Linux and macOS since the "de-facto" standard string type is already UTF-8 std::string. On Windows, on the other hand, the "native" strings are wide, just like the ones used by 7-zip, and a user might already be using wide strings in their program.

Hence, by enabling the BIT7Z_USE_NATIVE_STRING flag, you make the bit7z's API use wide strings for paths like in the v3. In theory, this might positively impact performance (no conversion between UTF-8 and UTF-16 strings happens), but I did some tests some time ago and didn't notice any significant improvement; I wasn't specifically testing the conversion though, so your mileage may vary. Nevertheless, it might be helpful for the user, at least on the API level. Please note that the BIT7Z_USE_NATIVE_STRING changes the bit7z API only on Windows; on Linux and macOS, the paths will always be std::strings, since it's unusual to use std::wstrings on these platforms.

Finally, BIT7Z_VS_LIBNAME_OUTDIR_STYLE is also less than clear what it actually does when enabled, and what is the resulting hieracrhy and naming if it is left unchecked.

The BIT7Z_VS_LIBNAME_OUTDIR_STYLE flag forces CMake to use the Visual Studio output library name and directory structure convention. In practice, when enabled:

By default, this flag is disabled. Unfortunately, I could not find any better naming for this option, but I'm open to suggestions!

levicki commented 1 year ago

@rikyoz

Sure! Sorry for the missing documentation; I plan to add more info in the wiki after the release of the stable v4.

Hey no problem man, I see that 7-Zip 23.01 LZMA SDK has made some code-breaking changes. After checking it out I must say I don't envy you at all.

I read up on std::byte and so far it seems to be just syntactic sugar, but there were some claims of some compilers not producing optimal code with it, that's why I asked.

As for library name style, perhaps it would be best to have it fully configurable? As in allow user to specify exact name of library with default being bit7z?

I must admit that I am not a great fan of having different names for 64-bit and 32-bit (and/or debug) library names. I think it just unnecessarily complicates project setup for those who will use the library, at least on Windows platform.

Rationale for that is as follows:

  1. It's more work to set the name of a library you are linking to for every possible combination of x86, x64, Debug, and Release than it is to set the name once to bit7z.lib, and then set the same wildcard libraray include path for all of them -- lib\$(PlatformTarget)\$(Configuration)\ to let your linker worry about picking up the correct file.

  2. I believe in orthogonality so if there's bit7z64, then there either should be bit7z32 or no suffix at all for either of them. Also, as the numbers 64 and 32 can be easily confused for version numbers using _x64 and _x86 would IMO be clearer if you insist on naming them differently.

In any case, I think it would be best if the user could control the output library name and output path. I am not sure if that's possible with CMake though.

Finally, I tried using cmake-gui on bit7z 4.0-rc and it seems to me that I can't produce both 32-bit and 64-bit libraries in a single pass (i.e. a single project file that has both configurations). If true, for my use case that would be a regression from 3.11.

rikyoz commented 1 year ago

Hey no problem man, I see that 7-Zip 23.01 LZMA SDK has made some code-breaking changes. After checking it out I must say I don't envy you at all.

Yeah, some days ago, I did a small "experiment" with 7-zip 23.01, and, unfortunately, it broke everything. The fix is not straightforward, so it will not be easy to add support for this version; I hope to be able to add it for the v4 stable, but I'm also evaluating whether to leave it for the next v4.1.

I read up on std::byte and so far it seems to be just syntactic sugar, but there were some claims of some compilers not producing optimal code with it, that's why I asked.

I guess that some compilers still need to improve their optimizations with std::byte. I provided this option just in case anyone needed it (it wasn't a difficult change).

I must admit that I am not a great fan of having different names for 64-bit and 32-bit (and/or debug) library names. I think it just unnecessarily complicates project setup for those who will use the library, at least on Windows platform.

I see. Unfortunately, there are many conventions for C++ library file naming, and yes, the current naming and directory structure scheme might complicate things in this case. I'll evaluate how to improve this!

It's more work to set the name of a library you are linking to for every possible combination of x86, x64, Debug, and Release than it is to set the name once to bit7z.lib, and then set the same wildcard libraray include path for all of them -- lib\$(PlatformTarget)\$(Configuration)\ to let your linker worry about picking up the correct file.

When you use the library via CMake, you have to know only the name of the library:

add_subdirectory( ${CMAKE_SOURCE_DIR}/third_party/bit7z ) # path containing the source code of bit7z
target_link_libraries( ${TARGET_NAME} PRIVATE bit7z64 ) # or bit7z on x86

But yeah, it's not as easy with other build systems. I'm evaluating whether to drop the 64 suffix since the architecture is specified in the output folder anyway (even without the BIT7Z_VS_LIBNAME_OUTDIR_STYLE flag). In this case, enabling the BIT7Z_VS_LIBNAME_OUTDIR_STYLE option would result in precisely the naming scheme you suggested!

I believe in orthogonality so if there's bit7z64, then there either should be bit7z32 or no suffix at all for either of them.

This is actually what bit7z does: on x86, no suffix is appended to the library's name.

Also, as the numbers 64 and 32 can be easily confused for version numbers using _x64 and _x86 would IMO be clearer if you insist on naming them differently.

Uhm might be helpful, but the 64/32 are pretty standard, at least on some platforms like Linux.

In any case, I think it would be best if the user could control the output library name and output path. I am not sure if that's possible with CMake though.

Yeah, giving the user full control would be the best thing.

Finally, I tried using cmake-gui on bit7z 4.0-rc and it seems to me that I can't produce both 32-bit and 64-bit libraries in a single pass (i.e. a single project file that has both configurations). If true, for my use case that would be a regression from 3.11.

Unfortunately, I fear that this is a limitation of CMake itself, at least from what I read online (https://gitlab.kitware.com/cmake/cmake/-/issues/18450, https://stackoverflow.com/questions/46300384/cmake-and-multiple-platforms-visual-studio-solution, https://discourse.cmake.org/t/how-to-make-a-visual-studio-solution-with-x64-x86-support/880/8). The first time you configure the project via cmake-gui, you can select only one platform; the same happens when you use the CLI. I might restore the vcxproj file, but then I would be back to where I was before, where, for example, I had to remember to keep updated two project files (the CMakeLists.txt and the vcxproj) every time I had to add or remove a source file. Nevertheless, I'll evaluate this too!

levicki commented 1 year ago

@rikyoz

Yeah, some days ago, I did a small "experiment" with 7-zip 23.01, and, unfortunately, it broke everything.

How do you think I know? (Hint: I did the same thing yesterday afternoon -- I was not amused 🤣)

I mean, he did increase the major version so at least he is following the semantic versioning rules, but still the resulting breakage is pretty staggering.

I hope to be able to add it for the v4 stable...

Here's to hoping!

When you use the library via CMake, you have to know only the name of the library:

My use case is bit7z.lib statically linked into my SevenZipExtractor.dll which is a C++/CLR Windows only class library project and whose build output is used in a C# Windows Forms application. In such a scenario, setting up CMake config in my project just to get to the library name easier doesn't make a whole lot of sense.

But yeah, it's not as easy with other build systems.

Exactly, and the worst part of all that is the further proliferation of build and packaging systems.

I'm evaluating whether to drop the 64 suffix... would result in precisely the naming scheme you suggested!

That would be really nice.

Unfortunately, I fear that this is a limitation of CMake itself...

Indeed, and they say it will never be supported.

I might restore the vcxproj file...

Please don't bother with that, I asked more out of curiosity. If push comes to shove I can always add the platform targets to a CMake generated vcxproj.

Maintaining both build systems would be just adding complexity for almost no benefit. Support for 32-bit executables and libraries will probably be gone in the next couple of years.

rikyoz commented 1 year ago

How do you think I know? (Hint: I did the same thing yesterday afternoon -- I was not amused 🤣)

🤣

I mean, he did increase the major version so at least he is following the semantic versioning rules, but still the resulting breakage is pretty staggering.

Yeah, and to be honest I didn't notice any actual code quality improvement that would justify such a massive code breakage...

My use case is bit7z.lib statically linked into my SevenZipExtractor.dll which is a C++/CLR Windows only class library project and whose build output is used in a C# Windows Forms application. In such a scenario, setting up CMake config in my project just to get to the library name easier doesn't make a whole lot of sense.

I see. I'll definitely work on improving the handling of such use cases!

Exactly, and the worst part of all that is the further proliferation of build and packaging systems.

Yeah, it's a mess, unfortunately!

Please don't bother with that, I asked more out of curiosity. If push comes to shove I can always add the platform targets to a CMake generated vcxproj.

Maintaining both build systems would be just adding complexity for almost no benefit. Support for 32-bit executables and libraries will probably be gone in the next couple of years.

👍

levicki commented 1 year ago

@rikyoz Thanks for the clarifications, can't wait for 4.0 stable.