rikyoz / bit7z

A C++ static library offering a clean and simple interface to the 7-zip shared libraries.
https://rikyoz.github.io/bit7z
Mozilla Public License 2.0

[Feature Request]: case insensitive path comparison option for Windows #228

Open dhananjay-gune opened 1 month ago

dhananjay-gune commented 1 month ago

Feature description

BitInputArchive's contains() and find() should have an ignoreCase option.
Currently, contains() and find() of the BitInputArchive class perform a case-sensitive comparison.
E.g., if to_search is ABC\xyz.txt and the archive contains Abc\Xyz.txt, the entry is not found.

BitFileCompressor.compressFiles() also fails if in_dir differs in case from the actual path on disk.
E.g., if the in_dir argument is C:\Temp\MYFOLDER whereas the actual path on disk is C:\TEMP\MyFolder, the archive creation fails with 'path not found' :-o

On Windows this is a problem, since clients can specify the file name (path param) in any case (lower, upper, camel, whatever) and the underlying API must still work.

There might be other places where the path comparison is done case-sensitively.

Could you please find a way so that users don't have to worry about case sensitivity during any operation or comparison involving paths or file names, e.g. a platform-specific #define or something?

dhananjay-gune commented 1 month ago

Currently I am implementing a workaround like:

BitArchiveReader reader = BitArchiveReader{ lib, archivePath, archiveFormat };
for (const auto& entry : reader)
{
    auto entryInArchive = entry.path();
    // lstrcmpi (Win32) compares the two strings ignoring case; 0 means equal
    bool isEqual = lstrcmpi(entryInArchive.c_str(), entryToSearch.c_str()) == 0;
    if (isEqual)
    {
        return true;
    }
}
return false;

I have used lstrcmpi for comparing the strings.
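For readers who are not on Windows, a portable counterpart of the lstrcmpi check could look roughly like the sketch below. It assumes narrow std::string paths and ASCII-only case folding, so it is not a full replacement for the locale-aware Win32 function; iequals is a hypothetical helper name, not something bit7z provides.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper (not part of bit7z): ASCII-only, case-insensitive equality.
inline bool iequals( const std::string& lhs, const std::string& rhs ) {
    // Strings of different length can never be equal, whatever the case.
    return lhs.size() == rhs.size() &&
           std::equal( lhs.begin(), lhs.end(), rhs.begin(),
                       []( unsigned char a, unsigned char b ) {
                           return std::tolower( a ) == std::tolower( b );
                       } );
}
```

Inside the loop above, the comparison would then read iequals(entryInArchive, entryToSearch), provided both are narrow strings.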

rikyoz commented 1 month ago

Hi!

BitInputArchive contains() and find() should have an ignoreCase option. currently the contains() and find() of the BitInputArchive class do case-sensitive comparison. e.g. if to_search is ABC\xyz.txt and archive contains Abc\Xyz.txt, it won't find it.

I think it is a useful feature; I'll definitely add it, possibly in the next v4.1.

My only doubt is whether to make it a build-time option (e.g., like BIT7Z_AUTO_FORMAT) or a runtime argument for the contains() and find() functions. I'll need to evaluate which is best.

BitFileCompressor.compressFiles() too, fails if the in_dir differs in case with the actual path on the disk. e.g. in_dir argument is C:\Temp\MYFOLDER whereas the actual path on the disk is C:\TEMP\MyFolder, the archive creation fails with 'path not found' :-o

On Windows this poses a problem where the clients can specify the file name (path param) in any case (lower, upper, camel, whatever) and the underlying api must work.

There might be other places where the path comparison is done case sensitively.

This is probably a bit trickier than contains() and find(), since this behavior is due to the implementation of std::filesystem::path (https://stackoverflow.com/questions/61351236/lexical-compare-stdfilesystempath-case-insensitive), which bit7z uses internally for paths.

But I'll try to find a workaround.
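To illustrate the std::filesystem::path point: operator== on two paths is a lexical, case-sensitive comparison, so C:/Temp/MYFOLDER and C:/TEMP/MyFolder compare as different even though Windows would treat them as the same directory. The sketch below shows one possible way to compare them while ignoring case. It is only an assumption about how such a workaround might look, not what bit7z does; pathIequals is a hypothetical name, and the folding is ASCII-only.

```cpp
#include <algorithm>
#include <cctype>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper (not part of bit7z): compares two paths lexically,
// ignoring ASCII case. It never touches the disk and does not resolve symlinks.
inline bool pathIequals( const fs::path& lhs, const fs::path& rhs ) {
    auto lower = []( std::string s ) {
        std::transform( s.begin(), s.end(), s.begin(), []( unsigned char c ) {
            return static_cast< char >( std::tolower( c ) );
        } );
        return s;
    };
    // lexically_normal() collapses "." and redundant separators before comparing.
    return lower( lhs.lexically_normal().generic_string() ) ==
           lower( rhs.lexically_normal().generic_string() );
}
```

This needs C++17. Forward slashes are used in the example paths so that the same code behaves identically on every platform.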

Anyway, thank you for the feature request!

dhananjay-gune commented 1 month ago

I also discovered that BitFileExtractor.extractMatching() gives an error if the case doesn't match.
Is there a way to extract a given directory entry from an archive in a case insensitive way?

rikyoz commented 1 month ago

I also discovered that BitFileExtractor.extractMatching() gives an error if the case doesn't match.

Yeah, this is the expected behavior, as the wildcard matching is performed treating paths as strings rather than filesystem paths (and by default, string/char comparisons are case-sensitive). But it should be possible to allow case-insensitive matching, of course.

Actually, there are also some private functions called extractMatchingFilter that are used to implement all the "matching" extraction functions. They take any generic "filtering" function (std::function< bool( const tstring& ) >), where the return value is true if the item must be extracted, false otherwise. I'm starting to think that I should make these functions public, as they would help in cases like yours, or in general when the matching is not performed via a case-sensitive wildcard or regex pattern.

This latter change might already be available in the next v4.0.8; the other changes might require more time.
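As a rough idea of what a generic filtering function makes possible, the sketch below builds a case-insensitive regex filter with the std::function< bool( const std::string& ) > shape and applies it to a plain list of paths. Everything here is an assumption for illustration: std::string stands in for tstring, and PathFilter, selectIndices and makeIcaseRegexFilter are hypothetical names, not bit7z API.

```cpp
#include <cstdint>
#include <functional>
#include <regex>
#include <string>
#include <vector>

// Stand-in for the filter signature mentioned above; std::string replaces tstring here.
using PathFilter = std::function< bool( const std::string& ) >;

// Hypothetical helper: returns the positions of the paths accepted by the filter.
inline std::vector< std::uint32_t > selectIndices( const std::vector< std::string >& paths,
                                                   const PathFilter& filter ) {
    std::vector< std::uint32_t > indices;
    for ( std::uint32_t i = 0; i < paths.size(); ++i ) {
        if ( filter( paths[ i ] ) ) {
            indices.push_back( i );
        }
    }
    return indices;
}

// Hypothetical helper: builds a filter that matches a whole path against
// a regex pattern while ignoring case.
inline PathFilter makeIcaseRegexFilter( const std::string& pattern ) {
    const std::regex re( pattern, std::regex::icase );
    return [re]( const std::string& path ) { return std::regex_match( path, re ); };
}
```

With an archive, the paths would come from the entries themselves, and the selected indices could then be used for extraction by index.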

Is there a way to extract a given directory entry from an archive in a case insensitive way?

For the time being, you should be able to extract it by index, e.g.:

BitArchiveReader reader = BitArchiveReader{ lib, archivePath, archiveFormat };
for (const auto& entry : reader)
{
    auto entryInArchive = entry.path();
    bool isEqual = lstrcmpi(entryInArchive.c_str(), entryToSearch.c_str()) == 0;
    if (isEqual)
    {
        // extractTo takes an array of indices of the items to be extracted
        reader.extractTo( "<outPath>", { entry.index() } );
        break;
    }
}

dhananjay-gune commented 1 month ago

Thanks! I'll try that.

dhananjay-gune commented 1 month ago

Thanks! I'll try that.

It worked! Thanks. I have used the PathMatchSpec() Win32 API to do the wildcard matching:

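// subTreeRoot is a directory inside the archive
// indices collects the matching entries, to be extracted by index afterwards
std::vector<uint32_t> indices;
tstring wildcardedSubTreeRoot = subTreeRoot + BIT7Z_STRING("\\*");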
for (const auto& entry : extractor)
{
    auto entryInArchive = entry.path();
    BOOL matches = PathMatchSpec(entryInArchive.c_str(), subTreeRoot.c_str());
    if (!matches)
    {
        matches = PathMatchSpec(entryInArchive.c_str(), wildcardedSubTreeRoot.c_str());
    }
    if (matches)
    {
        indices.push_back(entry.index());
    }
}