readium / readium-sdk

A C++ ePub renderer SDK
BSD 3-Clause "New" or "Revised" License
387 stars 164 forks source link

epub3 with ".." in the toc item url crashes #67

Open openmuthu opened 10 years ago

openmuthu commented 10 years ago

try to open the following epub in launcher-windows https://code.google.com/p/epub-samples/downloads/detail?name=page-blanche-20130322.epub&can=2&q=

it causes crash. The issue is in readium-sdk/ePub3/utilities/path_help.cpp in CleanupPath() function. Here is a patch which fixes the issue. Please review and submit the change.

==== readium-sdk/ePub3/utilities/path_help.cpp
35c35
<   for (auto pos = begin; pos < end; ++pos)

---
>   for (auto pos = begin; pos < end;)
52c52
<           components.erase(parent, dotDot);

---
>           pos = components.erase(parent, ++dotDot);
54a55,56
>       else
>           ++pos;
danielweck commented 8 years ago

Let's revive this issue + patch proposal.

(1) First of all, the fix is in https://github.com/readium/readium-sdk/blob/develop/ePub3/utilities/path_help.cpp#L27

Current code:

string CleanupPath(const string& path)
{
    static REGEX_NS::regex _PathSplitter("/");
    std::vector<string> components = path.split(_PathSplitter);

    auto begin = components.begin();
    auto end = components.end();
    for (auto pos = begin; pos < end; ++pos)
    {
        if (*pos == ".." && pos != begin)
        {
            decltype(pos) dotDot, parent;

            dotDot = pos--;
            if (pos == begin)
            {
                parent = begin;
                pos = dotDot;
            }
            else
            {
                parent = pos--;
            }

            components.erase(parent, dotDot);
            end = components.end();
        }
    }

    std::ostringstream ss;
    for (auto& str : components)
    {
        ss << str;
        ss << '/';
    }

    string result = ss.str();
    if (path[path.size() - 1] != '/')
        result.erase(result.size() - 1);

    return result;
}

(2) Secondly, the CleanupPath() function is used only by NavigationPoint::AbsolutePath() in (as far as I can tell): https://github.com/readium/readium-sdk/blob/develop/ePub3/ePub/nav_point.cpp#L87

(3) Thirdly, is the proposed fix really specific to Windows? This seems like a generic path processing bug that should trigger on any platform, given the input EPUB3 "page blanche"

(4) Fourthly, I am concerned that multiple "path normalization" -related functions are inconsistently used, for example in PackageBase::ManifestItemAtRelativePath() at https://github.com/readium/readium-sdk/blob/develop/ePub3/ePub/package.cpp#L173 notably the usage of url_util::DecodeURLEscapeSequences(), or in the SMIL / Media Overlays path resolver (getReferencedManifestItem()) at https://github.com/readium/readium-sdk/blob/develop/ePub3/ePub/media-overlays_smil_model.cpp#L653 which performs its own canonical-ization logic! (e.g. removal of "parent_folder/.." path sequences)

(5) Fifthly, the download link for the EPUB "page blanche" sample is now: https://drive.google.com/uc?export=download&id=0B9g8D2Y-6aPLMmpCYktzbkJlQ1E (source: https://github.com/IDPF/epub3-samples/tree/master/30/page-blanche ) Note that this is also a ReadiumJS sample ebook: https://github.com/readium/readium-js-viewer/tree/develop/epub_content/page-blanche , see live demo https://readium.firebaseapp.com/?epub=epub_content%2Fpage-blanche&