openzim / libzim

Reference implementation of the ZIM specification
https://download.openzim.org/release/libzim/
GNU General Public License v2.0
163 stars 47 forks source link

Correctly handle failing mmap on FileReader. #867

Closed mgautierfr closed 3 months ago

mgautierfr commented 4 months ago

If we try to mmap a region to high (>4GB), makeMmappedBuffer will throw a MMapException. In this case, we want to do a read in the file instead of mmap.

This case is handled in MultiPartFileReader but was missing in FileReader.

Fix #866

mgautierfr commented 4 months ago

@BPerlakiH can you test this on your side ?

codecov[bot] commented 4 months ago

Codecov Report

Attention: Patch coverage is 76.92308% with 6 lines in your changes are missing coverage. Please review.

Project coverage is 58.07%. Comparing base (48da5b8) to head (916afdf).

Files Patch % Lines
src/file_reader.cpp 71.42% 1 Missing and 5 partials :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #867 +/- ## ========================================== + Coverage 58.00% 58.07% +0.07% ========================================== Files 101 101 Lines 4622 4623 +1 Branches 1923 1922 -1 ========================================== + Hits 2681 2685 +4 + Misses 667 665 -2 + Partials 1274 1273 -1 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

BPerlakiH commented 4 months ago

@mgautierfr I have tested with this branch by replacing it in kiwix-build:

diff --git a/kiwixbuild/dependencies/libzim.py b/kiwixbuild/dependencies/libzim.py
index a57e69e..88dfbd6 100644
--- a/kiwixbuild/dependencies/libzim.py
+++ b/kiwixbuild/dependencies/libzim.py
@@ -9,7 +9,7 @@ class Libzim(Dependency):
     class Source(GitClone):
         git_remote = "https://github.com/openzim/libzim.git"
         git_dir = "libzim"
-        git_ref = "trace_mmap_macos"
+        git_ref = "fix_macos_mmap"

     class Builder(MesonBuilder):
         test_options = ["-t", "8"]

and it is still crashing at the same point with:

mmap with flags:2 offest:0 size:80
munmap 4780654592 size:80
pageAlignedOffset (8416296960) is too big
mgautierfr commented 4 months ago

This branch doesn't have traces. So if you have this output you probably not testing the right thing.

kelson42 commented 4 months ago

@mgautierfr Are we not impacted by https://github.com/kiwix/kiwix-build/issues/32? Are we sure that kiwix-build has the HEAD of the proper branch?

mgautierfr commented 4 months ago

@BPerlakiH you have to checkout yourself the right branch (but as you already done it for trace_mmap_macos, I suspect you are good with this)

Are we not impacted by https://github.com/kiwix/kiwix-build/issues/32?

No. A checkout to the right branch and a build (of libzim AND libkiwix (as libkiwix is statically linked to libzim)) and you should be ok (kiwix-build libkiwix --config ... already compile all dependencies of libkiwix by default)

Are we sure that kiwix-build has the HEAD of the proper branch?

If there is no modification of the code (clean working directory) and kiwix-build is configured to get the branch you are on (see https://github.com/kiwix/kiwix-build/commit/2359d0997f44853fd24067322e10cd5ca0a41d83 for example), kiwix-build update the source code automatically

Of course, all this is true if you don't ask to not do it (--skip-source-prepare, --build-nodeps)

mgautierfr commented 4 months ago

@veloman-yunkan please review without waiting confirmation for @BPerlakiH, I'm pretty confident it is a real bug and we should merge it, whatever it fixes the linked issue or not.

BPerlakiH commented 4 months ago

@BPerlakiH you have to checkout yourself the right branch (but as you already done it for trace_mmap_macos, I suspect you are good with this)

@mgautierfr I am not sure about this part. I did as described earlier, changing the git_ref in the file: kiwixbuild/dependencies/libzim.py. I did git clean -dfx before to delete the old build files. I did clean derived data on Xcode, but I am still getting the same results, it is still tracing and still crashing. I tried it 3 or 4 times with different combinations of clearing things first...

Maybe we need a more step by step instructions how to build this the right way, or if you could send me a complete xcframework built with your changes, that would be even easier for me to test.

rgaudin commented 4 months ago

When @mgautierfr mentions a clean state, I believe you need to remove SOURCE/libzim and SOURCE/libkiwix as well. you can check the git branch in those folders post-build to verify which branch was actually used

mgautierfr commented 4 months ago

I have launch a new build for provide you a xcframework.

I am not sure about this part. I did as described earlier, changing the git_ref in the file: kiwixbuild/dependencies/libzim.py.

You should not. The git_ref is used:

If you change the git_ref after a first run, git_ref != current git branch and update is not made (You should have a warning in the output).

You can directly goes into SOURCE/libzim and run git remote update && git pull fix_macos_mmap. If you don't have change git_ref, git_ref != fix_macos_mmap, kiwix-build will not change the source, so you will compile fix_macos_mmap.

Or you can remove SOURCE/libzim, (and update git_ref) so kiwix-build will download again the source and switch to git_ref before compiling.

BPerlakiH commented 4 months ago

Thank you @mgautierfr , I can confirm it is fixing the issue! See below, "Canadian Prepper" now opens correctly:

Screenshot 2024-03-29 at 10 59 37 Medium
mgautierfr commented 4 months ago

And what do you think of introducing a base class for FileReader and MultiPartFileReader that will provide a partial implementation of the get_buffer() method in terms of a new virtual function get_mmapped_buffer()? Such a class can also be a common place for the data members _offset and _size and their accessor methods.

I think well about this. I will do it.

mgautierfr commented 4 months ago

I will do it.

Done. Ready for another review.

mgautierfr commented 3 months ago

LGTM. Rebase&fixup is due.

Done