openzim / libzim

Reference implementation of the ZIM specification
https://download.openzim.org/release/libzim/
GNU General Public License v2.0

LibZim 9.1 under macOS throws exception that cannot be caught #866

Closed: BPerlakiH closed this issue 3 months ago

BPerlakiH commented 4 months ago

When trying to open a large ZIM file (see https://github.com/kiwix/apple/issues/675), the following exception is thrown, and it cannot be caught from Objective-C:

Kiwix`zim::(anonymous namespace)::makeMmappedBuffer:
    0x1028083a0 <+0>:   sub    sp, sp, #0x160
    0x1028083a4 <+4>:   stp    x24, x23, [sp, #0x120]
    0x1028083a8 <+8>:   stp    x22, x21, [sp, #0x130]
    0x1028083ac <+12>:  stp    x20, x19, [sp, #0x140]
    0x1028083b0 <+16>:  stp    x29, x30, [sp, #0x150]
    0x1028083b4 <+20>:  add    x29, sp, #0x150
    0x1028083b8 <+24>:  mov    x20, x3
    0x1028083bc <+28>:  mov    x23, x2
    0x1028083c0 <+32>:  mov    x22, x1
    0x1028083c4 <+36>:  mov    x21, x0
    0x1028083c8 <+40>:  mov    w0, #0x1d
    0x1028083cc <+44>:  bl     0x1029fc064               ; symbol stub for: sysconf
    0x1028083d0 <+48>:  neg    x8, x0
    0x1028083d4 <+52>:  and    x19, x8, x23
    0x1028083d8 <+56>:  mov    w8, #0x7fffffff
    0x1028083dc <+60>:  cmp    x19, x8
    0x1028083e0 <+64>:  b.hs   0x102808454               ; <+180>
    0x1028083e4 <+68>:  sub    x23, x23, x19
    0x1028083e8 <+72>:  add    x20, x23, x20
    0x1028083ec <+76>:  mov    x0, #0x0
    0x1028083f0 <+80>:  mov    x1, x20
    0x1028083f4 <+84>:  mov    w2, #0x1
    0x1028083f8 <+88>:  mov    w3, #0x2
    0x1028083fc <+92>:  mov    x4, x22
    0x102808400 <+96>:  mov    x5, x19
    0x102808404 <+100>: bl     0x1029fc310               ; symbol stub for: mmap
    0x102808408 <+104>: cmn    x0, #0x1
    0x10280840c <+108>: b.eq   0x10280847c               ; <+220>
    0x102808410 <+112>: mov    x22, x0
    0x102808414 <+116>: add    x19, x0, x23
    0x102808418 <+120>: str    x19, [x21]
    0x10280841c <+124>: mov    w0, #0x30
    0x102808420 <+128>: bl     0x1029fb560               ; symbol stub for: operator new(unsigned long)
    0x102808424 <+132>: adrp   x8, 2867
    0x102808428 <+136>: add    x8, x8, #0x9b8            ; vtable for std::__1::__shared_ptr_pointer<char*, zim::(anonymous namespace)::makeMmappedBuffer(int, zim::offset_t, zim::zsize_t)::$_0, std::__1::allocator<char>> + 16
    0x10280842c <+140>: stp    x8, xzr, [x0]
    0x102808430 <+144>: stp    xzr, x19, [x0, #0x10]
    0x102808434 <+148>: stp    x22, x20, [x0, #0x20]
    0x102808438 <+152>: str    x0, [x21, #0x8]
    0x10280843c <+156>: ldp    x29, x30, [sp, #0x150]
    0x102808440 <+160>: ldp    x20, x19, [sp, #0x140]
    0x102808444 <+164>: ldp    x22, x21, [sp, #0x130]
    0x102808448 <+168>: ldp    x24, x23, [sp, #0x120]
    0x10280844c <+172>: add    sp, sp, #0x160
    0x102808450 <+176>: ret    
    0x102808454 <+180>: mov    w0, #0x8
    0x102808458 <+184>: bl     0x1029fba88               ; symbol stub for: __cxa_allocate_exception
    0x10280845c <+188>: adrp   x8, 2867
    0x102808460 <+192>: add    x8, x8, #0x990            ; vtable for zim::(anonymous namespace)::MMapException + 16
    0x102808464 <+196>: str    x8, [x0]
    0x102808468 <+200>: adrp   x1, 2867
    0x10280846c <+204>: add    x1, x1, #0x8d8            ; typeinfo for zim::(anonymous namespace)::MMapException
    0x102808470 <+208>: adrp   x2, 0
    0x102808474 <+212>: add    x2, x2, #0x39c            ; zim::(anonymous namespace)::MMapException::~MMapException()
    0x102808478 <+216>: bl     0x1029fb518               ; symbol stub for: __cxa_throw
->  0x10280847c <+220>: add    x0, sp, #0x18
    0x102808480 <+224>: bl     0x1027fb8c0               ; std::__1::basic_ostringstream<char, std::__1::char_traits<char>, std::__1::allocator<char>>::basic_ostringstream[abi:v160006]()
    0x102808484 <+228>: adrp   x1, 567
    0x102808488 <+232>: add    x1, x1, #0x76e            ; "Cannot mmap size "
    0x10280848c <+236>: add    x0, sp, #0x18
    0x102808490 <+240>: mov    w2, #0x11
    0x102808494 <+244>: bl     0x1027f69f0               ; std::__1::basic_ostream<char, std::__1::char_traits<char>>& std::__1::__put_character_sequence[abi:v160006]<char, std::__1::char_traits<char>>(std::__1::basic_ostream<char, std::__1::char_traits<char>>&, char const*, unsigned long)
    0x102808498 <+248>: mov    x1, x20
    0x10280849c <+252>: bl     0x1029fb9ec               ; symbol stub for: std::__1::basic_ostream<char, std::__1::char_traits<char>>::operator<<(unsigned long long)
    0x1028084a0 <+256>: adrp   x1, 567
    0x1028084a4 <+260>: add    x1, x1, #0x780            ; " at off "
    0x1028084a8 <+264>: mov    w2, #0x8
    0x1028084ac <+268>: bl     0x1027f69f0               ; std::__1::basic_ostream<char, std::__1::char_traits<char>>& std::__1::__put_character_sequence[abi:v160006]<char, std::__1::char_traits<char>>(std::__1::basic_ostream<char, std::__1::char_traits<char>>&, char const*, unsigned long)
    0x1028084b0 <+272>: mov    x1, x19
    0x1028084b4 <+276>: bl     0x1029fb9ec               ; symbol stub for: std::__1::basic_ostream<char, std::__1::char_traits<char>>::operator<<(unsigned long long)
    0x1028084b8 <+280>: adrp   x1, 567
    0x1028084bc <+284>: add    x1, x1, #0x5c2            ; " : "
    0x1028084c0 <+288>: mov    w2, #0x3
    0x1028084c4 <+292>: bl     0x1027f69f0               ; std::__1::basic_ostream<char, std::__1::char_traits<char>>& std::__1::__put_character_sequence[abi:v160006]<char, std::__1::char_traits<char>>(std::__1::basic_ostream<char, std::__1::char_traits<char>>&, char const*, unsigned long)
    0x1028084c8 <+296>: mov    x19, x0
    0x1028084cc <+300>: bl     0x1029fc2b0               ; symbol stub for: __error
    0x1028084d0 <+304>: ldr    w0, [x0]
    0x1028084d4 <+308>: bl     0x1029fbee4               ; symbol stub for: strerror
    0x1028084d8 <+312>: mov    x20, x0
    0x1028084dc <+316>: bl     0x1029fbf5c               ; symbol stub for: strlen
    0x1028084e0 <+320>: mov    x2, x0
    0x1028084e4 <+324>: mov    x0, x19
    0x1028084e8 <+328>: mov    x1, x20
    0x1028084ec <+332>: bl     0x1027f69f0               ; std::__1::basic_ostream<char, std::__1::char_traits<char>>& std::__1::__put_character_sequence[abi:v160006]<char, std::__1::char_traits<char>>(std::__1::basic_ostream<char, std::__1::char_traits<char>>&, char const*, unsigned long)
    0x1028084f0 <+336>: mov    w0, #0x10
    0x1028084f4 <+340>: bl     0x1029fba88               ; symbol stub for: __cxa_allocate_exception
    0x1028084f8 <+344>: mov    x19, x0
    0x1028084fc <+348>: add    x8, sp, #0x18
    0x102808500 <+352>: add    x0, x8, #0x8
    0x102808504 <+356>: mov    x8, sp
    0x102808508 <+360>: bl     0x1029fb9a4               ; symbol stub for: std::__1::basic_stringbuf<char, std::__1::char_traits<char>, std::__1::allocator<char>>::str() const
    0x10280850c <+364>: mov    w21, #0x1
    0x102808510 <+368>: mov    x1, sp
    0x102808514 <+372>: mov    x0, x19
    0x102808518 <+376>: bl     0x1029fb6a4               ; symbol stub for: std::runtime_error::runtime_error(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&)
    0x10280851c <+380>: mov    w21, #0x0
    0x102808520 <+384>: adrp   x1, 2830
    0x102808524 <+388>: ldr    x1, [x1, #0x988]
    0x102808528 <+392>: adrp   x2, 2830
    0x10280852c <+396>: ldr    x2, [x2, #0x9c0]
    0x102808530 <+400>: mov    x0, x19
    0x102808534 <+404>: bl     0x1029fb518               ; symbol stub for: __cxa_throw
    0x102808538 <+408>: b      0x102808594               ; <+500>
    0x10280853c <+412>: mov    x20, x0
    0x102808540 <+416>: ldrsb  w8, [sp, #0x17]
    0x102808544 <+420>: tbz    w8, #0x1f, 0x102808558    ; <+440>
    0x102808548 <+424>: ldr    x0, [sp]
    0x10280854c <+428>: bl     0x1029fbaf4               ; symbol stub for: operator delete(void*)
    0x102808550 <+432>: tbnz   w21, #0x0, 0x102808564    ; <+452>
    0x102808554 <+436>: b      0x102808574               ; <+468>
    0x102808558 <+440>: cbnz   w21, 0x102808564          ; <+452>
    0x10280855c <+444>: b      0x102808574               ; <+468>
    0x102808560 <+448>: mov    x20, x0
    0x102808564 <+452>: mov    x0, x19
    0x102808568 <+456>: bl     0x1029fb65c               ; symbol stub for: __cxa_free_exception
    0x10280856c <+460>: b      0x102808574               ; <+468>
    0x102808570 <+464>: mov    x20, x0
    0x102808574 <+468>: add    x0, sp, #0x18
    0x102808578 <+472>: bl     0x1027f4bc0               ; std::__1::basic_ostringstream<char, std::__1::char_traits<char>, std::__1::allocator<char>>::~basic_ostringstream()
    0x10280857c <+476>: b      0x1028085a0               ; <+512>
    0x102808580 <+480>: bl     0x1029fb848               ; symbol stub for: __cxa_begin_catch
    0x102808584 <+484>: mov    x0, x22
    0x102808588 <+488>: mov    x1, x20
    0x10280858c <+492>: bl     0x1029fc1d8               ; symbol stub for: munmap
    0x102808590 <+496>: bl     0x1029fb578               ; symbol stub for: __cxa_rethrow
    0x102808594 <+500>: brk    #0x1
    0x102808598 <+504>: mov    x20, x0
    0x10280859c <+508>: bl     0x1029fb9c8               ; symbol stub for: __cxa_end_catch
    0x1028085a0 <+512>: mov    x0, x20
    0x1028085a4 <+516>: bl     0x1029fc5c8               ; symbol stub for: _Unwind_Resume
    0x1028085a8 <+520>: bl     0x1022c8764               ; __clang_call_terminate
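
Read alongside the disassembly, makeMmappedBuffer seems to follow roughly the control flow sketched below. This is a hedged C++ reconstruction inferred only from the instructions: the sysconf call (argument 0x1d, presumably the page-size query), the neg/and pair that rounds the offset down to a page boundary, the unsigned compare against 0x7fffffff that branches to the MMapException throw, the mmap call with prot 1 and flags 2, and the "Cannot mmap size" / " at off " / " : " message built with strerror on failure. The names mapRange and this MMapException struct are hypothetical stand-ins, not libzim's actual source (the file_reader.cpp links in the thread point to that).

```cpp
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#include <cassert>
#include <cerrno>
#include <cstdint>
#include <cstring>
#include <exception>
#include <memory>
#include <sstream>
#include <stdexcept>

// Hypothetical stand-in for libzim's internal (anonymous-namespace) MMapException.
struct MMapException : std::exception {
  const char* what() const noexcept override { return "MMapException"; }
};

// Hedged reconstruction of the branches seen in the disassembly; not the real source.
std::shared_ptr<const char> mapRange(int fd, uint64_t offset, uint64_t size) {
  // "mov w0, #0x1d; bl sysconf": presumably the page-size query.
  const uint64_t pageSize = static_cast<uint64_t>(sysconf(_SC_PAGESIZE));
  assert(pageSize != 0 && (pageSize & (pageSize - 1)) == 0);  // power of two

  // "neg x8, x0; and x19, x8, x23": round the offset down to a page boundary.
  const uint64_t pageAlignedOffset = offset & ~(pageSize - 1);
  const uint64_t adjustment = offset - pageAlignedOffset;
  const uint64_t mappedSize = size + adjustment;

  // "mov w8, #0x7fffffff; cmp x19, x8; b.hs": reject large offsets before mmap.
  if (pageAlignedOffset >= static_cast<uint64_t>(INT32_MAX)) {
    throw MMapException();
  }

  // mmap(NULL, mappedSize, PROT_READ (1), MAP_PRIVATE (2), fd, pageAlignedOffset)
  void* addr = mmap(nullptr, mappedSize, PROT_READ, MAP_PRIVATE, fd,
                    static_cast<off_t>(pageAlignedOffset));
  if (addr == MAP_FAILED) {
    const int err = errno;  // capture before any other call can overwrite it
    std::ostringstream msg;
    msg << "Cannot mmap size " << mappedSize << " at off " << pageAlignedOffset
        << " : " << std::strerror(err);
    throw std::runtime_error(msg.str());
  }

  char* base = static_cast<char*>(addr);
  // The caller gets a pointer to the requested byte; the deleter unmaps the whole region.
  return std::shared_ptr<const char>(
      base + adjustment, [base, mappedSize](const char*) { munmap(base, mappedSize); });
}
```

In this reading, an offset past 0x7fffffff throws before mmap is ever called, which matches the branch to <+180> sitting ahead of the mmap call in the listing.
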
BPerlakiH commented 4 months ago

It seems to be related to this part of the code: https://github.com/openzim/libzim/blob/f6243442dc3ac6f7c79796ce8bd877d0d4f60cf1/src/file_reader.cpp#L150C1-L156C4

kelson42 commented 4 months ago

It crashes as well with the following command, but the Linux version works fine!

./kiwix-tools_macos-x86_64-3.7.0/kiwix-serve --port=8080 canadianprepper_en_all_2023-11.zim

kelson42 commented 4 months ago

@veloman-yunkan Since @mgautierfr is not available at the moment, do you have an idea of what is going wrong here? The ZIM file is no longer available at library.kiwix.org, but it can be downloaded from https://mega.nz/file/DdExnASY#EUnApGBiBNl1rbApKZkGDwy91V15qywiTbTey8FMjDE

mgautierfr commented 4 months ago

It seems that the mmapping is failing for some reason.

The root cause is not obvious. It seems that we are throwing an MMapException, which should be caught internally by libzim. But it may be that, inside the catch block, we then fail to allocate the buffer (https://github.com/openzim/libzim/blob/f6243442dc3ac6f7c79796ce8bd877d0d4f60cf1/src/file_reader.cpp#L207).
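
The hypothesis here is that MMapException is only an internal signal to try another strategy, and that the exception reaching the application may come from the fallback path instead. The sketch below illustrates that general mmap-then-read pattern; getRange, mmapRange, readIntoBuffer and the use of pread() are hypothetical choices for illustration, not the code at the linked line. What it does show is a C++ rule: an exception thrown inside a catch handler (for example std::bad_alloc from the buffer allocation, or an I/O error) replaces the one being handled and propagates to the caller.

```cpp
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#include <cassert>
#include <cerrno>
#include <cstdint>
#include <cstring>
#include <exception>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical internal signal meaning "this range cannot be mmapped here".
struct MMapException : std::exception {
  const char* what() const noexcept override { return "MMapException"; }
};

// Condensed mmap path, including a 32-bit offset guard like the one in the disassembly.
std::shared_ptr<const char> mmapRange(int fd, uint64_t offset, size_t size) {
  const uint64_t page = static_cast<uint64_t>(sysconf(_SC_PAGESIZE));
  assert(page != 0 && (page & (page - 1)) == 0);  // power of two
  const uint64_t aligned = offset & ~(page - 1);
  if (aligned >= static_cast<uint64_t>(INT32_MAX)) throw MMapException();
  const size_t adjust = static_cast<size_t>(offset - aligned);
  const size_t total = size + adjust;
  void* p = mmap(nullptr, total, PROT_READ, MAP_PRIVATE, fd, static_cast<off_t>(aligned));
  if (p == MAP_FAILED) {
    const int err = errno;
    throw std::runtime_error(std::string("mmap failed: ") + std::strerror(err));
  }
  char* base = static_cast<char*>(p);
  return std::shared_ptr<const char>(base + adjust,
                                     [base, total](const char*) { munmap(base, total); });
}

// Fallback: copy the bytes into a heap buffer with pread().
std::shared_ptr<const char> readIntoBuffer(int fd, uint64_t offset, size_t size) {
  // new[] may throw std::bad_alloc if the buffer cannot be allocated.
  std::shared_ptr<char> buffer(new char[size], std::default_delete<char[]>());
  size_t done = 0;
  while (done < size) {
    const ssize_t n = pread(fd, buffer.get() + done, size - done,
                            static_cast<off_t>(offset + done));
    if (n < 0) {
      const int err = errno;
      if (err == EINTR) continue;  // interrupted by a signal: retry
      throw std::runtime_error(std::string("pread failed: ") + std::strerror(err));
    }
    if (n == 0) throw std::runtime_error("pread: unexpected end of file");
    done += static_cast<size_t>(n);
  }
  return buffer;
}

// Try mmap first; on MMapException, fall back to reading. Anything thrown inside the
// catch handler replaces the MMapException and propagates to the caller.
std::shared_ptr<const char> getRange(int fd, uint64_t offset, size_t size) {
  try {
    return mmapRange(fd, offset, size);
  } catch (const MMapException&) {
    return readIntoBuffer(fd, offset, size);
  }
}
```

With an invalid descriptor and an offset past the guard, the caller never sees MMapException; it sees the std::runtime_error raised by the fallback.
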

@BPerlakiH Do you know whether MMAP_SUPPORT_64 is enabled in your build? (https://github.com/openzim/libzim/blob/f6243442dc3ac6f7c79796ce8bd877d0d4f60cf1/src/file_reader.cpp#L167-L171)
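
Why the question matters: mmap() takes its offset as an off_t, so on a platform where off_t is 64 bits wide (as it is on macOS) the real limit is std::numeric_limits<off_t>::max(), whereas the guard visible in the disassembly rejects any page-aligned offset at or above 0x7fffffff. A minimal sketch, assuming that a build without MMAP_SUPPORT_64 applies exactly that guard; classify and OffsetVerdict are hypothetical names, and the code does not read libzim's real macro.

```cpp
#include <sys/types.h>

#include <cstdint>
#include <limits>

// Limit applied by the 32-bit guard in the disassembly (compare against 0x7fffffff).
constexpr uint64_t kGuardLimit = 0x7fffffffULL;

// Largest offset mmap() can express, since its offset parameter is an off_t.
constexpr uint64_t offTLimit() {
  return static_cast<uint64_t>(std::numeric_limits<off_t>::max());
}

enum class OffsetVerdict { Mappable, RejectedByGuardOnly, BeyondOffT };

// Hypothetical classifier for a page-aligned offset (C++14 constexpr).
constexpr OffsetVerdict classify(uint64_t alignedOffset) {
  if (alignedOffset > offTLimit()) return OffsetVerdict::BeyondOffT;  // off_t cannot hold it
  if (alignedOffset >= kGuardLimit) return OffsetVerdict::RejectedByGuardOnly;  // only the guard refuses
  return OffsetVerdict::Mappable;  // passes both checks
}
```

For the page-aligned offset reported later in this thread (16696311808), classify yields RejectedByGuardOnly wherever off_t is 64 bits: mmap could express the offset, and only the 32-bit guard refuses it.
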

kelson42 commented 4 months ago

Could it be that mmap64 works in general but, in practice, only up to a certain maximum size? https://forum.juce.com/t/memory-mapping-limitations-on-ios/33119

mgautierfr commented 4 months ago

@BPerlakiH Can you test with this libzim version? https://tmp.kiwix.org/ci/dev_preview/trace_mmap_macos/libzim_macos-x86_64-2024-03-18.tar.gz

It contains more traces around mmap allocation.

BPerlakiH commented 4 months ago

Thank you @mgautierfr. To try it out, I will need a new libkiwix build that includes this libzim version. What is the quickest way to do that?

mgautierfr commented 4 months ago

Get this one: https://tmp.kiwix.org/ci/dev_preview/trace_mmap_macos/libkiwix_macos-x86_64-2024-03-18.tar.gz

BPerlakiH commented 4 months ago

Managed to compile it via kiwix-build. Here's the crash trace:

mmap with flags:2 offest:0 size:80
munmap 4912545792 size:80
pageAlignedOffset (16696311808) is too big
[Screenshot 2024-03-21 at 22 26 04]

The variables at the point of exception:

fd                          int                        22
offset                      zim::offset_t
  REAL_TYPEDEF<unsigned long long, zim::offset_t>
    v                       unsigned long long         16696312332
size                        zim::zsize_t
  REAL_TYPEDEF<unsigned long long, zim::zsize_t>
    v                       unsigned long long         10924
pageAlignedOffset           const zim::offset_type     16696311808
alignmentAdjustment         const size_t               524
mmappedAddress              char *const                0x000000016f60ec08 "\xa0("
  *mmappedAddress           char                       '\xa0'
munmapDeleter               const (unnamed class)
  mmappedAddress            char *const                0x000000016f60eb50 "\f\U00000002"
    *mmappedAddress         char                       '\f'
  size                      zim::zsize_t
    REAL_TYPEDEF<unsigned long long, zim::zsize_t>
      v                     unsigned long long         4308774440
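
These values are consistent with plain page-alignment arithmetic: 16696312332 rounds down to 16696311808, leaving a 524-byte adjustment, so mmap would have been asked for 10924 + 524 = 11448 bytes, and the aligned offset is more than seven times the 0x7fffffff (2147483647) limit. The check below reproduces these numbers at compile time; alignToPage is a hypothetical helper, and the 16 KiB page size is an assumption (typical of Apple silicon Macs), although a 4 KiB page gives identical results for this particular offset.

```cpp
#include <cassert>
#include <cstdint>

struct Alignment {
  uint64_t alignedOffset;  // offset rounded down to a page boundary
  uint64_t adjustment;     // bytes between that boundary and the requested offset
  uint64_t mappedSize;     // requested size plus the adjustment
};

// Hypothetical helper; pageSize must be a power of two (C++14 constexpr).
constexpr Alignment alignToPage(uint64_t offset, uint64_t size, uint64_t pageSize) {
  assert(pageSize != 0 && (pageSize & (pageSize - 1)) == 0);
  const uint64_t adjustment = offset & (pageSize - 1);
  return Alignment{offset - adjustment, adjustment, size + adjustment};
}

// Values from the debugger view, assuming a 16 KiB page size.
constexpr Alignment reported = alignToPage(16696312332ULL, 10924ULL, 16384ULL);
static_assert(reported.alignedOffset == 16696311808ULL, "pageAlignedOffset");
static_assert(reported.adjustment == 524ULL, "alignmentAdjustment");
static_assert(reported.mappedSize == 11448ULL, "size passed to mmap");
static_assert(reported.alignedOffset >= 0x7fffffffULL, "trips the 32-bit guard");
```
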
mgautierfr commented 4 months ago

I think I have found the issue. But are you on 32-bit or 64-bit?

mgautierfr commented 4 months ago

PR #867 should fix it. @BPerlakiH, can you test it?