ximion / appstream-generator

A fast AppStream metadata generator
GNU Lesser General Public License v3.0
43 stars 29 forks source link

Segfault when processing a pax-format tarball #101

Closed antonio-rojas closed 2 years ago

antonio-rojas commented 2 years ago

appstream-generator segfaults when processing a repo database tarball if it is in pax format, like the attached one, with this backtrace:

#0  0x00007ffff6df866f in find_derivation () at /usr/lib/libc.so.6
#1  0x00007ffff6df8f6c in __gconv_find_transform () at /usr/lib/libc.so.6
#2  0x00007ffff6df7bd6 in __gconv_open () at /usr/lib/libc.so.6
#3  0x00007ffff6df7788 in iconv_open () at /usr/lib/libc.so.6
#4  0x00007ffff774762d in create_sconv_object
    (flag=582, current_codepage=<optimized out>, tc=0x7ffff6f8bb60 <_nl_C_codeset> "ANSI_X3.4-1968", fc=0x7ffff777c91a "UTF-8")
    at libarchive/archive_string.c:1267
#5  get_sconv_object.part.0.lto_priv.0
    (a=0x5555558f7dc0, fc=0x7ffff777c91a "UTF-8", tc=0x7ffff6f8bb60 <_nl_C_codeset> "ANSI_X3.4-1968", flag=<optimized out>)
    at libarchive/archive_string.c:1656
#6  0x00007ffff7736d4c in pax_header (in_as=<optimized out>, in_as=<optimized out>, entry=0x5555558f3a2f, tar=0x5555558f8b30, a=0x5555558f8b30)
    at libarchive/archive_read_support_format_tar.c:1696
#7  header_pax_extensions
    (a=a@entry=0x5555558f7dc0, tar=tar@entry=0x5555558f8b30, entry=entry@entry=0x5555558f8620, h=h@entry=0x55555592d140, unconsumed=unconsumed@entry=0x7ffff6db9e60) at libarchive/archive_read_support_format_tar.c:1512
#8  0x00007ffff7734cfb in tar_read_header
    (a=a@entry=0x5555558f7dc0, tar=tar@entry=0x5555558f8b30, entry=entry@entry=0x5555558f8620, unconsumed=unconsumed@entry=0x7ffff6db9e60)
    at libarchive/archive_read_support_format_tar.c:809
#9  0x00007ffff7735b84 in archive_read_format_tar_read_header (a=0x5555558f7dc0, entry=0x5555558f8620)
    at libarchive/archive_read_support_format_tar.c:544
#10 0x00007ffff7706055 in _archive_read_next_header2 (_a=_a@entry=0x5555558f7dc0, entry=0x5555558f8620) at libarchive/archive_read.c:647
#11 0x00007ffff7706173 in _archive_read_next_header (_a=0x5555558f7dc0, entryp=0x7ffff6db9fd8) at libarchive/archive_read.c:685
#12 0x00005555557ed8c3 in _D5asgen8zarchive19ArchiveDecompressor4readMFZ9__lambda1MFZv () at ../appstream-generator-0.8.7/src/asgen/zarchive.d:404
#13 0x00007ffff71a8c0b in fiber_entryPoint () at /usr/lib/libdruntime-ldc-shared.so.98

Although the backtrace points to a libarchive crash, I am not able to reproduce this by running the same code appstream-generator runs in a separate C program. This works flawlessly on the attached file:

#include <stdlib.h>
#include <archive.h>
#include <archive_entry.h>

int main() {
  struct archive *a;
  struct archive_entry *entry;
  int r;

  a = archive_read_new();
  archive_read_support_filter_all(a);
  archive_read_support_format_all(a);
  r = archive_read_open_filename(a, "core.files.tar.gz", 65536);
  if (r != ARCHIVE_OK)
    exit(1);
  while (archive_read_next_header(a, &entry) == ARCHIVE_OK) {
    const char* pathname = archive_entry_pathname(entry);
    printf("%s\n", pathname);
    archive_read_data_skip(a);
  }
  r = archive_read_free(a);
}

core.files.tar.gz

ximion commented 2 years ago

Weird, this absolutely worked before! I'll try to reproduce this.... Thanks for the sample tarball!

ximion commented 2 years ago

Weird, I can't reproduce this at all with the file you attached. My libarchive version is 3.5.2, asgen was compiled with LDC 1.28.0, are you using different versions?

antonio-rojas commented 2 years ago

Using ldc 1.28.1 and libarchive 3.6.0, but downgrading doesn't make any difference

antonio-rojas commented 2 years ago

Weird, I can't reproduce this at all with the file you attached. My libarchive version is 3.5.2, asgen was compiled with LDC 1.28.0, are you using different versions?

Really weird indeed: I have tested the binaries from Fedora and Debian, including their respective shared libraries down to glibc, and I can always reproduce it.

antonio-rojas commented 2 years ago

Recompiling libarchive without optimization fixes this. Still, I've found no way to reproduce it with pure libarchive code.

Edit: Nevermind, removing optimization actually breaks iconv detection, so libarchive was compiled without the codepath that causes the crash

heftig commented 2 years ago

This seems to be stack exhaustion. This could fix it:

diff -u -r a/src/asgen/zarchive.d a2/src/asgen/zarchive.d
--- a/src/asgen/zarchive.d  2022-02-22 18:16:54.000000000 +0100
+++ a2/src/asgen/zarchive.d 2022-03-27 20:51:22.390736900 +0200
@@ -436,7 +436,7 @@
                 aentry.data = this.readEntry (ar);
                 yield (aentry);
             }
-        });
+        }, 65536);

         return gen;
     }
ximion commented 2 years ago

@heftig That would make quite a bit of sense! @antonio-rojas can you test this?

antonio-rojas commented 2 years ago

Can confirm the fix, thanks!

ximion commented 2 years ago

Brrr, eww! That's an ugly trap set for D's Fibers, but at least it's documented... @heftig do you want to submit a PR to fix this, or should I include a patch? (The thing that I might want to change is storing the stack size in a constant on top of the function and add a comment about this behavior there, so it's clear what the argument does and we have a future reference).

heftig commented 2 years ago

Do what you think is best. This was my first time touching D code.

ximion commented 2 years ago

I can imagine many issues in D code that would be a lot easier to find and easier to debug when starting with the language than this one! Thank you for having a look and finding the issue! :-)