hypermail-project / hypermail

Hypermail is a free (GPL) program to convert email from Unix mbox format to html.
http://www.hypermail-project.org/
GNU General Public License v2.0

Segfault with hypermail html5 branch on Ubuntu 20.04 #81

Closed outofcontrol closed 2 years ago

outofcontrol commented 2 years ago

Running a fresh build of the Hypermail html5 branch on Ubuntu 20.04.3 LTS results in the following segfault on a few of our larger lists. On an older VM, the same mbox file works without issue using a much older hypermail version.

After the attachments have been created, the Hypermail output shows a few lines of "I18N: invalid multibyte sequence, from UTF-8 to windows-1252" and then "Segmentation fault (core dumped)".

syslog:

Jan  9 00:00:49 lists kernel: [12713.069128] hypermail[48676]: segfault at 1 ip 00007fdba7cc7675 sp 00007ffd38ee19d8 error 4 in libc-2.31.so[7fdba7b61000+178000]
Jan  9 00:00:49 lists kernel: [12713.069142] Code: 00 00 0f 1f 00 31 c0 c5 f8 77 c3 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 89 f9 48 89 fa c5 f9 ef c0 83 e1 3f 83 f9 20 77 2b <c5> fd 74 0f c5 fd d7 c1 85 c0 0f 85 eb 00 00 00 48 83 c7 20 83 e1

Segfault debugging is not within my skill set. Does anyone have suggestions on what might be causing this and how we might get it working?

jkbzh commented 2 years ago

Hi @outofcontrol, thanks for your report. We can proceed in two ways. If you are familiar with gdb and debugging, I can send you info on how to track down the issue. Alternatively, if you can send me a message that is causing the issue, I can do the debugging myself. To pinpoint such a message, run hypermail on one of the lists that has the issue. Check the directory where the converted messages are stored and use "ls | sort -n". Check the last message. If it is complete, the SIGSEGV occurred in the message following it. If it is incomplete, this is the message that has the issue. To find the original message, use the id that is stored as a comment in the message.

outofcontrol commented 2 years ago

I am somewhat familiar with gdb and debugging, and anything I don't know I can probably learn :) Any help you can offer will be appreciated and I would be happy to provide more feedback.

jkbzh commented 2 years ago

OK, so your hypermail should be compiled in debug mode by default. To test this, go to where you put the source code, cd src, and launch gdb with the hypermail binary there. Do an "l main" to check that it was compiled with -g. It should show the main function.

If it's not compiled with -g, then in the same src directory, delete the *.o files, edit the Makefile and add -g to the CFLAGS, then recompile. You may also want to remove the optimization flags there (those that start with -O).

Then run it inside gdb with the parameters you used to create the list archive that triggers the SIGSEGV, with a "r all_the_options_you_used in_command_line"

When the SIGSEGV happens, you should be able to see which function and line has the issue. You may need to go back up some frames if it happened inside one of the libraries. The issue is happening with the call to iconv.

outofcontrol commented 2 years ago

This is really helpful, thank you for helping with this. It will take a couple of hours to get there, as I will need to mirror our setup to a dev server before doing this. Once I have some output, I will post it here.

outofcontrol commented 2 years ago

Complete output apart from the normal hypermail verbose output:

Program received signal SIGSEGV, Segmentation fault.
__strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:65
65  ../sysdeps/x86_64/multiarch/strlen-avx2.S: No such file or directory.

outofcontrol commented 2 years ago

Using backtrace full to get a bit more, it points to trio, I believe:

#0  __strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:65
No locals.
#1  0x00005555555968e3 in trio_length (string=0x1 <error: Cannot access memory at address 0x1>) at triostr.c:368
No locals.
#2  TrioWriteString (self=0x7fffffff8070, string=0x1 <error: Cannot access memory at address 0x1>, flags=786432, width=0, precision=0) at trio.c:2770
        length = <optimized out>
        ch = <optimized out>
#3  0x000055555559904f in TrioFormatProcess (data=data@entry=0x7fffffff8070, format=format@entry=0x555558a42f70 "printf(\"%.*s\\n\", int(sv.size()), sv.data());<br>\n",
    parameters=parameters@entry=0x7fffffff80b0) at trio.c:3859
        i = 1
        string = <optimized out>
        pointer = <optimized out>
        flags = <optimized out>
        width = <optimized out>
        precision = <optimized out>
        base = <optimized out>
        offset = <optimized out>
#4  0x000055555559948c in TrioFormat (destination=<optimized out>, destinationSize=destinationSize@entry=0, OutStream=OutStream@entry=0x555555594090 <TrioOutStreamFile>,
    format=0x555558a42f70 "printf(\"%.*s\\n\", int(sv.size()), sv.data());<br>\n", arglist=arglist@entry=0x7fffffffe0f0, argfunc=argfunc@entry=0x0, argarray=0x0)
    at trio.c:3985
        status = <optimized out>
        data = {OutStream = 0x555555594090 <TrioOutStreamFile>, InStream = 0x0, UndoStream = 0x0, location = 0x5555687cf0f0, current = 0, processed = 8, actually = {
            committed = 8, cached = 8}, max = 0, error = 0}
        parameters = {{type = 7, flags = 0, width = 0, precision = -1, base = 10, baseSpecifier = -1, varsize = -1, beginOffset = 10, endOffset = 6, position = 0, data = {
              string = 0x0, pointer = 0x0, number = {as_signed = 0, as_unsigned = 0}, doubleNumber = 0, doublePointer = 0x0, longdoubleNumber = 0, longdoublePointer = 0x0,
              errorNumber = 0}, user_defined = {
              namespace = "\000\000\000\000\000\000\000\000|\000\000\000w\000\000\000n\000\000\000[\000\000\000\000R\236\275.\270\324\000\000\000\000\000\377\377\377\377@\000\000\000\000\000\000\000\000\376|hUU\000\000P\000\000\000\000\000\000", handler = 0},
.
.
.
#5  0x00005555555996c3 in trio_fprintf (file=file@entry=0x5555687cf0f0, format=<optimized out>) at trio.c:4277
        status = <optimized out>
        args = {{gp_offset = 32, fp_offset = 48, overflow_arg_area = 0x7fffffffe1d0, reg_save_area = 0x7fffffffe110}}
#6  0x000055555557cdf9 in printbody (fp=fp@entry=0x5555687cf0f0, email=0x555558a428d0, maybe_reply=maybe_reply@entry=0, is_reply=is_reply@entry=1) at print.c:1509
        insig = <optimized out>
        inblank = 0
        bp = 0x555558a43a40
        id = 0x555558a43cb0 "20171107233835.5156944.98799.39868_at_[hidden]"
        subject = 0x555558a44b40 "Re: [isocpp-lib] P0555 string_view for source_location"
        msgnum = <optimized out>
        body_start_attribute = 0x5555555a7d83 " id=\"start\""
        inheader = 0 '\000'
        body_start = 0
        pre_open = 0
        showhtml_open = 0
        inlinehtml_open = 1
        attachment_open = 0
        inquote = <optimized out>
        quote_num = <optimized out>
        quoted_percent = <optimized out>
        replace_quoted = 0
#7  0x000055555557e583 in writearticles (startnum=<optimized out>, maxnum=11236) at print.c:2582
        filename = 0x5555687cf7a0 "lib.mbox/2017/11/2427.php"
        num = 2427
        skip = 0
        newfile = <optimized out>
        is_reply = 1
        email = 0x555558a428d0
        email_next_in_thread = <optimized out>
        bp = <optimized out>
        rp = <optimized out>
        fp = 0x5555687cf0f0
        ptr = <optimized out>
        localsubject = 0x55556877b090 "Re: [isocpp-lib] P0555 string_view for source_location"
        localname = 0x5555687ce410 "Tony V E"
        convlen = 8
        gp = 0x5555686fa420
#8  0x000055555556cb24 in main (argc=<optimized out>, argv=<optimized out>) at hypermail.c:699
        i = <optimized out>
        use_stdin = 0
        configfile = 0x5555555e00d0 "/home/hyper-archives/.hmrc"
        tlang = <optimized out>
        locale_code = 0x55555559e393 "en_US"
        cmd_show_variables = 0
        print_usage = <optimized out>
        amount_old = 0
        amount_new = 11236

cmeerw commented 2 years ago

@outofcontrol in src/print.c you'll want to change line 1509 from

fprintf(fp, bp->line);

to

fprintf(fp, "%s", bp->line);
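
For context, frame #3 of the backtrace shows the message line itself being passed as the format string, and that line contains "%.*s". A minimal sketch of the safer pattern follows, assuming nothing about the rest of print.c. The helper names (line_has_conversions, write_line_verbatim, format_line_verbatim) are hypothetical and only illustrate the idea: with a constant "%s" format, any '%' in the message text is treated as plain data.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helpers for illustration only; they are not part of hypermail. */

/* Returns nonzero if `line` contains a '%', i.e. if it would be interpreted
 * as containing conversion specifications when used as a printf format. */
int line_has_conversions(const char *line)
{
    assert(line != NULL);
    return strchr(line, '%') != NULL;
}

/* Writes `line` to `fp` unchanged. The format is the constant "%s", so the
 * contents of `line` are never parsed as a format string. */
int write_line_verbatim(FILE *fp, const char *line)
{
    assert(fp != NULL && line != NULL);
    return fprintf(fp, "%s", line);
}

/* Same idea for an in-memory buffer, which is easier to check in a test.
 * Returns what snprintf returns: the length the full output would have. */
int format_line_verbatim(char *dst, size_t dst_size, const char *line)
{
    assert(dst != NULL && line != NULL);
    return snprintf(dst, dst_size, "%s", line);
}
```

Calling fputs(bp->line, fp) would be another way to write the text without any format parsing.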

outofcontrol commented 2 years ago

This resolves the segfault. Thank you.