varnishcache / varnish-cache

Varnish Cache source code repository
https://www.varnish-cache.org
Other
3.68k stars 377 forks source link

umem pointing to unmapped segment #2683

Closed nigoroll closed 5 years ago

nigoroll commented 6 years ago
Panic at: Fri, 18 May 2018 10:58:26 GMT
Wrong turn at cache/cache_main.c:251:
Signal 11 (Segmentation Fault) received at fffffbfedfe44000 si_code 1
version = varnish-6.0.0 revision 929dd59c27383ee344f6b8654afa2c496aadd5e3, vrt api = 7.0
ident = -jsolaris,-sdefault,-sdefault,-hcritbit,ports
now = 798809.357961 (mono), 1526641067.801735 (real)
Backtrace:
  455d15: /usr/local/sbin/varnishd'pan_ic+0x165 [0x455d15]
  4b7937: /usr/local/sbin/varnishd'VAS_Fail+0x47 [0x4b7937]
  451e34: /usr/local/sbin/varnishd'0x51e34 [0x451e34]
  fffffbffef24e5c6: /lib/amd64/libc.so.1'__sighndlr+0x6 [0xfffffbffef24e5c6]
  fffffbffef2411db: /lib/amd64/libc.so.1'call_user_handler+0x1db [0xfffffbffef2411db]
  fffffbffef1ac58f: /lib/amd64/libc.so.1'memcpy+0xa0f [0xfffffbffef1ac58f]
  47ede3: /usr/local/sbin/varnishd'v1f_pull_chunked+0xb3 [0x47ede3]
  447300: /usr/local/sbin/varnishd'VFP_Suck+0x90 [0x447300]
  4488aa: /usr/local/sbin/varnishd'vfp_testgunzip_pull+0x7a [0x4488aa]
  447300: /usr/local/sbin/varnishd'VFP_Suck+0x90 [0x447300]
thread = (cache-worker)
thr.req = 0 {
},
thr.busyobj = 4670ce0 {
  ws = 4670d18 {
    id = \"bo\",
    {s, f, r, e} = {4679c10, +8912, 0, +225448},
  },
  retries = 0, failed = 0, flags = {do_stream, uncacheable, is_gzip},
  http_conn = 467a3f8 {
    fd = 64 (@2ab47f4),
    doclose = NULL,
    ws = 4670d18 {
      [Already dumped, see above]
    },
    {rxbuf_b, rxbuf_e} = {467a458, 467a89f},
    {pipeline_b, pipeline_e} = {467a8a4, 467bde9},
    content_length = -1,
    body_status = chunked,
    first_byte_timeout = 60.000000,
    between_bytes_timeout = 60.000000,
  },
  filters = testgunzip=0 V1F_CHUNKED=0
  director_req = a143e0 {
    vcl_name = *redacted*,
    type = round-robin {
    },
  },
  director_resp = 81fdb0 {
    vcl_name = *redacted*,
    type = backend {
      display_name = boot.*redacted*,
      ipv4 = *redacted*,
      port = 80,
      hosthdr = *redacted*,
      health = healthy,
      admin_health = probe, changed = 1525843409.570737,
      n_conn = 1,
    },
  },
  http[bereq] = 46712e0 {
    ws = 4670d18 {
      [Already dumped, see above]
    },
    hdrs {
      \"GET\",
      \"/*redacted*\",
      [...]
    },
  },
  http[beresp] = 4671758 {
    ws = 4670d18 {
      [Already dumped, see above]
    },
    hdrs {
      \"HTTP/1.1\",
      \"200\",
      \"OK\",
      \"Content-Type: text/html; charset=utf-8\",
      \"Transfer-Encoding: chunked\",
      \"Connection: keep-alive\",
      \"Status: 200 OK\",
      \"Cache-Control: max-age=0, private, must-revalidate\",
      \"ETag: W/\"7aab8f477a80014af02995f2ff5eb4d8\"\",
      \"X-Frame-Options: SAMEORIGIN\",
      \"X-XSS-Protection: 1; mode=block\",
      \"X-Content-Type-Options: nosniff\",
      \"Date: Fri, 18 May 2018 10:57:47 GMT\",
      \"Set-Cookie: *redacted*\",
      \"X-Powered-By: Phusion Passenger 5.0.30\",
      \"Server: nginx/1.10.1 + Phusion Passenger 5.0.30\",
      \"Content-Encoding: gzip\",
      [...]
      \"Vary: Accept-Encoding\",
    },
  },
  objcore[fetch] = 2c08150 {
    refcnt = 3,
    flags = {pass},
    exp_flags = {refd},
    boc = 1e9c930 {
      refcnt = 3,
      state = stream,
      vary = 0,
      stevedore_priv = 0,
    },
    exp = {1526641067.801391, 0.000000, 10.000000, 0.000000},
    objhead = 1e85070,
    stevedore = 5be250 (umem Transient) {
      Simple = fffffbff01935ac0,
      Obj = fffffbffe0498878 {priv=fffffbffe0498870, ptr=fffffbff01935ac0, len=1275, space=1280},
      LEN = 0x0...0,
      VXID = 0x00df18aa,
      FLAGS = 0x02,
      GZIPBITS = 0x0...0,
      LASTMODIFIED = 0x41d6bfac6ac00000,
      VARY = {len=27, ptr=fffffbff01935b38},
      HEADERS = {len=1128, ptr=fffffbff01935b53},
      Body = fffffbffb4576418 {priv=fffffbffb4576410, ptr=fffffbfedfe44000, len=0, space=16384},
    },
  },
  vcl = {
    name = \"boot\",
    busy = 27,
    discard = 0,
    state = auto,
    temp = warm,
    conf = {
      srcname = {
        \"/etc/varnish/default.vcl\",
        \"Builtin\",
        \"*redacted*\",
      },
    },
  },
  vmods = {
    std = {Varnish 6.0.0 929dd59c27383ee344f6b8654afa2c496aadd5e3, 0.0},
    directors = {Varnish 6.0.0 929dd59c27383ee344f6b8654afa2c496aadd5e3, 0.0},
    re2 = {Varnish 6.0.0 929dd59c27383ee344f6b8654afa2c496aadd5e3, 0.0},
  },
},
nigoroll commented 6 years ago

so this is caused in the memcpy in v1f_read, and apparently the stevedore body ptr is the address triggering the segmentation fault.

nigoroll commented 6 years ago

so pmap in the core tells me that in fact the address is not mapped.

FFFFFBFEDFE34000         32K rwx--    [ anon ]  # ends 0xfffffbfedfe3c000
FFFFFBFEDFE4C000         16K rwx--    [ anon ]
nigoroll commented 6 years ago

FTR: As of now, I am quite clueless about this issue. On May 9, we had updated the system from joyent 20160512T071413Z to 20180505T015357Z to get the meltdown fixes. Reviewing https://github.com/joyent/illumos-joyent.git commits, https://smartos.org/bugview/OS-6564 triggered the suspicion that a similar thing could be happening here, but I failed to nail it for now. At leas the current understanding does not suffice for a qualified report.

nigoroll commented 6 years ago
nigoroll commented 6 years ago

ftr, we got this back on the table and are trying to nail down the suspected umem on smartos issue

nigoroll commented 5 years ago

Ref #2697 see https://github.com/varnishcache/varnish-cache/issues/2697#issuecomment-466086334