wahern / cqueues

Continuation Queues: Embeddable asynchronous networking, threading, and notification framework for Lua on Unix.
http://25thandclement.com/~william/projects/cqueues.html
MIT License

starttls issue (unable to update event disposition: No such file or directory) #247

Open luveti opened 3 years ago

luveti commented 3 years ago

This is an example from the documentation, with a snippet from daurnimator's lua-http library, and is the minimal code needed to reproduce the issue I'm seeing in my application.

local ce = require "cqueues.errno"
local cqueues = require "cqueues"
local socket = require "cqueues.socket"

local cq = cqueues.new()

-- copied from https://git.io/JYqpM
local function onerror(socket, op, why, lvl)
    local err = string.format("%s: %s", op, ce.strerror(why))
    if op == "starttls" then
        local ssl = socket:checktls()
        if ssl and ssl.getVerifyResult then
            local code, msg = ssl:getVerifyResult()
            if code ~= 0 then
                err = err .. ":" .. msg
            end
        end
    end
    if why == ce.ETIMEDOUT then
        if op == "fill" or op == "read" then
            socket:clearerr("r")
        elseif op == "flush" then
            socket:clearerr("w")
        end
    end
    return err, why
end

local function send_request()
    local http = socket.connect("google.com", 443)
    http:onerror(onerror)
    local ok, err, errno = http:starttls()
    if not ok then
        -- Note: calling http:close() here causes a different error to occur (Bad file descriptor)
        return nil, err, errno
    end
    http:write("GET / HTTP/1.0\n")
    http:write("Host: google.com:443\n\n")

    local status = http:read()
    print("!", status)
    for ln in http:lines "*h" do
        print("|", ln)
    end

    local empty = http:read "*L"
    print "~"

    for ln in http:lines "*L" do
        io.stdout:write(ln)
    end
    http:close()
end

cq:wrap(function()
    while true do
        print(send_request())
        cqueues.sleep(0.5)
    end
end)

cq:wrap(function()
    while true do
        print(send_request())
        cqueues.sleep(1)
    end
end)

print(cq:loop())

Note: The contents of onerror don't affect this issue, but give a nice error message.

Output:

nil     starttls: Network is unreachable        101
nil     starttls: Network is unreachable        101
nil     starttls: Network is unreachable        101
nil     starttls: Network is unreachable        101
nil     starttls: Network is unreachable        101
nil     starttls: Network is unreachable        101
nil     starttls: Network is unreachable        101
nil     starttls: Network is unreachable        101
nil     starttls: Network is unreachable        101
nil     starttls: Network is unreachable        101
nil     starttls: Network is unreachable        101
false   unable to update event disposition: No such file or directory (fd:28)   2       thread: 0xb6d75648      nil     28

The number of starttls lines varies from one to about a dozen, which leads me to believe that something is being garbage collected while a reference to the collected object is still in use.
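One way to probe the garbage-collection hypothesis would be to force a full collection after every attempt and see whether the failure becomes more regular. The helper below is only a sketch: with_forced_gc is a hypothetical name, not part of cqueues, and it relies on nothing beyond the standard collectgarbage function.

```lua
-- Hypothetical helper: call fn(...), force a full GC cycle, and pass
-- every return value of fn through unchanged (including nils).
local function with_forced_gc(fn, ...)
    local function finish(...)
        -- Two passes, so that objects resurrected by finalizers
        -- during the first pass are collected as well.
        collectgarbage("collect")
        collectgarbage("collect")
        return ...
    end
    return finish(fn(...))
end
```

In the reproduction above this would replace print(send_request()) with print(with_forced_gc(send_request)). If the error then shows up after a predictable number of iterations, that would point toward a finalizer closing a descriptor that is still registered; if nothing changes, the cause is probably elsewhere.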

daurnimator commented 3 years ago

This code you linked works for me: it repeatedly (successfully) makes requests to google.com. Is there some other ingredient I need to reproduce?

luveti commented 3 years ago

Hey @daurnimator, looks like I forgot to mention that the device is sitting behind a captive portal that hasn't been "signed into" yet. I set up a Raspberry Pi 3 Model B+ as a router using RaspAP and Nodogsplash.

I should also mention that I'm running cqueues on a Raspberry Pi 4 Model B (as part of a much larger project). I'm using the latest version of cqueues.

If you need hardware (or funds for some), we would be more than willing to donate. Trying to set up a captive portal on an old router is much harder than doing so on a Raspberry Pi!

luveti commented 3 years ago

As a temporary workaround, I ended up moving all my HTTP requests into threads (using a task queue I've had for a while). This worked pretty well for a while, as I could just let the thread die when the error above occurred.

But after letting this run for some time, I started to get "Too many open files" errors from various other places in my program. I ran lsof -c luajit | wc -l and noticed that luajit was opening more and more files over time. I was able to reproduce this with the example above.
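To watch the leak from inside the reproduction instead of a second terminal, the same lsof measurement can be taken from Lua. This is a rough sketch that assumes lsof is on the PATH; count_lines and open_file_count are hypothetical helper names, and only the standard io.popen is used.

```lua
-- Hypothetical helper: run a shell command and count its output lines.
local function count_lines(cmd)
    local pipe = assert(io.popen(cmd, "r"))
    local n = 0
    for _ in pipe:lines() do
        n = n + 1
    end
    pipe:close()
    return n
end

-- Same measurement as `lsof -c luajit | wc -l` (the lsof header line
-- is counted too). process_name is interpolated into the shell command
-- unquoted, so only pass trusted strings.
local function open_file_count(process_name)
    return count_lines("lsof -c " .. process_name .. " 2>/dev/null")
end
```

Printing open_file_count("luajit") after each call to send_request() should show whether the count keeps climbing while the requests fail.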

luveti commented 3 years ago

I've been poking around in the internals of cqueues and I'm starting to think there may be a leak somewhere in the DNS logic under so_open. If I pass an IP address to socket.connect, the issue I've described goes away. Using dns.query to resolve the domain name myself doesn't appear to cause a leak. Here is an example that works:

local ce = require('cqueues.errno')
local cqueues = require('cqueues')
local dns = require('cqueues.dns')
local packet = require('cqueues.dns.packet')
local record = require('cqueues.dns.record')
local socket = require('cqueues.socket')

local cq = cqueues.new()

local function onerror(socket, op, why, lvl) -- luacheck: ignore 212
    local err = string.format("%s: %s", op, ce.strerror(why))
    if op == "starttls" then
        local ssl = socket:checktls()
        if ssl and ssl.getVerifyResult then
            local code, msg = ssl:getVerifyResult()
            if code ~= 0 then
                err = err .. ":" .. msg
            end
        end
    end
    if why == ce.ETIMEDOUT then
        if op == "fill" or op == "read" then
            socket:clearerr("r")
        elseif op == "flush" then
            socket:clearerr("w")
        end
    end
    return err, why
end

local function domain_to_ip_address(domain)
    local p, err_code = dns.query(domain, 'A')
    if not p then return nil, err_code end
    for r in p:grep({ section = packet.section.ANSWER, type = record.type.A }) do
        return r:addr()
    end
end

local function send_request()
    local ip, err = domain_to_ip_address('google.com')
    if not ip then
        print('failed to resolve domain name', err)
        return
    end

    local http = socket.connect(ip, 443)
    http:onerror(onerror)
    local ok, err, errno = http:starttls()
    if not ok then
        return nil, err, errno
    end
    http:write("GET / HTTP/1.0\n")
    http:write("Host: google.com:443\n\n")

    local status = http:read()
    print("!", status)
    for ln in http:lines "*h" do
        print("|", ln)
    end

    local empty = http:read "*L"
    print "~"

    for ln in http:lines "*L" do
        io.stdout:write(ln)
    end
    http:close()
end

cq:wrap(function()
    while true do
        print(pcall(function()
            print(send_request())
        end))
        cqueues.sleep(1)
    end
end)

print(cq:loop())

While poking around, I noticed defining SOCKET_DEBUG outputs some useful info. It appears socket.connect attempts to connect to both IPv4 and IPv6 addresses using the same file descriptor:

fd = 6
connect(google.com./[172.217.9.78]:443): Connection refused
fd = 6
connect(google.com./[2607:f8b0:4009:816::200e]:443): Network is unreachable
nil     starttls: Network is unreachable        101
true
fd = 6
connect(google.com./[172.217.9.78]:443): Connection refused
fd = 6
connect(google.com./[2607:f8b0:4009:816::200e]:443): Network is unreachable
nil     starttls: Network is unreachable        101
true
fd = 6
connect(google.com./[172.217.9.78]:443): Connection refused
fd = 6
connect(google.com./[2607:f8b0:4009:816::200e]:443): Network is unreachable
nil     starttls: Network is unreachable        101
true
false   unable to update event disposition: No such file or directory (fd:6)    2       thread: 0xb6d15ee8      nil     6

NOTE: I've added printf("fd = %i\n", fd); to so_trace in src/lib/socket.c.

I wonder if starttls should even be called if the calls to connect fail?
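As a sketch of that idea, the connection could be waited on explicitly so that starttls is only attempted once connect has succeeded. This assumes that sock:connect([timeout]) waits for the connection and, with an onerror handler like the one above installed, returns a false value plus the error string and errno on failure. connect_then_starttls is a hypothetical helper name, and whether this avoids the event disposition error is untested.

```lua
-- Hypothetical helper. `sock` is expected to be a cqueues socket, for
-- example the value returned by socket.connect(host, port), with an
-- onerror handler installed so that errors come back as return values.
local function connect_then_starttls(sock, timeout)
    -- Wait for the TCP connection first; stop here if it fails.
    local ok, err, errno = sock:connect(timeout)
    if not ok then
        return nil, err, errno
    end
    -- Only start the TLS handshake on an established connection.
    ok, err, errno = sock:starttls()
    if not ok then
        return nil, err, errno
    end
    return sock
end
```

In send_request() this would replace the direct http:starttls() call. The helper deliberately leaves closing the socket to the caller, because calling close() after a failure produced the "Bad file descriptor" error noted in the first example.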