swift-server / async-http-client

HTTP client library built on SwiftNIO
https://swiftpackageindex.com/swift-server/async-http-client/main/documentation/asynchttpclient
Apache License 2.0
907 stars 116 forks source link

Unexpected HEAD request behaviour, perhaps HTTP/2 related? #573

Closed finestructure closed 2 years ago

finestructure commented 2 years ago

We've been observing unexpected behaviour running HEAD requests with HTTPClient against Github.com.

We're sending the requests to probe if a repository exists for a certain URL but all HEAD requests to Github seem to be returning a status code 400.

I'm attaching an example playground with some notes. I'm getting the expected behaviour when running the HEAD request against other sites.

I'm testing redirects in particular here because initially I thought that might be the problem but it seems to affect any HEAD requests to Github.com.

One thing that stands out is that Github.com reports HTTP/2 while the other service I tested is HTTP/1.1.

I wonder if that's relevant?

import AsyncHTTPClient
import Foundation

let client = HTTPClient(eventLoopGroupProvider: .createNew)

// ❯ curl -I "https://github.com/finestructure/arena.git"
// HTTP/2 301
// server: GitHub.com
// date: Thu, 31 Mar 2022 12:58:01 GMT
// content-type: text/html
// content-length: 162
// location: https://github.com/finestructure/arena

do {
    let req = HTTPClient.Request(url: "https://github.com/finestructure/arena.git", method: .HEAD)
    let res = try client.execute(request: req).wait()
    res.status.code  // 400
}

// ❯ curl -I "https://github.com/finestructure/arena"
// HTTP/2 200
// server: GitHub.com
// date: Thu, 31 Mar 2022 14:36:29 GMT
// content-type: text/html; charset=utf-8

do {
    let req = HTTPClient.Request(url: "https://github.com/finestructure/arena", method: .HEAD)
    let res = try client.execute(request: req).wait()
    res.status.code  // 400
}

do {
    var req = URLRequest(url: URL(string: "https://github.com/finestructure/arena.git")!)
    req.httpMethod = "HEAD"
    let task = URLSession.shared.dataTask(with: req) { _, res, error in
        error  // nil
        (res as? HTTPURLResponse)?.statusCode // 200
    }
    task.resume()
}

// ❯ curl -I "http://swiftpackageindex.com"
// HTTP/1.1 301 Moved Permanently
// Date: Thu, 31 Mar 2022 14:30:36 GMT
// Connection: keep-alive
// Cache-Control: max-age=3600
// Expires: Thu, 31 Mar 2022 15:30:36 GMT
// Location: https://swiftpackageindex.com/

do {
    // here it works as expected
    // is this an issue with HTTP/2 for the GitHub.com request?
    let req = HTTPClient.Request(url: "http://swiftpackageindex.com", method: .HEAD)
    let res = try client.execute(request: req).wait()
    res.status.code  // 200
}
Lukasa commented 2 years ago

This appears to be a bug in GitHub's HTTP/2 server implementation. Specifically, for HEAD requests it seems to require that the HEADERS frame has END_STREAM set on it, and forbids receiving HEADERS without END_STREAM followed by a zero-length DATA frame with END_STREAM set on it. You can reproduce this by running this program:

import NIO
import NIOSSL
import NIOHPACK
import NIOHTTP2

// Set this to true to trigger the GitHub bug.
let triggerBug = false

final class PrintIOHandler: ChannelDuplexHandler {
    typealias InboundIn = HTTP2Frame.FramePayload
    typealias OutboundIn = HTTP2Frame.FramePayload
    typealias OutboundOut = HTTP2Frame.FramePayload

    func channelRead(context: ChannelHandlerContext, data: NIOAny) {
        print("Read: \(self.unwrapInboundIn(data))")
    }

    func write(context: ChannelHandlerContext, data: NIOAny, promise: EventLoopPromise<Void>?) {
        print("Write: \(self.unwrapOutboundIn(data))")
        context.write(data, promise: promise)
    }

    func errorCaught(context: ChannelHandlerContext, error: Error) {
        if let error = error as? NIOSSLError, error == .uncleanShutdown {
            // Suppress unclean shutdown
            return
        }

        print("Error: \(error)")
        context.close(promise: nil)
    }
}

func main() {
    let group = MultiThreadedEventLoopGroup(numberOfThreads: 1)
    defer {
        try! group.syncShutdownGracefully()
    }

    let tlsConfiguration = TLSConfiguration.makeClientConfiguration()
    let context = try! NIOSSLContext(configuration: tlsConfiguration)

    let client = try! ClientBootstrap(group: group)
        .channelInitializer { channel in
            let sync = channel.pipeline.syncOperations
            try! sync.addHandler(NIOSSLClientHandler(context: context, serverHostname: "github.com"))
            try! sync.addHandler(NIOHTTP2Handler(mode: .client))
            try! sync.addHandler(HTTP2StreamMultiplexer(mode: .client, channel: channel, inboundStreamInitializer: nil))
            return channel.eventLoop.makeSucceededVoidFuture()
        }.connect(host: "github.com", port: 443).wait()

    try! client.pipeline.handler(type: HTTP2StreamMultiplexer.self).flatMap { http2StreamMultiplexer -> EventLoopFuture<Channel> in
        let p = client.eventLoop.makePromise(of: Channel.self)
        http2StreamMultiplexer.createStreamChannel(promise: p, { channel in channel.pipeline.addHandler(PrintIOHandler()) })
        return p.futureResult
    }.flatMap { childChannel -> EventLoopFuture<Void> in
        let headers = HPACKHeaders([(":method", "HEAD"), (":authority", "github.com"), (":path", "/finestructure/arena"), (":scheme", "https")])

        if triggerBug {
            let firstFrame = HTTP2Frame.FramePayload.headers(.init(headers: headers, endStream: false))
            childChannel.write(firstFrame, promise: nil)

            let secondFrame = HTTP2Frame.FramePayload.data(.init(data: .byteBuffer(ByteBuffer()), endStream: true))
            return childChannel.writeAndFlush(secondFrame).flatMap { childChannel.closeFuture }
        } else {
            let firstFrame = HTTP2Frame.FramePayload.headers(.init(headers: headers, endStream: true))
            return childChannel.writeAndFlush(firstFrame).flatMap { childChannel.closeFuture }
        }
    }.flatMap {
        client.close()
    }.flatMapError { error in
        if let error = error as? NIOSSLError, error == .uncleanShutdown {
            // Suppress unclean shutdown
            return client.eventLoop.makeSucceededVoidFuture()
        } else {
            return client.eventLoop.makeFailedFuture(error)
        }
    }.wait()

    print("Execution complete")
}

main()

If you set reproduceBug to false (the default) you get a valid response from GitHub:

Write: headers(NIOHTTP2.HTTP2Frame.FramePayload.Headers(headers: [([], ":method", "HEAD"), ([], ":authority", "github.com"), ([], ":path", "/finestructure/arena"), ([], ":scheme", "https")], priorityData: nil, endStream: true, _paddingBytes: nil))
Read: headers(NIOHTTP2.HTTP2Frame.FramePayload.Headers(headers: [([], ":status", "200"), ([], "server", "GitHub.com"), ([], "date", "Mon, 04 Apr 2022 15:02:33 GMT"), ([], "content-type", "text/html; charset=utf-8"), ([], "vary", "X-PJAX, X-PJAX-Container, Accept-Encoding, Accept, X-Requested-With"), ([non-indexable], "permissions-policy", "interest-cohort=()"), ([], "etag", "W/\"dea924350e6ccdbf85ac3222c9189556\""), ([], "cache-control", "max-age=0, private, must-revalidate"), ([], "strict-transport-security", "max-age=31536000; includeSubdomains; preload"), ([non-indexable], "x-frame-options", "deny"), ([non-indexable], "x-content-type-options", "nosniff"), ([non-indexable], "x-xss-protection", "0"), ([non-indexable], "referrer-policy", "no-referrer-when-downgrade"), ([non-indexable], "expect-ct", "max-age=2592000, report-uri=\"https://api.github.com/_private/browser/errors\""), ([non-indexable], "content-security-policy", "default-src \'none\'; base-uri \'self\'; block-all-mixed-content; child-src github.com/assets-cdn/worker/ gist.github.com/assets-cdn/worker/; connect-src \'self\' uploads.github.com objects-origin.githubusercontent.com www.githubstatus.com collector.githubapp.com collector.github.com raw.githubusercontent.com api.github.com github-cloud.s3.amazonaws.com github-production-repository-file-5c1aeb.s3.amazonaws.com github-production-upload-manifest-file-7fdce7.s3.amazonaws.com github-production-user-asset-6210df.s3.amazonaws.com cdn.optimizely.com logx.optimizely.com/v1/events translator.github.com *.actions.githubusercontent.com wss://*.actions.githubusercontent.com online.visualstudio.com/api/v1/locations github-production-repository-image-32fea6.s3.amazonaws.com github-production-release-asset-2e65be.s3.amazonaws.com insights.github.com wss://alive.github.com; font-src github.githubassets.com; form-action \'self\' github.com gist.github.com objects-origin.githubusercontent.com; frame-ancestors \'none\'; frame-src render.githubusercontent.com viewscreen.githubusercontent.com notebooks.githubusercontent.com; img-src \'self\' data: github.githubassets.com identicons.github.com collector.githubapp.com collector.github.com github-cloud.s3.amazonaws.com secured-user-images.githubusercontent.com/ *.githubusercontent.com; manifest-src \'self\'; media-src github.com user-images.githubusercontent.com/; script-src github.githubassets.com; style-src \'unsafe-inline\' github.githubassets.com; worker-src github.com/assets-cdn/worker/ gist.github.com/assets-cdn/worker/"), ([], "accept-ranges", "bytes"), ([non-indexable], "x-github-request-id", "F612:22E3:B10CE:C7DB4:624B0889")], priorityData: nil, endStream: true, _paddingBytes: nil))

If you set triggerBug to true you get a 400 instead:

Write: headers(NIOHTTP2.HTTP2Frame.FramePayload.Headers(headers: [([], ":method", "HEAD"), ([], ":authority", "github.com"), ([], ":path", "/finestructure/arena"), ([], ":scheme", "https")], priorityData: nil, endStream: false, _paddingBytes: nil))
Write: data(NIOHTTP2.HTTP2Frame.FramePayload.Data(data: IOData { ByteBuffer { readerIndex: 0, writerIndex: 0, readableBytes: 0, capacity: 0, storageCapacity: 0, slice: _ByteBufferSlice { 0..<0 }, storage: 0x000000010581e280 (0 bytes) } }, endStream: true, _paddingBytes: nil))
Read: headers(NIOHTTP2.HTTP2Frame.FramePayload.Headers(headers: [([], ":status", "400"), ([non-indexable], "x-github-request-id", "F624:741B:DFC34:F769F:624B08DD"), ([non-indexable], "x-frame-options", "DENY")], priorityData: nil, endStream: true, _paddingBytes: nil))

GitHub is in the wrong here, and needs to fix up their server.

finestructure commented 2 years ago

Thanks for much for the detailed investigation, Cory! I've reported the issue with Github (https://support.github.com/ticket/personal/0/1571748 but it seems these are private).

We're currently working around this by using a URLSession instead of HTTPClient and this seems to be working ok. I'm not sure if we're asking for trouble here (just thinking in terms of async and queues that might be involved in addition to ELFs when we call this from a request handler), but it would certainly "feel" nicer if we could stick to the same "mechanism" so to speak.

Since both URLSession and curl seem to be dealing OK with the erroneous Github behaviour I was wondering if there's perhaps a chance to make HTTPClient more lenient, too?

Either way, thanks again for taking the time to look into this and please feel free to close the issue if it's not something that can or should be addressed inside AHC.

Lukasa commented 2 years ago

HTTPClient could do this differently but it's a fairly substantial change to the programming model. Specifically, HTTPClient attempts to use the HTTP/1-to-HTTP/2 transformation layer so that it can speak in terms of a unified HTTP abstraction and rely on swift-nio-http2 to perform the translation. That's fine as it stands, but it causes us a problem with the .end message. Ideally that would put the END_STREAM flag on the previous frame, but if we've forwarded it on already then we aren't holding a frame to do that to.

We can work around this somewhat by adding buffering to the translation layer and optimistically attempting to put END_STREAM on an earlier frame. That would probably fix this issue, assuming HTTPClient is very careful and holds the API exactly correctly (specifically, it must not flush between the write of .head and the write of .end).

finestructure commented 2 years ago

Oof, that does indeed sound like quite the hassle 😅 . It just occurred to me: if the issue is HTTP/2 only, can I somehow force HTTPClient to fall back to HTTP/1 for that call? I can poke around myself a bit but if you've got a pointer that'd be great!

dnadoba commented 2 years ago

Yes, you can disable HTTP/2: https://github.com/swift-server/async-http-client#disabling-http2

finestructure commented 2 years ago

Ah brilliant, thanks a lot! I'll go ahead and close this issue then, thanks again for your help :)