mattpolzin / OpenAPIKit

Codable Swift OpenAPI implementation.
MIT License

Performance optimizations #232

Closed kean closed 2 years ago

kean commented 2 years ago

Hi,

I'm working with large spec files (100K+ LOC) where parsing performance becomes a factor. I found two pieces of low-hanging fruit that improve parsing speed by a factor of 5.

kean commented 2 years ago

This doesn't result in a significant improvement in my case, but OrderedDictionary currently builds a new array of keys every time contains(key:) is called and then searches that array (O(N)), instead of simply checking unorderedHash (O(1)).

public var keys: [Key] {
    return self.map { $0.0 }
}

public func contains(key: Key) -> Bool {
    return keys.contains(key)
}
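
For illustration, here is a minimal sketch of the O(1) lookup described above. `SketchOrderedDictionary` and `orderedKeys` are hypothetical names for a stand-in type, not OpenAPIKit's actual implementation; only `unorderedHash`, `keys`, and `contains(key:)` come from the discussion.

```swift
// Hypothetical stand-in for an ordered dictionary; NOT OpenAPIKit's real type.
// It keeps insertion order in an array and values in a plain Dictionary.
struct SketchOrderedDictionary<Key: Hashable, Value> {
    private var orderedKeys: [Key] = []
    private var unorderedHash: [Key: Value] = [:]

    init() {}

    /// Keys in insertion order, returned without rebuilding an array per call.
    var keys: [Key] { orderedKeys }

    subscript(key: Key) -> Value? {
        get { unorderedHash[key] }
        set {
            if let newValue = newValue {
                // Append to the ordered keys only when the key is new.
                if unorderedHash.updateValue(newValue, forKey: key) == nil {
                    orderedKeys.append(key)
                }
            } else if unorderedHash.removeValue(forKey: key) != nil {
                orderedKeys.removeAll { $0 == key }
            }
        }
    }

    /// Average O(1): a hash lookup instead of a linear scan over `keys`.
    func contains(key: Key) -> Bool {
        return unorderedHash[key] != nil
    }
}
```

With this layout, `contains(key:)` never touches the ordered array, so its cost does not grow with the number of entries.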

mattpolzin commented 2 years ago

Thanks for the suggestions! I definitely support these kinds of improvements and in particular the things you touched on seem like good things to explore further.

Progress on the next major version of OpenAPIKit is currently slow, but that said, it is a good time to make some of these changes. Supporting opt-in or opt-out vendor extensions would almost certainly surface as a breaking change, but I would most likely be OK with that at the next major version if we knew that performance improved substantially for large specs!
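
As a rough sketch of what an opt-out could look like, a decoder's `userInfo` can carry a flag that a type checks before decoding `x-` keys. Everything named here (`skipVendorExtensions`, `AnyKey`, `Tag`) is hypothetical and only illustrates the idea; it is not OpenAPIKit's API, and the extension values are simplified to strings.

```swift
import Foundation

// Hypothetical user-info key; not defined by OpenAPIKit.
extension CodingUserInfoKey {
    static let skipVendorExtensions = CodingUserInfoKey(rawValue: "skipVendorExtensions")!
}

// Dynamic coding key so arbitrary "x-..." fields can be enumerated.
struct AnyKey: CodingKey {
    let stringValue: String
    let intValue: Int? = nil
    init?(stringValue: String) { self.stringValue = stringValue }
    init?(intValue: Int) { return nil }
}

// Simplified example type; vendor extension values are assumed to be strings.
struct Tag: Decodable {
    let name: String
    let vendorExtensions: [String: String]

    enum CodingKeys: String, CodingKey { case name }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        name = try container.decode(String.self, forKey: .name)

        // Opt-out: skip the extra pass over all keys when the flag is set.
        let skip = decoder.userInfo[.skipVendorExtensions] as? Bool ?? false
        guard !skip else {
            vendorExtensions = [:]
            return
        }

        let dynamic = try decoder.container(keyedBy: AnyKey.self)
        var extensions: [String: String] = [:]
        for key in dynamic.allKeys where key.stringValue.hasPrefix("x-") {
            extensions[key.stringValue] = try dynamic.decode(String.self, forKey: key)
        }
        vendorExtensions = extensions
    }
}
```

A caller would opt out by setting `decoder.userInfo[.skipVendorExtensions] = true` on a `JSONDecoder` before decoding; whether this would surface as a breaking change depends on which default is chosen.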

mattpolzin commented 2 years ago

To track these ideas,

I am less eager to tackle the last one at the moment, given everything else that needs to get done for the next release, and I am less likely to be happy with a solution that isn't both (a) platform-agnostic and (b) well understood to be robust.

kean commented 2 years ago

Parallelism in decoding

I'm not sure there is a clean way to do that. I came up with a really dirty solution that requires the parser to be thread-safe; JSONDecoder isn't, while Yams is. I'm getting a significant performance boost, though.

Here's the approach I'm using (which, again, I think is pretty terrible), but hey, it works.

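import Foundation
import OpenAPIKit

// `perform(in:)` is a small helper whose definition is not included here.
final class ParallelDocumentParser: Decodable {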
    let document: OpenAPI.Document

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)

        let version = try container.decode(OpenAPI.Document.Version.self, forKey: .openAPIVersion)
        let info = try container.decode(OpenAPI.Document.Info.self, forKey: .info)

        let group = DispatchGroup()

        var components: Result<OpenAPI.Components, Error>!
        perform(in: group) {
            components = Result(catching: { try container.decodeIfPresent(OpenAPI.Components.self, forKey: .components) ?? .noComponents })
        }

        var paths: Result<OpenAPI.PathItem.Map, Error>!
        perform(in: group) {
            paths = Result(catching: { try container.decode(OpenAPI.PathItem.Map.self, forKey: .paths) })
        }

        group.wait()

        // Skip fields that we don't need for code generation

        self.document = OpenAPI.Document(
            openAPIVersion: version,
            info: info,
            servers: [],
            paths: try paths.get(),
            components: try components.get(),
            security: [],
            tags: nil,
            externalDocs: nil,
            vendorExtensions: [:]
        )
    }

    enum CodingKeys: String, CodingKey {
        case openAPIVersion = "openapi"
        case info
        case paths
        case components
    }
}
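
The snippet above calls `perform(in:)` without defining it. One plausible shape for that helper, purely as an assumption, is a thin wrapper around `DispatchGroup` and a global concurrent queue. The sketch below shows it with two stand-in tasks (`runBoth` is a hypothetical demo function) and assumes Swift 5 language mode, where mutating captured local variables from escaping closures is permitted.

```swift
import Foundation

// Hypothetical helper: runs `work` on a global concurrent queue and
// registers it with `group` so the caller can wait for completion.
func perform(in group: DispatchGroup, _ work: @escaping () -> Void) {
    group.enter()
    DispatchQueue.global(qos: .userInitiated).async {
        work()
        group.leave()
    }
}

// Demo with two independent stand-in tasks in place of the decode calls.
func runBoth() throws -> (sum: Int, count: Int) {
    let group = DispatchGroup()

    var sum: Result<Int, Error>!
    perform(in: group) {
        sum = Result(catching: { (1...1_000).reduce(0, +) })
    }

    var count: Result<Int, Error>!
    perform(in: group) {
        count = Result(catching: { "a,b,c".split(separator: ",").count })
    }

    // Blocks until both closures have finished; each closure writes to its
    // own variable, and the results are only read after the wait.
    group.wait()

    return (try sum.get(), try count.get())
}
```

Capturing each outcome in a `Result` lets errors thrown on the background queue be rethrown on the calling thread after `group.wait()`.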

The other improvements are already amazing. Without vendor extensions, I'm getting 3-4x performance.