Closed by kean 2 years ago
This doesn't result in a significant improvement in my case, but OrderedDictionary currently creates an array of keys every time contains is called and checks contains in the array (O(N)), instead of simply checking unorderedHash (O(1)).
public var keys: [Key] {
    return self.map { $0.0 }
}

public func contains(key: Key) -> Bool {
    return keys.contains(key)
}
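For comparison, here is a minimal sketch of the constant-time lookup described above. It assumes the type keeps an array of keys for ordering next to a dictionary named unorderedHash; the type name SketchOrderedDictionary and the orderedKeys property are made up for illustration and are not OpenAPIKit's actual implementation.

```swift
// Simplified stand-in for an ordered dictionary (hypothetical, not OpenAPIKit's type).
struct SketchOrderedDictionary<Key: Hashable, Value> {
    // Keys in insertion order.
    private var orderedKeys: [Key] = []
    // Backing storage used for O(1) lookups.
    private var unorderedHash: [Key: Value] = [:]

    init() {}

    // Returns the stored array; no new array is built per call.
    var keys: [Key] { orderedKeys }

    subscript(key: Key) -> Value? {
        get { unorderedHash[key] }
        set {
            if let newValue = newValue {
                // Only new keys are appended, so ordering stays stable on update.
                if unorderedHash[key] == nil { orderedKeys.append(key) }
                unorderedHash[key] = newValue
            } else if unorderedHash.removeValue(forKey: key) != nil {
                orderedKeys.removeAll { $0 == key }
            }
        }
    }

    // A hash lookup instead of a linear scan over an array of keys.
    func contains(key: Key) -> Bool {
        return unorderedHash[key] != nil
    }
}
```

With this layout, contains never touches the key array, so its cost does not grow with the number of entries.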
Thanks for the suggestions! I definitely support these kinds of improvements and in particular the things you touched on seem like good things to explore further.
Progress on the next major version of OpenAPIKit is currently slow, but that said, it is a good time to make some of these changes. Supporting opt-in or opt-out vendor extensions would almost certainly surface as a breaking change, but I would most likely be OK with that at the next major version if we knew that performance improved substantially for large specs!
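One way an opt-out could look, purely as a sketch: the decoder's userInfo carries a flag, and a type's init(from:) skips collecting "x-" keys when the flag is set. The key name skipVendorExtensions and the SketchInfo type are hypothetical and are not part of OpenAPIKit's API.

```swift
import Foundation

extension CodingUserInfoKey {
    // Hypothetical flag; callers set it to true to skip vendor extensions.
    static let skipVendorExtensions = CodingUserInfoKey(rawValue: "skipVendorExtensions")!
}

// Illustrative stand-in for a spec object that can carry "x-" extensions.
struct SketchInfo: Decodable {
    let title: String
    let vendorExtensions: [String: String]

    // A coding key that accepts any string, so unknown "x-" keys can be read.
    private struct AnyKey: CodingKey {
        let stringValue: String
        var intValue: Int? { return nil }
        init?(stringValue: String) { self.stringValue = stringValue }
        init?(intValue: Int) { return nil }
    }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: AnyKey.self)
        title = try container.decode(String.self, forKey: AnyKey(stringValue: "title")!)

        // Opt-out path: skip the scan over all keys entirely.
        if decoder.userInfo[.skipVendorExtensions] as? Bool == true {
            vendorExtensions = [:]
            return
        }

        var extensions: [String: String] = [:]
        for key in container.allKeys where key.stringValue.hasPrefix("x-") {
            extensions[key.stringValue] = try container.decode(String.self, forKey: key)
        }
        vendorExtensions = extensions
    }
}
```

A caller would opt out with `decoder.userInfo[.skipVendorExtensions] = true` before decoding; leaving the flag unset keeps the current behavior.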
To track these ideas:
OrderedDictionary no longer creates arrays just to answer questions about whether it contains a certain key.
I am less eager to tackle the last one at the moment, given everything else that needs to get done for the next release, and I am less likely to be happy with a solution that isn't both (a) platform agnostic and (b) well understood as robust.
Parallelism in decoding
I'm not sure there is any clean way to do that. I came up with a really dirty solution that requires the parser to be thread-safe; JSONDecoder isn't, while Yams is. I'm getting a significant performance boost, though.
Here's the approach I'm using (which, again, I think is pretty terrible), but hey, it works.
import Foundation
import OpenAPIKit

final class ParallelDocumentParser: Decodable {
    let document: OpenAPI.Document

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)

        // Small fields are decoded up front on the calling thread.
        let version = try container.decode(OpenAPI.Document.Version.self, forKey: .openAPIVersion)
        let info = try container.decode(OpenAPI.Document.Info.self, forKey: .info)

        // `perform(in:)` is a helper that is not shown in this post; it runs the
        // closure concurrently as part of `group`.
        let group = DispatchGroup()

        var components: Result<OpenAPI.Components, Error>!
        perform(in: group) {
            components = Result(catching: { try container.decodeIfPresent(OpenAPI.Components.self, forKey: .components) ?? .noComponents })
        }

        var paths: Result<OpenAPI.PathItem.Map, Error>!
        perform(in: group) {
            paths = Result(catching: { try container.decode(OpenAPI.PathItem.Map.self, forKey: .paths) })
        }

        // Block until both halves have finished decoding.
        group.wait()

        // Skip fields that we don't need for code generation
        self.document = OpenAPI.Document(
            openAPIVersion: version,
            info: info,
            servers: [],
            paths: try paths.get(),
            components: try components.get(),
            security: [],
            tags: nil,
            externalDocs: nil,
            vendorExtensions: [:]
        )
    }

    enum CodingKeys: String, CodingKey {
        case openAPIVersion = "openapi"
        case info
        case paths
        case components
    }
}
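The perform(in:) helper is not included in the snippet above. A plausible shape for it, as an assumption rather than the code actually used, is a thin wrapper that dispatches the closure to a global concurrent queue as part of the group. The sumInParallel function is a hypothetical stand-in that mirrors the two-Result pattern with trivial work.

```swift
import Dispatch

// Hypothetical helper: runs `work` on a global concurrent queue and
// registers it with `group`, so `group.wait()` blocks until it finishes.
func perform(in group: DispatchGroup, _ work: @escaping () -> Void) {
    DispatchQueue.global(qos: .userInitiated).async(group: group, execute: work)
}

// Same pattern as the parser above, with trivial work in place of decoding.
func sumInParallel() throws -> Int {
    let group = DispatchGroup()

    // Each closure writes only its own variable, and both writes complete
    // before `wait()` returns, so the reads below see finished results.
    var left: Result<Int, Error>!
    perform(in: group) {
        left = Result(catching: { 1 + 1 })
    }

    var right: Result<Int, Error>!
    perform(in: group) {
        right = Result(catching: { 2 + 2 })
    }

    group.wait()
    return try left.get() + right.get()
}
```

Capturing errors in a Result and rethrowing after wait() keeps the throwing behavior of a sequential decode. It does not make the shared container safe to use from two threads; that still depends on the underlying parser, as noted above.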
The other improvements are already amazing. Without vendor extensions, I'm getting 3-4x performance.
Hi,
I'm working with large spec files (100K+ lines of code) where parsing performance becomes a factor. I found two pieces of low-hanging fruit that improve parsing speed by a factor of 5.
init(from decoder:). Not sure if it's thread-safe, but based on my initial testing, it works great. (Update: I got another 20% improvement by adding even more parser parallelism).