Closed rj93 closed 1 year ago
there are two reasons why this may be slow:
let trimmed = line.trimmingCharacters(in: .whitespaces)
this is not necessary. the Root
rule skips leading whitespace anyway, and the foundation trimmingCharacters(in:)
copies the string to new storage, which is wasteful.
if you know the root message is an object (and not an array), it may also be slightly faster to use the Object
rule.
let log: Log = try .init(from: decoder)
this is probably the main reason why your decoding is slow, because it is using the standard library Codable
hooks, which use existential dispatch. JSONDecoder
also does this, but the reason swift-json
can be faster than JSONDecoder
is because it allows you to bypass the Codable
interface by dropping down to the LintingDictionary
level. (this usually gives like a 10x speedup)
alas i have not gotten a chance to write tutorials for LintingDictionary
(it is a v0.3.0 API), but if you can share the format of your JSON messages, i’d be happy to point you in the right direction.
by the way, Grammar.parse(trimmed.utf8, as: JSON.Rule<String.Index>.Root.self)
doesn’t really have “initialization overhead”, and swift-json
doesn’t really “create” decoders the way Foundation does. however, creating the Decoder wrapper is rather expensive, which is why you usually want to decode with LintingDictionary
. (or even switch on JSON
variants directly, for really bare-metal performance)
finally, would you mind sharing your test data (and the toolchain version used to compile the test program)? a 4x slowdown seems extreme, even if you're using the Codable
wrappers.
Thanks for the quick reply @kelvin13.
I tried removing let trimmed = line.trimmingCharacters(in: .whitespaces)
, let log: Log = try .init(from: decoder)
and updated it to use Object
rule as follows, but it still took 85 seconds:
let (bytes, urlResponse) = try! await URLSession.shared.bytes(from: file)
var logs: [Log] = []
do {
let startTime = CFAbsoluteTimeGetCurrent()
for try await line in bytes.lines {
if !line.isEmpty {
let decoder = try Grammar.parse(line.utf8, as: JSON.Rule<String.Index>.Object.self)
}
}
let timeElapsed = CFAbsoluteTimeGetCurrent() - startTime
print("Time elapsed: \(timeElapsed) s.")
} catch {
print("Reading file failed with error \(error)")
}
Side note, how would you create a Codable
struct from the [(key: String, value: JSON)]
type when you use the Object
rule? (Sorry if this is basic, I'm pretty new to swift)
I am unfortunately not able to share the file as it contains sensitive data, if you need one created I can do so, but for now heres an example file:
{"@timestamp":"2000-01-01T00:00:00.000+00:00","message":"An example debug message 1","logger_name":"com.example.ExampleLogger","thread_name":"main","level":"DEBUG"}
{"@timestamp":"2001-01-01T00:00:00.000+00:00","message":"An example info message 2","logger_name":"com.example.ExampleLogger","thread_name":"main","level":"INFO"}
{"@timestamp":"2002-01-01T00:00:00.000+00:00","message":"An example warn message 3","logger_name":"com.example.ExampleLogger","thread_name":"main","level":"WARN"}
{"@timestamp":"2003-01-01T00:00:00.000+00:00","message":"An example error message 4","logger_name":"com.example.ExampleLogger","thread_name":"main","level":"ERROR"}
{"@timestamp":"2004-01-01T00:00:00.000+00:00","message":"An example fatal message 5","logger_name":"com.example.ExampleLogger","thread_name":"main","level":"FATAL"}
{"@timestamp":"2005-01-01T00:00:00.000+00:00","message":"An example message 6","logger_name":"com.example.longer.ExampleLogger","thread_name":"main","level":"INFO"}
and the toolchain version used to compile the test program
I'm not sure if this is what you're looking for:
swift-driver version: 1.45.2 Apple Swift version 5.6.1 (swiftlang-5.6.0.323.66 clang-1316.0.20.12)
Target: arm64-apple-macosx12.0
Also 0.3.0 doesn't seem to exist as a tag, but it's in the README so not sure if something has broken there?
I tried removing
let trimmed = line.trimmingCharacters(in: .whitespaces)
,let log: Log = try .init(from: decoder)
and updated it to useObject
rule as follows, but it still took 85 seconds:
it is very strange that is is taking 85 seconds even with no decoding taking place. can you please regenerate the input data so i can investigate this further?
by the way, you are using CFAbsoluteTimeGetCurrent
, which is a wall-clock time, and you are running the timer across a suspension boundary (the await
in the asynchronous loop). this means that you are not only measuring the execution time of the JSON decoders, but also arbitrary tasks run by the scheduler in between loop iterations.
i don’t know why this is affecting swift-json
more than the Foundation implementation, but for the sake of getting an accurate measurement, can you please re-run the benchmarks in a synchronous loop?
also, i wouldn’t normally ask this, but you said you are new to swift, so are you compiling in production mode? (swift build -c release
)? this is very important to swift-json
and other SPM packages that are distributed as source, because they use the same optimization settings as the rest of the project. (unlike Foundation, which is distributed as a binary.)
Side note, how would you create a Codable struct from the [(key: String, value: JSON)] type when you use the Object rule?
the construction returned by the Object
rule is just a normal swift array, and you can operate on it like any other swift array. however, it’s probably easier (and you would get better error-handling) if you use LintingDictionary
like this:
init(from json:JSON) throws
{
(self.timestamp, self.message, self.logger, self.thread, self.level) =
try json.lint
{
(
timestamp: try $0.remove("@timestamp", as: String.self),
message: try $0.remove("message", as: String.self),
logger: try $0.remove("logger_name", as: String.self),
thread: try $0.remove("thread_name", as: String.self),
level: try $0.remove("level") { try $0.case(of: LoggingLevel.self) },
)
}
}
where LoggingLevel
is some RawRepresentable
type, which i am using with the JSON.case(of:)
API.
enum LoggingLevel:String
{
case info = "INFO"
case warn = "WARN"
case debug = "DEBUG"
...
}
the easiest way to create the LintingDictionary
is just to widen the object to a JSON
variant (with the JSON.object(_:)
case constructor) and then call one of its lint(_:)
methods, but you can also dodge a miniscule amount of overhead by converting to a dictionary yourself, and passing it to LintingDictionary.init(_:)
Also 0.3.0 doesn't seem to exist as a tag, but it's in the README so not sure if something has broken there?
the README is correct, it shows the next planned release, which is 0.3.0, and does not have a tag yet. “0.3.0” right now simply refers to the current master
trunk of the repository.
Apologies for the delayed reply, been trying to figure out how to make the build work but I've done it now.
You can find a sample application at https://github.com/rj93/json-decode
You were right though, swift-json
's performance improves when built using the swift build -c release
command, previously I was just running it from xcode. But the difference is tiny (less than half a second between them).
I also tried the 0.3.0 API, but it didn't make any noticeable improvements
Any pointers to how to speed that up in my use case?
it sounds like you were already compiling in release mode then, since going from debug to release mode should cause a way bigger difference than half a second.
based on what you described earlier, the performance problem is most likely not in the decoding stage, so switching to the 0.3.0 API would not have made a big difference, since that wasn’t the bottleneck.
based on what you have tried and what you’ve described, this is probably a bug in swift-json. i will try to reproduce it and investigate this further with the example project you linked. thanks for the report!
hi @rj93 , i was not able to reproduce your results on linux, in fact swift-json
was about 28% faster than the Foundation implementation, even when using the default standard library Decodable
hooks:
$ .build/release/json-decode aws-ipamd.jsonl
fileUrl: aws-ipamd.jsonl -- file:///home/klossy/dev/json-decode-test/
Starting foundation read
foundation time elapsed: 12.463001012802124 s.
Starting swift-json read
swiftJson time elapsed: 8.935490012168884 s.
$ swift --version
Swift version 5.8-dev (LLVM e41a0a126d143ae, Swift 23e5143d22cb768)
Target: x86_64-unknown-linux-gnu
$ swiftenv version
DEVELOPMENT-SNAPSHOT-2022-07-25-a
the only change i made to the example program you provided was adding a import CoreFoundation
to the top, to make it compile on linux. though it is extremely unlikely this has any effect on performance.
i will investigate adding your test data to a CI workflow, to see if there are any potential platform differences that could be causing the performance discrepancy.
update: i have added your test case to the project’s CI benchmarks here:
https://github.com/kelvin13/swift-json/runs/7586938970?check_suite_focus=true#step:7:64
i was also able to significantly optimize your Foundation JSONDecoder
usage, which brought the Foundation decoding time down from around 11s to around 6.7s.
swift-json decoding time (fast decoding api): 3.649686595 seconds
swift-json decoding time (compatibility api): 4.506052616 seconds
foundation decoding time: 6.677785956 seconds
i will keep this issue open for now, as it is possible this only affects your particular macOS/Xcode configuration, and we do not yet have CI workflows set up to run benchmarks on macOS.
@kelvin13 I checked out project, updated a few things to make it run on macOS (removed the timing functions and used the CFAbsoluteTimeGetCurrent
as thats only available in the current macOS beta, along with adding in some @available(macOS 11.0, *)
annotations in the aws benchmark and your swift-system-extras
library) and I got the following results:
swift run -c release benchmark-aws-ipam ./data/aws-ipamd.jsonl
Building for production...
Build complete! (0.07s)
swift-json: decoded 415901 logs
swift-json: decoded 415901 logs
foundation: decoded 415901 logs
swift-json decoding time (fast decoding api): 2.2130579948425293
swift-json decoding time (compatibility api): 2.8436840772628784
foundation decoding time: 18.538437962532043
So this can probably be closed as it is pretty evident that swift-json
is significantly quicker.
Thanks for taking the time to look into it.
I tried implementing exactly what was in the benchmark code and it's still slower than the foundation implementation, so it must be something that I'm doing which is very wrong from with within my app.
I have a JSON line file and I'm trying to parse it, I'm wondering if this can be used?
Currently
swift-json
is ~4 times slower than the Foundation implementation, but I suspect this is because I am recreating the decoder each time:swift-json (~97 seconds)
Foundation (~24 seconds)
Tested on a ~100MB file with 140000 lines