YamlSwift takes over a minute to parse .build/debug.yaml on Linux

jpsim / SourceKitten

An adorable little framework and command line tool for interacting with SourceKit.

MIT License


YamlSwift takes over a minute to parse .build/debug.yaml on Linux #289

Closed jpsim closed 7 years ago

jpsim commented 8 years ago

Using Swift 3.0 on Linux, parsing this project's SwiftPM .build/debug.yaml file with YamlSwift takes over 60 seconds. This takes less than a second on macOS.

Here's the sample I'm using to "benchmark".

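import Foundation  // provides String(contentsOfFile:encoding:)
import Yaml
// Read the file named by the first argument and parse it, discarding the result.
_ = try! Yaml.load(String(contentsOfFile: CommandLine.arguments[1], encoding: .utf8))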

And the yaml file: https://gist.github.com/jpsim/9c5f0ba5840214b136ac72cf7f14a805
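For timing the parse from inside the process rather than with an external tool, a minimal sketch follows. The `measure` helper is hypothetical (it is not part of YamlSwift or SourceKitten), and the workload is a stand-in so the snippet runs without the Yaml module; in the benchmark, the closure body would be the `Yaml.load(...)` call. It uses `Date`, which is wall-clock time and not a monotonic clock, so treat the numbers as approximate.

```swift
import Foundation

// Hypothetical helper: runs `work` once and returns its result together
// with the elapsed wall-clock time in seconds.
func measure<T>(_ work: () throws -> T) rethrows -> (result: T, seconds: Double) {
    let start = Date()
    let result = try work()
    return (result, Date().timeIntervalSince(start))
}

// Stand-in workload (sum of 1...1000); replace the closure body with the
// parse call to time the real benchmark.
let timed = measure { (1...1_000).reduce(0, +) }
print("result: \(timed.result), elapsed: \(timed.seconds)s")
```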

Running with callgrind: valgrind --tool=callgrind .build/debug/yamlparser debug.yaml

Looks like 99% of the time is spent in Yaml.Regex.matchRange(_:regex:).

Here's the full graphviz graph (gprof2dot -f callgrind callgrind.out.x | dot -Tsvg -o output.svg):

Related to #287.

jpsim commented 7 years ago

With YamlSwift's latest parsec branch re-implementation (behrang/YamlSwift@parsec), this benchmark now takes 13s on both macOS and Linux:

$ # macOS
$ time .build/debug/Yaml test.yml
.build/debug/Yaml test.yml  12.89s user 0.11s system 99% cpu 13.087 total
$ # Linux
$ time .build/debug/Yaml test.yml
real    0m12.715s
user    0m12.650s
sys     0m0.040s

Obviously, there's still room for improvement, but this is much better. Nice work @behrang!

jpsim commented 7 years ago

Compiling in release mode speeds things up by about 30%:

$ # macOS
$ time .build/release/Yaml test.yml 
.build/release/Yaml test.yml  8.27s user 0.09s system 98% cpu 8.522 total
$ # Linux
$ time .build/release/Yaml test.yml
real    0m8.720s
user    0m8.640s
sys     0m0.060s
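As a quick check of that figure, the "user" times from the debug and release runs can be compared directly. This is only a sketch: `percentFaster` is a hypothetical helper name, and the inputs are the single-run timings quoted in this thread, so the result is a rough indication (roughly 36% on macOS and 32% on Linux) and not a rigorous benchmark.

```swift
import Foundation

// Hypothetical helper: percentage reduction in run time when going from
// `before` seconds to `after` seconds.
func percentFaster(before: Double, after: Double) -> Double {
    return (before - after) / before * 100
}

// "user" times copied from the debug and release runs quoted in this thread.
let macOS = percentFaster(before: 12.89, after: 8.27)
let linux = percentFaster(before: 12.650, after: 8.640)
print(String(format: "macOS: %.1f%%, Linux: %.1f%%", macOS, linux))
```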

jpsim commented 7 years ago

Unfortunately, this is still very far from the performance of Ruby's YAML module:

$ time ruby -e "require 'yaml'; YAML::load_file('test.yml')"
ruby -e "require 'yaml'; YAML::load_file('test.yml')"  0.10s user 0.03s system 72% cpu 0.186 total

behrang commented 7 years ago

The good news is that the parsec branch is spec-compliant. The bad news is that it is much slower than the previous implementation on macOS, though I wasn't aware of the previous version's performance on Linux.

I'll work on its performance. Meanwhile, if you've got any feedback on other parts of it, I'd be happy to hear it.

jpsim commented 7 years ago

> I'll work on its performance.

Happy to hear that!

> Meanwhile if you got any feedback on other parts of it, I'd be happy to hear it.

Yes, there are quite a few project-related things that remain to be done. I don't know if you're already tracking them somewhere:

- Tests/LinuxMain.swift and Linux tests in general.
- podspec and CocoaPods support. This will likely require the same for SwiftParsec.
- Xcode project.
- Carthage support. This will likely require the same for SwiftParsec.

jpsim commented 7 years ago

I've moved SourceKitten to use libYAML instead in #301. As much as I think YamlSwift is a very cool project, the performance difference really is orders of magnitude, and SourceKitten itself has some pretty lengthy operations, so I'll take any performance gain I can get.

behrang commented 7 years ago

OK. I'll let you know if performance improves.