ndmitchell / weeder

Detect dead exports or package imports
BSD 3-Clause "New" or "Revised" License
124 stars 8 forks source link

Optimise .hi file parsing #11

Closed ndmitchell closed 7 years ago

ndmitchell commented 7 years ago

For large projects it takes quite a while (~30 secs). Profiling shows basically all time is spent in the Hi file parsing. Probably move to an alternative string type (foundation?)

ndmitchell commented 7 years ago

Foundation string already gives a win, although since foundation is broken on Windows ghci I don't want to switch yet. Tickets in foundation at https://github.com/haskell-foundation/foundation/issues/326 and https://github.com/haskell-foundation/foundation/issues/332.

ndmitchell commented 7 years ago

I've switched, and it goes 3x faster.

vincenthz commented 7 years ago

Is the next bottleneck in foundation ? And if so, do you have a list of heavy functions ?

ndmitchell commented 7 years ago

It's hard to say where the bottleneck is. Profiling tells me it's validating UTF8 bytes. Removing that code and doing an unsafe convert has zero impact on the time.

ndmitchell commented 7 years ago

I just added a directory str which has various String implementations, and a script go.bat which compiles each in turn and runs a benchmark. For my benchmark, I checked out Shake, did a stack init && weeder --build. I then ran str\go.bat C:\Neil\shake and it gave:

Warnings match
Str-ByteString
Execution time: 1.850 s
Execution time: 1.834 s
Execution time: 1.855 s
Execution time: 1.854 s
Execution time: 1.857 s
Str-Foundation-Unsafe
Execution time: 2.249 s
Execution time: 2.224 s
Execution time: 2.240 s
Execution time: 2.223 s
Execution time: 2.249 s
Str-Foundation
Execution time: 2.231 s
Execution time: 2.232 s
Execution time: 2.243 s
Execution time: 2.232 s
Execution time: 2.239 s
Str-String
Execution time: 4.475 s
Execution time: 4.481 s
Execution time: 4.479 s
Execution time: 4.468 s
Execution time: 4.417 s
Str-Text
Execution time: 2.074 s
Execution time: 2.060 s
Execution time: 2.079 s
Execution time: 2.061 s
Execution time: 2.081 s

Given this is a laptop, I'd just go for the minimum when comparing, which means String < Foundation < Text < ByteString - so Foundation is the worst non [Char] performer. Profiling suggests its UTF8 validation, but Foundation-Unsafe (which just does the conversion without checking) is only fractionally faster. This is using foundation from git HEAD as of 2 minutes ago.

There are a couple of places I've had to write the functions:

I don't really know where the actual bottleneck is.

vincenthz commented 7 years ago

on HEAD you don't need removeR anymore, and I think my latest benchmarks seem to indicate it's better than it was. The windows benchs seems much slower than the linux bench, I'm not sure how to explain this; seems some kind of general haskell slowdown, not foundation related though.

Related to the validation, I think it's just inflated by the insertion of all the SCC when doing profiling, it doesn't seems to be that costly when used without profiling, which make the profile a bit useless to find out what is actually costing the most.

ndmitchell commented 7 years ago

Latest timings:

Str-ByteString
Execution time: 1.022 s
Execution time: 1.012 s
Execution time: 1.010 s
Execution time: 1.010 s
Execution time: 1.010 s
Str-Foundation-Unsafe
Execution time: 1.161 s
Execution time: 1.154 s
Execution time: 1.165 s
Execution time: 1.172 s
Execution time: 1.166 s
Str-Foundation
Execution time: 1.175 s
Execution time: 1.183 s
Execution time: 1.181 s
Execution time: 1.182 s
Execution time: 1.181 s
Str-String
Execution time: 2.403 s
Execution time: 2.399 s
Execution time: 2.391 s
Execution time: 2.412 s
Execution time: 2.467 s
Str-Text
Execution time: 1.088 s
Execution time: 1.085 s
Execution time: 1.081 s
Execution time: 1.082 s
Execution time: 1.080 s

So still worse than Text. As you say, profiling isn't really helping much anymore, so I don't know what to suggest. The Str module is a fair bit simpler now though, see 16a3440c9be163a6828580daf96d39eb94a77fbb.

I guess one option would be to use the very cheap foundation/bytestring conversion, and switch individual functions out for bytestring, which would say which foundation function was the expensive one.

ndmitchell commented 7 years ago

It's now way faster than it was. Considering this one "solved".