PallHaraldsson commented 4 years ago

A bit faster, but much faster possible:

julia> @time using Memento
  3.192507 seconds (5.72 M allocations: 300.751 MiB, 6.93% gc time)

from:
julia> @time using Memento
  3.371172 seconds (5.75 M allocations: 302.979 MiB, 6.73% gc time)

Much better possible (but not with a PR change):
$ julia -O0 --compile=min -q
julia> @time using Memento
  0.954586 seconds (1.28 M allocations: 71.635 MiB, 1.95% gc time)

$ julia -O0 -q
julia> @time using Memento
  2.393272 seconds (5.72 M allocations: 300.751 MiB, 9.47% gc time)

codecov[bot] commented 4 years ago

Codecov Report

Merging #163 into master will decrease coverage by 14.25%. The diff coverage is n/a.

@@             Coverage Diff             @@
##           master     #163       +/-   ##
===========================================
- Coverage   96.99%   82.74%   -14.26%     
===========================================
  Files          12       13        +1     
  Lines         333      394       +61     
===========================================
+ Hits          323      326        +3     
- Misses         10       68       +58

Impacted Files	Coverage Δ
src/Memento.jl	`75.00% <ø> (-25.00%)`	:arrow_down:
src/filters.jl	`66.66% <0.00%> (-33.34%)`	:arrow_down:
src/config.jl	`81.25% <0.00%> (-18.75%)`	:arrow_down:
src/records.jl	`74.28% <0.00%> (-18.58%)`	:arrow_down:
src/loggers.jl	`80.88% <0.00%> (-18.22%)`	:arrow_down:
src/handlers.jl	`83.87% <0.00%> (-16.13%)`	:arrow_down:
src/io.jl	`84.00% <0.00%> (-16.00%)`	:arrow_down:
src/formatters.jl	`92.59% <0.00%> (-5.41%)`	:arrow_down:
src/stdlib.jl	`72.72% <0.00%> (-5.06%)`	:arrow_down:
... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 6051407...a47953b. Read the comment docs.

PallHaraldsson commented 4 years ago

I was looking into the slowness of loading LibPQ.jl, and this package was a big offender.

Compared to (as is on Julia 1.6 on default settings, without changing these packages):

julia> @time using Logging
  0.000391 seconds (727 allocations: 42.812 KiB)

julia> @time using MiniLogging
  0.046572 seconds (31.69 k allocations: 2.776 MiB)

julia> Pkg.add("Lumberjack")  # here Julia 0.6:
julia> @time using Lumberjack
  0.444273 seconds (166.21 k allocations: 9.450 MiB)

$ ~/julia-0.6-9d11f62bcb/bin/julia -O0 --compile=min -q
julia> @time using Lumberjack
  0.310134 seconds (166.21 k allocations: 9.514 MiB)

It's seems like in your port, the "flexibility" (I assume new features) is costly, because of new features. I didn't look into if there's AbstractLogging.jl, but maybe it should exists, or how compatible the APIs are. If startup can't be improved (more) could choose-able logging be done (in e.g. LibPQ)?

rofinn commented 4 years ago

Yes, Memento provides more flexibility for fine-tuning your logging across larger applications which require hundreds of packages/modules. In our production systems, we often need to turn on debug logging for very specific packages/modules (or chains of them). AFAIK, none of the current solutions provide that kind of flexibility and because our applications run for long periods of time the startup cost is a reasonable compromise for us. I don't think disabling optimization is the right approach here because then you're just pushing that startup cost to the runtime. I think a better approach might be to look at what's changed in the Julia recently that might allow us to improve type stability, compilation time and performance overall, while maintaining the features that distinguish this package from some other alternatives. Alternatively, we could put some work into gradually adding the desired features from this package into base in some way... but that'll likely require more work, negotiations and compromises.

PallHaraldsson commented 4 years ago

I don't think disabling optimization is the right approach here because then you're just pushing that startup cost to the runtime.

I just assumed this code isn't speed-critical, since it's "just" logging, right? I/O bound? Maybe I overlooked something, at least -O0 isn't too slow, -O1 is also a middle-ground for you to consider.

I realize you're a company (and maybe not too affected by the startup), I'm more concerned for your LibPQ.jl, where it (mostly, but not just, because of Memento), has slow startup (compared to wrapper drivers for other databases), maybe a bit annoying for many users.

Any idea how much work it's to migrate from Memento for it? For my usecase, I only need trivial logging if any.

rofinn commented 4 years ago

I just assumed this code isn't speed-critical, since it's "just" logging, right? I/O bound? Maybe I overlooked something, at least -O0 isn't too slow, -O1 is also a middle-ground for you to consider.

Yes, for logs that are actually emitted the code would be I/O bound, but what about cases where the logs shouldn't be emitted based on log level or handler? Base logging avoids this problem with macros that turn your code into no-ops, while Memento uses lazy evaluation with more flexible filters that may not be known at compile time.

Any idea how much work it's to migrate from Memento for it? For my usecase, I only need trivial logging if any.

Memento can hook into the base logging with a few caveats.

https://invenia.github.io/Memento.jl/latest/faq/config-recipes/#Integration-with-Base-Julia-Logging-1

The application code needs to enable substitution of the base logger. It's pretty easy, and we do that to catch log messages from external packages we don't control.
Memento can only filter messages sent directly through it or emitted by the base logging system, so if you want debug message then you'll need to do that through the base environment variable system... which we don't want to do for our own packages. https://docs.julialang.org/en/v1/stdlib/Logging/#Environment-variables-1.

If you'd still like to move forward with trying to make that change to LibPQ.jl then I'd recommend making an issue there and one of our maintainers for that package can decide if it's worth it. Moving forward here, I'd recommend that we open an issue to look into what's slowing down the load time and see if there's anything we can do about it more directly. I suspect that it might be tied to the __init__ function and our current global logging hierarchy.

rofinn commented 4 years ago

FWIW, the reason Logging is so fast is because it's baked into the julia system image, so I'm not sure that's a fair comparison.

rofinn commented 4 years ago

I suspect that it might be tied to the init function and our current global logging hierarchy.

Yep, looks like it's because we're actually running some code during loading. Without the __init__ function commented out I get:

julia> @time using TimeZones
  0.104455 seconds (104.01 k allocations: 7.856 MiB)

julia> @time using Memento
  1.289233 seconds (3.60 M allocations: 186.269 MiB, 3.61% gc time)

and with it commented out I get:

julia> @time using TimeZones
  0.129145 seconds (104.01 k allocations: 7.856 MiB)

julia> @time using Memento
  0.198040 seconds (203.01 k allocations: 12.159 MiB)

NOTE: These are first load times, but after precompilation.

PallHaraldsson commented 4 years ago

I would still time with lower optimization level (at least -O1 but only for the package, not globally, then), as I've seen it as fast at runtime (just faster startup). I'm just not sure I'm the best person to time that, as I'm not sure where to start. Possibly you have some code to try out on?

rofinn commented 4 years ago

Okay, I think the better approach might be to use SnoopCompile.jl to generate a set of precompile statements that'll improve load times. Ideally, this'll push the issue to a package build/precompile time problem and we don't have to worry about losing runtime performance. I should have a PR shortly.

rofinn commented 4 years ago

Okay, looks like I may need to add some similar precompile statements to TimeZones.jl

rofinn commented 4 years ago

Alright, I wasn't sure how to fix TimeZones.jl directly, so I've just opted to cache the output of localzone for now. With my first pass in #164 I was able to get the allocations close to what you're ideal case was above.

Much better possible (but not with a PR change):
$ julia -O0 --compile=min -q
julia> @time using Memento
  0.954586 seconds (1.28 M allocations: 71.635 MiB, 1.95% gc time)

vs

julia> @time using Memento
  0.672002 seconds (1.13 M allocations: 64.016 MiB, 2.14% gc time)

NOTE: It's probably better to look at the allocations rather than exact times here

Apart from redesigning Memento's architecture I'm not sure how much better we can do because there's a certain amount that Memento needs to do at load time in terms of setting up the logger hierarchy.

rofinn commented 4 years ago

Alright, I've merged and tagged that PR if you want to give it a spin.

PallHaraldsson commented 4 years ago

The timing on my machine is a little noisy, but it seems I get just as fast (min) loading speed with -O1 as -O0, under half a sec better on v1.1.1.

Feel free to test the speed with my PR, or rather modified to -O1, or just close this PR, as you've done a good job already. I could finally install, but not after some problems (at Genie or other package, still unresolved): https://github.com/GenieFramework/Genie.jl/issues/269

There's some new stuff on Julia master to help with package loading, and my PR here could use it, but I'm not sure it has landed, at least will be in 1.6 earliest (or some 1.6-DEV?).

PallHaraldsson commented 4 years ago

Feel free to just close issue, as on Julia 1.6 master (w/o PR), not too bad (or merge with (updated PR for) less conservative -O1, which is probably fast at runtime):

julia> @time using Memento
  1.133228 seconds (1.12 M allocations: 69.125 MiB)

$ julia -O1 -q
julia> @time using Memento
  0.691552 seconds (1.12 M allocations: 69.125 MiB)

$ julia -O0 --compile=min -q
julia> @time using Memento
  0.290336 seconds (339.35 k allocations: 23.650 MiB)

FYI: This is on latest Memento@v1.1.1 but for its master seems a bit slower...

rofinn commented 4 years ago

I've asked around for someone to clarify what the optimization levels actually do. Once I have an idea of what this is likely to change during runtime then I'll write some benchmarks and compare. It could be that setting it to 0 doesn't make a difference 99% of the time.

PallHaraldsson commented 4 years ago

Jeff wrote that it only changes LLVM optimization pass. Julia however does some optimization before it gets to that. Julia has also --inline={yes|no} and even -O0 doesn't imply no inlining.

iamed2 commented 4 years ago

There are more per-module options coming in 1.6: https://github.com/JuliaLang/julia/pull/37041

PallHaraldsson commented 4 years ago

Yes, thanks, I saw that, but lost the issue number, and was actually going to find it back. FYI: The stuff in the PR already (and I assume, also with those new options), are not as effective as I would like. If you put it in your module, one or more dependencies may be the real reason for the slowness of loading, I think in your case, and adding this stuff in your module has I think no effect on sub-modules (also unclear you would want that).

That's why I recently added to JLLs this option, and they can have huge trees of dependencies of JLL and I added to those to (70 PRs), and that still seemed like a failed experiment (vs. starting with the global option). I didn't however change all the modules that did have the JLLs as dependencies.

rofinn commented 4 years ago

Hmmm, I don't think -O on the cmdline is equivalent here. I tried the three different optimization levels that we can control for here and this is what I get.

# 1.1.1 release
julia> @time using Memento
  0.659624 seconds (1.38 M allocations: 83.938 MiB, 2.40% gc time)

# optlevel = 1
julia> @time using Memento
  0.600229 seconds (1.39 M allocations: 84.014 MiB, 2.47% gc time)

# optlevel = 0
julia> @time using Memento
  0.596517 seconds (1.39 M allocations: 84.061 MiB, 2.45% gc time)

That seems like a pretty minimal difference. I think a better approach going forward might be to re-architect Memento to focus more on zero-cost abstractions. For example:

Improve performance of logger hierarchy
Don't initialize the logger hierarchy on package load
Use macros to turn non-emitting logging calls into no-ops. This would reduce most performance implications of setting the optlevel.

PallHaraldsson commented 4 years ago

I agree. However, I noticed on brand-new master down to:

$ julia -O1 --compile=min -q
julia> @time using Memento
  0.268449 seconds (259.62 k allocations: 18.581 MiB)

vs both below on defaults:
julia> @time using Memento
  1.110142 seconds (1.04 M allocations: 64.018 MiB, 1.08% gc time)

julia> @time using MiniLogging
  0.013191 seconds (11.16 k allocations: 872.849 KiB)

I tried to get the former time, with the new capabilities, then I got lower (because of additional new "infer" option):

$ julia -O1 --compile=min -q
julia> @time using Memento
  0.197868 seconds (181.31 k allocations: 13.584 MiB)

while default almost unchanged:
julia> @time using Memento
  1.028738 seconds (1.03 M allocations: 63.636 MiB, 0.94% gc time)

I'm not saying we should use these new options, even if they would work better, I think we could get closer to those minimal times with rearchitecting.

PallHaraldsson commented 4 years ago

JSON.jl's startup delay isn't too bad (nor is JSON2 or JSON3, that load almost as fast), but I believe all the complexity of those packages relates to JSON parsing, with writing JSON rather trivial (so we could do without a package and still write JSON?).

Do we even need JSON logging? I see https://github.com/JuliaLang/TOML.jl (now a standard lib) loads faster.

I prefer to read TOML, and there are code filters to convert; and also online: https://pseitz.github.io/toml-to-json-online-converter/

rofinn commented 4 years ago

I'm having difficulty parsing your first sentence, but yes, JSON logging is pretty basic and is a consistent option for various logging services (e.g., Graylog, cloudwatchlogs, sumologic)... which is why it is included. We could choose to have a separate JSONMementoFormatter.jl package, but that seemed a little verbose give that JSON is a minimal dependency and it is such a ubiquitous log format (vs the actual logging services).

If you'd like to add a TOML formatter that's fine, but it is currently a less common format for various logging services.

NOTE: The use case here is for logging services that can benefit from a structured log record format for indexing and quick searches, and not for human readability in a local file.

PallHaraldsson commented 4 years ago

I looked, and other package you're likely to use with, e.g. Genie and LibPQ (there an indirect dependency), use JSON.jl anyway, so not important to drop. [Assuming, as I believe, you pay the cost of of its startup only once.]

I wouldn't add TOML unless you can get rid of JSON, which doesn't seem high priority.

[Many PostgreSQL users will not use JSON, so some of the functionality, e.g. JSON could be lazy loaded, in effect, when not used weakening my above argument.]

[I rewrote my other comment for clarity.]

invenia / Memento.jl

Faster startup #163

Codecov Report