jkoudys opened this issue 4 years ago
For anyone hitting this, you can work around it easily by adjusting the RUST_LOG
setting for this crate to error or lower. I'd still like to get warnings though.
I wonder now whether SQL whitespace parsing is too heavy an operation to run on every Drop in the execute path. Personally I'd be fine leaving it off, especially since an unformatted SQL string might still give me useful hints for debugging. I'd rather pay the cost of formatting on my laptop while reading the log than at runtime.
cc @shssoichiro (for sqlformat perf)
Acknowledging that my eyes are on this. Still looking into what could be causing it; this seems out of the ordinary.
@jkoudys Would you happen to have an example of a query that triggers this behavior? I'm running through the benchmarks that exist in sqlformat, and the existing ones perform well. However, if I benchmark the example from #651, which is significantly long (an insert of 10000 rows), the time increase does seem somewhat exponential. So I am trying to determine whether there is some commonality between the queries, or if it is only the length of the query that is causing the issue.
@shssoichiro I sanitized it for security reasons, but I think this one is probably okay:

```sql
SELECT\n d.uuid AS uuid,\n\td.name_of_document AS name,\n\td.slug_name AS slug,\n\td.default_contract_uuid AS default_contract_uuid,\n\ta.uuid AS parent_uuid,\n\ta.name_of_agreement AS agreement_name,\n\td.icon_name AS icon\nFROM `documents` d\nLEFT JOIN agreements a ON a.uuid = d.parent_uuid\n WHERE d.uuid = ? LIMIT 1
```
Our testing was hitting a lot of extreme loads, because it was going through GraphQL resolvers (async_graphql at the time; we've since moved to juniper), where we'd hit N+1 problems because we hadn't yet put in a dataloader. E.g. with a GraphQL query like `organization(id: 123) { users { uuid name } }`, if there are 5000 users in that organization, it will go through QueryLogger 5000 times. High for our test case, but still well within a reasonable number of queries to see every minute on a live site.
From your description, it does sound related to the number of queries run, vs the length of the query itself. Contention for a shared resource, perhaps?
btw we've turned off the logging for sqlx right now, so it's not killing us, but the logging is nice and obviously has had a lot of work put into it, so I'd definitely want to turn it on for debugging some things soon.
I've run into this issue as well. @jkoudys how did y'all turn off logging for just sqlx?
@c1wren see https://rust-lang-nursery.github.io/rust-cookbook/development_tools/debugging/config_log.html
You can do it in code, though we use the env var because it's easier to configure staging and prod environments that way. Setting the filter to anything more severe than `WARN` will suppress it entirely. Personally I'd like to have warn logging on in production, because the same best practices that prevent SQL injection also keep confidential info out of the prepared statements it logs. But if I'm getting a warning because queries are running slowly, the slow regexing from this compounds with that and turns a few seconds of delay into a server hang.
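For example, a minimal sketch of the in-code variant, assuming the `env_logger` crate (the `sqlx` module filter targets the crate as a whole, as described above):

```rust
use log::LevelFilter;

fn main() {
    // Equivalent to launching with RUST_LOG=info,sqlx=error:
    // keep our own logs at info, but only show sqlx errors so the
    // expensive query logging is suppressed.
    env_logger::Builder::new()
        .filter_level(LevelFilter::Info)
        .filter_module("sqlx", LevelFilter::Error)
        .init();
}
```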
@jkoudys I spent some extra time digging into the problem, and due to how the logging is implemented here, you can't have ANY of the log levels for any part be less severe than error, or the slow part of the code will still run even though it isn't getting logged.

https://github.com/launchbadge/sqlx/blob/2e1658e08b053f66102b5beabc7fdd4ac28e3a48/sqlx-core/src/logger.rs#L34 controls whether the super slow code gets run, BUT https://github.com/launchbadge/sqlx/blob/2e1658e08b053f66102b5beabc7fdd4ac28e3a48/sqlx-core/src/logger.rs#L60 controls whether it actually gets logged, which is a design flaw: it is possible for the slow code to run without anything being logged at all.
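In other words, the fix would be to gate the expensive formatting on the same check that decides whether the record is emitted. A minimal sketch of that idea (illustrative only, not sqlx's actual code), assuming the `log` and `sqlformat` crates:

```rust
use log::{log_enabled, warn, Level};

fn log_query(sql: &str) {
    // Only pay the formatting cost if a warn-level record for this
    // target would actually be emitted by the active logger.
    if log_enabled!(target: "sqlx::query", Level::Warn) {
        let pretty = sqlformat::format(
            sql,
            &sqlformat::QueryParams::None,
            sqlformat::FormatOptions::default(),
        );
        warn!(target: "sqlx::query", "\n\n{}\n", pretty);
    }
}
```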
Ah, I see. I was using that line 34 to simply set the max level above warn, but yeah, I see that the `::format` call happens regardless of the log level, as long as the max level is at least warn.
https://github.com/shssoichiro/sqlformat-rs/pull/2 improves average case performance by 50%, including for this query. The query in #651 improves by 17% but still has some other slowness issue occurring. I'll continue to investigate that and release a patch version of sqlformat after finding a fix for that query.
I think in order to get any more significant gains, I'll have to rewrite sqlformat's tokenizer to use nom instead of regex. So I'll publish a patch version for now with the gains that are in place.
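For a rough flavor of the difference, here's what a single tokenizer step might look like with nom instead of a regex scan (purely illustrative; the function and token shapes here are my own, not sqlformat's):

```rust
use nom::{
    bytes::complete::take_while1,
    character::complete::multispace0,
    sequence::preceded,
    IResult,
};

// Consume leading whitespace, then one identifier-like token.
// Each call only looks at the bytes it consumes, so tokenizing the
// whole input stays linear in its length.
fn word(input: &str) -> IResult<&str, &str> {
    preceded(
        multispace0,
        take_while1(|c: char| c.is_alphanumeric() || c == '_'),
    )(input)
}
```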
With the release of sqlformat 0.1.2, the runtime for this pathological query has been reduced by 90%, from 100ms to 10ms. Runtime for all queries has improved across the board by about 25% as well.
The current latest sqlx version (0.4.2) ships sqlformat 0.1.5 (which presumably includes the perf improvements from 0.1.2), but I still observe very slow query logging.
For the query `insert into test_table(a) values (?), (?), ...1000 more times (?)` it takes around 734ms to finish the `query.execute` call with logging on. If I disable the logging, the time drops to 5ms for the same query.
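(For reference, a query of that shape can be generated along these lines; this is just an illustrative sketch, the actual repro is linked below.)

```rust
// Build "insert into test_table(a) values (?), (?), ..." with `rows`
// placeholder groups, mirroring the query described above.
fn build_insert(rows: usize) -> String {
    let placeholders = vec!["(?)"; rows].join(", ");
    format!("insert into test_table(a) values {}", placeholders)
}
```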
Here's a trace for the slow case:
I wrote up a new benchmark for the query you mentioned, @SomeoneToIgnore, and got consistent results of 3ms to format the query. It's interesting that you're seeing such different results, and unfortunately I don't have an explanation off the top of my head.
Here's my repro on the latest sqlx: https://github.com/SomeoneToIgnore/sqlx-large-queries-print
Interesting. I ran the repro you posted, and although I do see about a 3-5x performance difference, it's certainly not the 100x difference mentioned. I'm getting around 13-15ms per query with logging enabled and 3-5ms per query with logging disabled. It would seem really odd for this to be caused by some sort of machine difference, since sqlformat doesn't have to hit the disk or anything of that sort...
My bad, I used a debug build and did not mention it.

Rechecked: on my machine, a release build with query logging produces 16-20ms per insert of 1000 items, but the time grows faster than linearly when I insert more.

1 column:

- 1000 items: 16-20ms
- 2000 items: 48-50ms
- 3000 items: 96-106ms

For comparison, 3000 items with 1 column are inserted in 5-8ms when I disable the SQL logging.

If I increase the number of columns to 3 and rerun the release tests, it gets way worse.

3 columns:

- 1000 items: ~70ms
- 2000 items: ~250ms
- 3000 items: ~520ms

3000 items with 3 columns are inserted in ~10ms (with the first query being slow, ~30ms) when I disable the SQL logging.
===
IMO there's still something fishy here; you just have to be a bit curious and fiddle with the number of query parameters to get peculiar performance results.
Is there any progress here? I've also hit this problem.

QueryLogger uses the sqlformat crate, and its tokenizer has O(n²) complexity 😬

With large queries, for example a batch insert of 2000 rows with 6 values each, the process takes 13 seconds with query logging enabled, versus milliseconds without it. The same batch with 500 rows takes less than a second with logging, so the time dependence is non-linear.
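To illustrate that failure mode (a made-up minimal sketch, not sqlformat's actual code): if a tokenizer performs even one unanchored scan over the whole remaining input per token, n tokens cost O(n²) in total.

```rust
// Illustrative only: one O(len(rest)) scan per token turns the whole
// loop quadratic in the input length.
fn tokenize(sql: &str) -> Vec<&str> {
    let mut tokens = Vec::new();
    let mut rest = sql.trim_start();
    while !rest.is_empty() {
        // An unanchored check like this walks all of `rest` when it
        // fails, and it runs once for every token we emit.
        let _in_block_comment = rest.contains("*/");
        let end = rest.find(char::is_whitespace).unwrap_or(rest.len());
        let (token, tail) = rest.split_at(end);
        tokens.push(token);
        rest = tail.trim_start();
    }
    tokens
}
```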
After changing

```rust
format!(
    "\n\n{}\n",
    sqlformat::format(
        &self.sql,
        &sqlformat::QueryParams::None,
        sqlformat::FormatOptions::default()
    )
)
```

to

```rust
format!("\n\n{}\n", &self.sql)
```

there are no more performance problems.
Maybe it's worth using a more efficient tokenizer/formatter?
It all comes from this commit: https://github.com/launchbadge/sqlx/commit/7bc3133f70b63cb177927b03c3c9b745d6aa5a29#diff-eb0cd75f4616762e6091c0588db07a75
I'm seeing an execution path that runs many times (a GraphQL resolver resolving thousands of its children) pegged at 100% CPU. It spends all that time regexing through the SQL I've passed it, in QueryLogger. This used to be quite fast: it would simply check the connection's cache for a prepared statement, execute it if found (or prepare a new one if not), and never needed to read into the query text itself.
Here's a simplified callstack from lldb, where I've tried to reduce the noise and remove any internal-eyes-only sql:
Being able to turn off that QueryLogger and not pay the hard cost of its `::new` would be great, but at the very least, caching that SQL string's formatted output in a hashmap would speed things up a ton, so it wouldn't have to regex it literally millions of times.
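A minimal sketch of that caching idea (purely illustrative, not sqlx's API; assumes the `once_cell` and `sqlformat` crates):

```rust
use once_cell::sync::Lazy;
use std::collections::HashMap;
use std::sync::Mutex;

// Process-wide memo of formatted SQL, keyed by the raw query text.
static FORMATTED: Lazy<Mutex<HashMap<String, String>>> =
    Lazy::new(|| Mutex::new(HashMap::new()));

fn formatted_sql(sql: &str) -> String {
    let mut cache = FORMATTED.lock().unwrap();
    cache
        .entry(sql.to_owned())
        .or_insert_with(|| {
            // Pay the expensive formatting cost only once per distinct query.
            sqlformat::format(
                sql,
                &sqlformat::QueryParams::None,
                sqlformat::FormatOptions::default(),
            )
        })
        .clone()
}
```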
Hitting this using the mysql driver, on 0.4.0-beta.1