Extremely slow repetitive regexing from the new QueryLogger

jkoudys commented 4 years ago

All coming from here: https://github.com/launchbadge/sqlx/commit/7bc3133f70b63cb177927b03c3c9b745d6aa5a29#diff-eb0cd75f4616762e6091c0588db07a75

I'm seeing an execution that runs many times (graphql resolver that resolves on thousands of its children) throttled at 100% . It spends all that time regexing through the SQL I've passed it in QueryLogger. This used to be quite fast as it would simply do a cache check on a prepared statement against the connection, and if it did, execute (or prepare a new one) and never needed to read into the query itself.

Here's a simplified callstack from lldb, where I've tried to reduce the noise and remove any internal-eyes-only sql:

 thread #3, name = 'actix-rt:worker:0'
    frame #0: 0x000000010ad819fa clausehound`regex::backtrack::usize_to_u32::hdf907051fdf3a146(n=4) at backtrack.rs:287:5
    frame #1: 0x000000010ad818da clausehound`regex::backtrack::Bounded$LT$I$GT$::has_visited::hf8af1293bf08fe70(self=0x000070000a51ecc0, ip=1225, at=InputAt @ 0x000070000a51e838) at backtrack.rs:273:18
    frame #2: 0x000000010ad806a5 clausehound`regex::backtrack::Bounded$LT$I$GT$::step::hbc37ff202a0f4eb5(self=0x000070000a51ecc0, ip=1225, at=InputAt @ 0x000070000a51eaa0) at backtrack.rs:205:16
    frame #3: 0x000000010ad80550 clausehound`regex::backtrack::Bounded$LT$I$GT$::backtrack::hee64c3ce72ad992c(self=0x000070000a51ecc0, start=<unavailable>) at backtrack.rs:178:24
    frame #4: 0x000000010ad7fb3a clausehound`regex::backtrack::Bounded$LT$I$GT$::exec_::h230596d2910ee547(self=0x000070000a51ecc0, at=InputAt @ 0x000070000a51ed00, end=241) at backtrack.rs:145:55
    frame #5: 0x000000010ad7f05a clausehound`regex::backtrack::Bounded$LT$I$GT$::exec::h473fc42d6ce15005(prog=0x00007fb009800420, cache=0x00007fb007f13f70, matches=&mut [bool] @ 0x000070000a51ed30, slots=(data_ptr = 0x00007fb007c10880, length = 6), input=CharInput @ 0x000070000a51ec88, start=0, end=241) at backtrack.rs:103:9
    frame #6: 0x000000010adaf3d8 clausehound`regex::exec::ExecNoSync::exec_backtrack::h97e8f52e94e02b47(self=0x000070000a529430, matches=&mut [bool] @ 0x000070000a51ee68, slots=(data_ptr = 0x00007fb007c10880, length = 6), text=(data_ptr = "XXXSQLXXX", length = 241), start=0, end=241) at exec.rs:1119:13
    frame #7: 0x000000010adaeea0 clausehound`regex::exec::ExecNoSync::exec_nfa::h0cfe7711f7d7040a(self=0x000070000a529430, ty=Backtrack, matches=&mut [bool] @ 0x000070000a51ef68, slots=(data_ptr = 0x00007fb007c10880, length = 6), quit_after_match=false, quit_after_match_with_pos=false, text=(data_ptr = "XXXSQLXXX", length = 241), start=0, end=241) at exec.rs:1060:13
    frame #8: 0x000000010adaeb3a clausehound`regex::exec::ExecNoSync::captures_nfa_type::ha869baee80db827c(self=0x000070000a529430, ty=Auto, slots=(data_ptr = 0x00007fb007c10880, length = 6), text=(data_ptr = "XXXSQLXXX, length = 241), start=0, end=241) at exec.rs:1009:12

## ..... a bunch more Regex::captures

    frame #15: 0x000000010a5eaf83 clausehound`sqlformat::tokenizer::get_plain_reserved_token::he762e95704837ba6(input=(data_ptr = "XXXSQLXXX, length = 241)) at tokenizer.rs:212:5
    frame #16: 0x000000010a5eae02 clausehound`sqlformat::tokenizer::get_reserved_word_token::_$u7b$$u7b$closure$u7d$$u7d$::h37054d6b1e5ddfaf at tokenizer.rs:188:21
    frame #17: 0x000000010a607714 clausehound`core::option::Option$LT$T$GT$::or_else::h134d84f62f07b282(self=Option<sqlformat::tokenizer::Token> @ 0x000070000a529838, f=closure-2 @ 0x000070000a5297d8) at option.rs:766:21
    frame #18: 0x000000010a5ead77 clausehound`sqlformat::tokenizer::get_reserved_word_token::hf9734b5ed74bca49(input=(data_ptr = "XXXXXSQL", length = 241), previous_token=Option<&sqlformat::tokenizer::Token> @ 0x000070000a529830) at tokenizer.rs:185:5
    frame #19: 0x000000010a5e9c2c clausehound`sqlformat::tokenizer::get_next_token::_$u7b$$u7b$closure$u7d$$u7d$::hd7a15da3554e8455 at tokenizer.rs:79:21
    frame #20: 0x000000010a607fe0 clausehound`core::option::Option$LT$T$GT$::or_else::hd328e8464ce36f66(self=Option<sqlformat::tokenizer::Token> @ 0x000070000a529a88, f=closure-6 @ 0x000070000a529980) at option.rs:766:21
    frame #21: 0x000000010a5e9a3b clausehound`sqlformat::tokenizer::get_next_token::heea5162f560d3da9(input=(data_ptr = "XXXXXSQL", length = 241), previous_token=Option<&sqlformat::tokenizer::Token> @ 0x000070000a5299d8) at tokenizer.rs:72:5
    frame #22: 0x000000010a5e9558 clausehound`sqlformat::tokenizer::tokenize::haf06a858d053e60b(input=(data_ptr = "XXXXXSQL", length = 241)) at tokenizer.rs:12:22
    frame #23: 0x000000010a609d3e clausehound`sqlformat::format::h4a39c243e774c1e6(query=(data_ptr = "XXXXXSQL, length = 311), params=0x000000010b017050, options=(indent = sqlformat::Indent @ 0x000070000a529e80, uppercase = false, lines_between_queries = '\x01')) at lib.rs:15:18
    frame #24: 0x000000010a5672eb clausehound`sqlx_core::logger::QueryLogger::finish::hd9f9434d6a799964(self=0x00007fb009093fe0) at logger.rs:41:21
    frame #25: 0x000000010a567761 clausehound`_$LT$sqlx_core..logger..QueryLogger$u20$as$u20$core..ops..drop..Drop$GT$::drop::he09628e261f4cd86(self=0x00007fb009093fe0) at logger.rs:70:9
    frame #26: 0x000000010a5cb2c1 clausehound`core::ptr::drop_in_place::h0ec659c352390567((null)=0x00007fb009093fe0) at mod.rs:177:1
    frame #27: 0x000000010a5cd699 clausehound`core::ptr::drop_in_place::h46c9c821341507f9 at async_stream.rs:71:9
    frame #28: 0x000000010a5caff1 clausehound`core::ptr::drop_in_place::h053074e64eb2d435((null)=0x00007fb009093f68) at mod.rs:177:1
    frame #29: 0x000000010a5cf5a9 clausehound`core::ptr::drop_in_place::h8b42d23ca82add9a at async_stream.rs:28:33
    frame #30: 0x000000010a5d24a1 clausehound`core::ptr::drop_in_place::hdab8fe07eb2a2c7e((null)=0x00007fb009093c00) at mod.rs:177:1
    frame #31: 0x000000010a5cec01 clausehound`core::ptr::drop_in_place::h6e6ced6195372bbc((null)=0x00007fb009093c00) at mod.rs:177:1
    frame #32: 0x000000010a5ce9b1 clausehound`core::ptr::drop_in_place::h6cb48ac7c1601a22((null)=0x00007fb009093c00) at mod.rs:177:1
    frame #33: 0x000000010a5ce39f clausehound`core::ptr::drop_in_place::h5f6a3dfe4f33b527((null)=0x00007fb007d26858) at mod.rs:177:1
    frame #34: 0x000000010a5d2b91 clausehound`core::ptr::drop_in_place::hecf2308d5edf9470((null)=0x00007fb007d26858) at mod.rs:177:1
    frame #35: 0x000000010a5d334e clausehound`core::ptr::drop_in_place::hf642a7d0cc209dc4((null)=0x00007fb007d26850) at mod.rs:177:1
    frame #36: 0x000000010a5cc30b clausehound`core::ptr::drop_in_place::h29b1a0c6f7831645((null)=0x00007fb00908b9e8) at mod.rs:177:1
    frame #37: 0x000000010a5ce9d1 clausehound`core::ptr::drop_in_place::h6cbe51a0d5d8a1aa((null)=0x00007fb00908b9e8) at mod.rs:177:1
    frame #38: 0x000000010a11487b clausehound`core::ptr::drop_in_place::hc85beac201e8602b at async_stream.rs:71:9
    frame #39: 0x000000010a1041e1 clausehound`core::ptr::drop_in_place::h88e0e36d05fd5962((null)=0x00007fb00908b970) at mod.rs:177:1
    frame #40: 0x000000010a121399 clausehound`core::ptr::drop_in_place::hfbcede085ec486cc at async_stream.rs:28:44
    frame #41: 0x000000010a106f01 clausehound`core::ptr::drop_in_place::h94899048937e44d3((null)=0x00007fb00908b600) at mod.rs:177:1
    frame #42: 0x000000010a121dd1 clausehound`core::ptr::drop_in_place::hfd19355a22e27521((null)=0x00007fb00908b600) at mod.rs:177:1
    frame #43: 0x000000010a0fa081 clausehound`core::ptr::drop_in_place::h662773a8b5a00cd7((null)=0x00007fb00908b600) at mod.rs:177:1
    frame #44: 0x000000010a5ce39f clausehound`core::ptr::drop_in_place::h5f6a3dfe4f33b527((null)=0x00007fb007d148f8) at mod.rs:177:1
    frame #45: 0x000000010a5d2b91 clausehound`core::ptr::drop_in_place::hecf2308d5edf9470((null)=0x00007fb007d148f8) at mod.rs:177:1
    frame #46: 0x000000010a5d334e clausehound`core::ptr::drop_in_place::hf642a7d0cc209dc4((null)=0x00007fb007d148f0) at mod.rs:177:1
    frame #47: 0x000000010a5d14bf clausehound`core::ptr::drop_in_place::hc85774b31faf7854((null)=0x00007fb007d1d1e0) at mod.rs:177:1
    frame #48: 0x000000010a5cf0c1 clausehound`core::ptr::drop_in_place::h7a4a3bfda5533806((null)=0x00007fb007d1d1e0) at mod.rs:177:1
    frame #49: 0x000000010a1d65ed clausehound`sqlx_core::mysql::connection::executor::_$LT$impl$u20$sqlx_core..executor..Executor$u20$for$u20$$RF$mut$u20$sqlx_core..mysql..connection..MySqlConnection$GT$::fetch_optional::_$u7b$$u7b$closure$u7d$$u7d$::h803ff86c56fe07fe((null)=ResumeTy @ 0x000070000a52e7b8) at executor.rs:255:9

## a bunch of futures-polling frames...

### finally, my function
    frame #54: 0x0000000109e7a150 clausehound`_$LT$clausehound..models..framework..Framework$u20$as$u20$clausehound..models..base..Fetch$GT$::fetch::__fetch::_$u7b$$u7b$closure$u7d$$u7d$::hd4129b2b7f99f3a9((null)=ResumeTy @ 0x000070000a52fe60) at framework.rs:15:24

Being able to turn off that QueryLogger and not pay the hard cost of its ::new would be great, but at very least caching that SQL string into a hashmap would speed things up a ton so it wouldn't have to regex it literally millions of times.

Hitting this using the mysql driver, on 0.4.0-beta.1

jkoudys commented 4 years ago

For anyone hitting this, you can work around it easily by adjusting the RUST_LOG setting for this crate to error or lower. I'd still like to get warnings though.

I wonder now if SQL parsing for whitespace is too heavy an action to do on every Drop in the execute. Personally I'd be fine leaving it off, especially since an unformatted sql string might give me some hints useful for debugging, and I'd rather pay the cost of formatting eg on my laptop when I'm reading the log, than at runtime

abonander commented 4 years ago

cc @shssoichiro (for sqlformat perf)

shssoichiro commented 4 years ago

Acknowledging that my eyes are on this. Still looking into what could be causing this, it seems out of the ordinary.

shssoichiro commented 4 years ago

@jkoudys Would you happen to have an example of a query that is triggering this behavior? I'm running through the benchmarks that exist in sqlformat, and the existing ones perform well. However, if I benchmark the example from #651, which is significantly long (insert 10000 rows), the time increase does seem somewhat exponential. So, I am trying to determine if there is some commonality between the queries, or if it is only the length of the query that is causing issue.

jkoudys commented 4 years ago

@shssoichiro I sanitized it for security reasons, but I think this one is probably okay:

SELECT\n  d.uuid AS uuid,\n\td.name_of_document AS name,\n\td.slug_name AS slug,\n\td.default_contract_uuid AS default_contract_uuid,\n\ta.uuid AS parent_uuid,\n\ta.name_of_agreement AS agreement_name,\n\td.icon_name AS icon\nFROM `documents` d\nLEFT JOIN agreements a ON a.uuid = d.parent_uuid\n WHERE d.uuid = ? LIMIT 1\

So our testing was hitting a lot of extreme loads, because it was going through graphql resolvers (async_graphql at the time, since moved to juniper), where we'd hit N+1 problems as we hadn't yet put in a dataloader. e.g. before if you had a gql query organization(id: 123) { users { uuid name } }, if there's 5000 users in that organization, it'll QueryLog 5000 times. High for our test case, but still well within a reasonable number of queries to see every minute on a live site.

From your description, it does sound related to the number of queries run, vs the length of the query itself. Contention for a shared resource, perhaps?

jkoudys commented 4 years ago

btw we've turned off the logging for sqlx right now, so it's not killing us, but the logging is nice and obviously has had a lot of work put into it, so I'd definitely want to turn it on for debugging some things soon.

c1wren commented 4 years ago

I've run into this issue as well. @jkoudys how did y'all turn off logging for just sqlx?

jkoudys commented 4 years ago

@c1wren see https://rust-lang-nursery.github.io/rust-cookbook/development_tools/debugging/config_log.html

You can do it in code, though we use the env var because it's easier to config staging and prod envs that way. Anything more severe than a WARN will suppress it entirely. Personally I'd like to have warn logging on in production, because the same best-practices that prevent sql-injection also keep confidential info out of the prepared statements it logs, but if I'm getting a warning because the queries are going slowly, the slow regexing from this compounds with that and turns a few seconds of delay into a server hang.

c1wren commented 4 years ago

@jkoudys I spent some extra time digging into the problem and due to how the logging was implemented here, you can't have ANY of the log levels for any part be less than error or the slow part of the code will still get run even though it isn't getting logged.

https://github.com/launchbadge/sqlx/blob/2e1658e08b053f66102b5beabc7fdd4ac28e3a48/sqlx-core/src/logger.rs#L34 controls if the super slow code gets run BUT

https://github.com/launchbadge/sqlx/blob/2e1658e08b053f66102b5beabc7fdd4ac28e3a48/sqlx-core/src/logger.rs#L60 controls if it actually gets logged though, which is a design flaw.

So it is possible to have it not logged but the slow code will still get run.

jkoudys commented 4 years ago

Ah I see. I was using that line 34 to simply set the max level above warn, but yeah I see the ::format happens regardless of the log level, so long as the max level is at least warning.

shssoichiro commented 4 years ago

https://github.com/shssoichiro/sqlformat-rs/pull/2 improves average case performance by 50%, including for this query. The query in #651 improves by 17% but still has some other slowness issue occurring. I'll continue to investigate that and release a patch version of sqlformat after finding a fix for that query.

shssoichiro commented 4 years ago

I think in order to get any more significant gains, I'll have to rewrite sqlformat's tokenizer to use nom instead of regex. So I'll publish a patch version for now with the gains that are in place.

shssoichiro commented 4 years ago

With the release of sqlformat 0.1.2, the runtime for this pathological query has been reduced by 90%, from 100ms to 10ms. Runtime for all queries has improved across the board by about 25% as well.

SomeoneToIgnore commented 3 years ago

Current latest sqlx version (0.4.2) ships sqlformat 0.1.5 (which presumably includes the perf improvements from 0.1.2), but I still observe a very slow query logging.

For query insert into test_table(a) values (?), (?), ...1000 more times (?) it takes around 734ms to finish the query.execute execution with the logging on. If I disable the logging, the speed drops down to 5ms for the same query.

Here's a trace for the slow case:

sqlx_trace

shssoichiro commented 3 years ago

I wrote up a new benchmark for the query you mentioned, @SomeoneToIgnore, and got consistent results of 3ms to format the query. It's interesting that you're seeing such different results, and unfortunately I don't have an explanation off the top of my head.

SomeoneToIgnore commented 3 years ago

https://github.com/SomeoneToIgnore/sqlx-large-queries-print

Here's my repro on the freshmost sqlx.

shssoichiro commented 3 years ago

Interesting, I ran the repo you posted and although I do see about 3-5x performance improvements, it's certainly not the 100x difference mentioned. I'm getting around 13-15ms per query with logging enabled and 3-5ms per query with logging disabled. It would seem really odd for this to be caused by some sort of machine difference, since sqlformat doesn't have to hit the disk or anything of that sort...

SomeoneToIgnore commented 3 years ago

My bad, I've used debug build and did not mention it.

Rechecked, on my machine, release with query logging produces 16-20ms per insert for 1000 items, but starts to grow when I insert more:

1 column,

1000 items: 16-20ms
2000 items: 48-50ms
3000 items: 96-106ms

3000 items with 1 column are inserted in 5-8ms for me when I disabled the sql logging.

If I increase the number of columns to 3 and rerun the release built tests, it gets way worse:

3 columns,

1000 items: ~70ms
2000 items: ~250ms
3000 items: ~520ms

3000 items with 3 columns are inserted in ~10ms (with the first query being slow, ~30ms) for me when I disabled the sql logging.

===

IMO there's something fishy here still, you just have to be curious a bit and fiddle with the query parameter numbers to receive peculiar performance results.

holicc commented 1 year ago

Are there any processes here? I also hit into this problem.

darksidecat commented 1 year ago

QueryLogger uses a sqlformat crate, and his tokenizer has O(n²) complicity 😬

https://github.com/shssoichiro/sqlformat-rs/blob/66bdb0fadf9e750600df0c778cff88e45277d7c1/src/tokenizer.rs#L13

With large requests, for example a batch insert of 2000 lines with 6 values, with the query logging enabled, the process takes 13 seconds, when without logging it is milliseconds. the same batch of 500 lines with logging takes less than a second, so dependence on time is non-linear

https://github.com/launchbadge/sqlx/blob/12b4250454b13fa2699dee9a4c761154ae60ddb6/sqlx-core/src/logger.rs#L38-L46C17

After changing

format!(
    "\n\n{}\n",
     sqlformat::format(
         &self.sql,
         &sqlformat::QueryParams::None,
         sqlformat::FormatOptions::default()
          )
)

to

format!(
    "\n\n{}\n",
      &self.sql,
)

there is no more performance problems.

maybe it is worth using a more effective tokenizer/formatter?

launchbadge / sqlx

Extremely slow repetitive regexing from the new QueryLogger #633