jtdaugherty / vty

A high-level ncurses alternative written in Haskell
BSD 3-Clause "New" or "Revised" License
319 stars 57 forks source link

Large bracketed pastes are very slow (seems `O(n^2)`) #231

Closed iphydf closed 2 years ago

iphydf commented 2 years ago

https://github.com/jtdaugherty/vty/blob/b3504fbcb0430a8ef8bc5cf254dbcf01d390a51a/src/Graphics/Vty/Input/Paste.hs#L35

This String-based code is probably quite inefficient for large pastes, but besides that, processing pastes is O(n^2).

Very rough timing (by counting seconds with a stopwatch, because I didn't set up a proper timing environment yet), raw data below the graph:

image

Compared to stty -echo; wc -c, which would roughly measure the terminal's own throughput, it takes 5 seconds for 13.5MB (10 seconds over 2 nested ssh connections to a remote server). I wouldn't expect a real application to be even close to these numbers, but I think we can do better than polynomial time, and maybe approach something where pasting images into a terminal becomes feasible. My use case is a TUI chat client where I'd like to be able to upload images via bracketed paste. This works today for very small images, or if you're very patient also for larger ones, but I'd like to make it a real thing.

Raw data:

jtdaugherty commented 2 years ago

Thank you for reporting this and for also providing measurements! Is this something you are interested in hacking on?

iphydf commented 2 years ago

Potentially yes. I'm not sure how much time I'll have in the next few days (I'm probably shelving this project for a bit because I need to work on something else first), but I'll get back to it at some point. From initial investigation it seems we'd save an order of magnitude in time just by using Text or ByteString earlier in the pipeline. I haven't actually looked at the algorithm I linked to yet, so I'm not sure that's the polynomial one. I'll look at it when I have headspace :).

iphydf commented 2 years ago

So.. actually looking at the algorithm for more than 5 seconds, I blame this line: https://github.com/jtdaugherty/vty/blob/b3504fbcb0430a8ef8bc5cf254dbcf01d390a51a/src/Graphics/Vty/Input/Paste.hs#L47

In particular, the non-match of that line. length is O(n) and runs on every recursive call.

iphydf commented 2 years ago

Never mind that. I'll look into it :). I've got my setup done now.

iphydf commented 2 years ago

First hacky experiment: not doing the takeUntil thing brings down the time for 300KB from 93 seconds to around 4-5 seconds.

iphydf commented 2 years ago

I'm done. Just some more testing then I'll make a PR.

jtdaugherty commented 2 years ago

Fantastic, thank you!

iphydf commented 2 years ago

Like I thought, it's about an order of magnitude speedup. Actually all my timings are wrong, because the new timings are in unoptimised mode while the old ones are optimised. Speedup is probably significantly better than I measured.

iphydf commented 2 years ago

In fact, turns out, it's not much faster in optimised mode, which is mildly surprising to me. I'll investigate more. The timings in the PR are roughly correct then.

iphydf commented 2 years ago

Indeed, the next thing to fix is bracketedPasteFinished being called on increasingly large strings => polynomial time. That fix isn't quite as trivial though. I'll have a look at what's the best way to fix it.

iphydf commented 2 years ago

This is impossible to fix without breaking the API. All of Graphics.Vty.Input.Loop is part of the public API. parseEvent in there concatenates unprocessed input until the whole input can be parsed. That kind of works for small inputs (it's still O(n^2)) but breaks down for large inputs.

What we should do instead is change the classifier to have a parse state that you can incrementally feed input until it succeeds. This requires larger changes to APIs, in particular: KClass currently has Prefix, but then it re-parses every time more input becomes available. All that needs to change to keep track of the parse state.

jtdaugherty commented 2 years ago

This is impossible to fix without breaking the API. All of Graphics.Vty.Input.Loop is part of the public API.

I am okay with taking this risk; we won't know what the breakage is until after the hacking is finished, and I think it's worth exploring an alternative approach and then assessing the breakage risk later. I suspect most of what's in that module actually never gets used directly by applications in practice, so I suspect that the risk of breakages having a negative impact is probably low.

iphydf commented 2 years ago

So far I didn't break brick :). I didn't do the parser thing yet, but changed all the internal string processing to ByteString and got some significant speedups already. It doesn't help with the polynomial time, but the constant factor is a lot smaller now. I'll do some timing comparisons.

jtdaugherty commented 2 years ago

That's great news - thanks again for looking into this.