Open samth opened 12 years ago
An hour ago, samth@ccs.neu.edu wrote:
> This program seems to loop forever. [...]
1. It's not. If you'd try to minimize it, you'd see that it's just very slow.
(let loop ([i 0]
           [h ">Release: unknown\n\n"]
           [r ">Closed-Date:\\s+(.*)\n>Release:"])
  (define px (pregexp r))
  (printf "~a. " i)
  (time (regexp-match px h))
  (loop (add1 i)
        (string-append ">Number: 105\n" h)
        (string-append ">Number:\\s+(.*)\n" r)))
On Wed, Feb 29, 2012 at 2:44 PM, Eli Barzilay <eli@barzilay.org> wrote:
> An hour ago, samth@ccs.neu.edu wrote:
>> This program seems to loop forever.
>> [...]
>
> 1. It's not. If you'd try to minimize it, you'd see that it's just
> very slow.
Ah, I see. Thanks. But, ...
> 2. This is some pathological case where it works hard to eventually fail (note the lack of `Closed-Date' in the string, which is in your example).
Why is it so slow on this? This is a fairly short string, it shouldn't take this long.
10 minutes ago, Sam Tobin-Hochstadt wrote:
> Why is it so slow on this? This is a fairly short string, it shouldn't take this long.
(I think that it's the inefficient way in which .* fails, and each level multiplies that work.)
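A bounded variant of the loop above makes the growth visible without running indefinitely. This is only a sketch: the helper name `time-levels` and the cutoff of 10 levels are assumptions introduced here, and the actual timings depend on the machine and the Racket version.

```racket
#lang racket/base

;; Hypothetical helper (not from the thread): build the same inputs as the
;; loop above for levels 0..max-depth and record, for each level, the match
;; result and the elapsed time in milliseconds.
(define (time-levels max-depth)
  (let loop ([i 0]
             [h ">Release: unknown\n\n"]
             [r ">Closed-Date:\\s+(.*)\n>Release:"]
             [acc '()])
    (if (> i max-depth)
        (reverse acc)
        (let* ([px (pregexp r)]
               [start (current-inexact-milliseconds)]
               [result (regexp-match px h)]
               [elapsed (- (current-inexact-milliseconds) start)])
          (loop (add1 i)
                (string-append ">Number: 105\n" h)
                (string-append ">Number:\\s+(.*)\n" r)
                (cons (list i result elapsed) acc))))))

;; Every match fails (the result is #f) because h never contains
;; "Closed-Date"; only the time spent reaching that failure changes.
(for ([row (in-list (time-levels 10))])
  (printf "~a. result: ~a, ~a ms\n" (car row) (cadr row) (caddr row)))
```

Raising the cutoff a few levels at a time shows how quickly the failing match gets more expensive.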
On Wed, Feb 29, 2012 at 3:07 PM, Eli Barzilay <eli@barzilay.org> wrote:
> (I think that it's the inefficient way in which .* fails, and each level multiplies that work.)
It looks like [^\n]* is still slow. Is there something I can write here that isn't quite so slow?
Two hours ago, Sam Tobin-Hochstadt wrote:
> It looks like [^\n]* is still slow. Is there something I can write here that isn't quite so slow?
In some cases you can get reasonable performance using non-greedy operators.
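For reference, a non-greedy operator is written by adding `?` after the quantifier. The snippet below only illustrates the difference in what gets matched; the input string is made up for illustration, and it makes no claim about how much time this saves on the pathological input above.

```racket
#lang racket/base

;; Greedy: .* takes as much as it can, so the match ends at the last "b".
(define greedy (regexp-match #px"a(.*)b" "aXbYb"))

;; Non-greedy: .*? takes as little as it can, so the match ends at the
;; first "b".
(define non-greedy (regexp-match #px"a(.*?)b" "aXbYb"))

(printf "greedy: ~s\nnon-greedy: ~s\n" greedy non-greedy)
```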
I still don't see how a monolithic regexp is better than simply splitting the text into fields first (trivial in this format) and then parsing each field (also easy: each field is either some text on a single line, or a multiline field that follows a header line with just the field name):
(define (parse-gnats pr-text)
  (map (λ (s)
         (cdr (or (regexp-match (if (regexp-match? #rx"\n" s)
                                    #rx"^([^:]+): *\n(.*)$"
                                    #rx"^([^:]+): *([^\n]*)$")
                                s)
                  (error "Unexpected input:" s))))
       (cdr (regexp-split #rx"\n>" pr-text))))
(If I add the extra keywords to `regexp-match*', then this could even be shorter.)
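A small usage sketch of `parse-gnats`, assuming a made-up PR text. The sample string is hypothetical: its leading `From:` line stands in for whatever precedes the first `>` field (the outer `cdr` drops that piece), and the text has no trailing newline, since a single-line field followed by a newline would hit the "Unexpected input" error in this version.

```racket
#lang racket/base

;; parse-gnats as given above, repeated so the example runs on its own.
(define (parse-gnats pr-text)
  (map (λ (s)
         (cdr (or (regexp-match (if (regexp-match? #rx"\n" s)
                                    #rx"^([^:]+): *\n(.*)$"
                                    #rx"^([^:]+): *([^\n]*)$")
                                s)
                  (error "Unexpected input:" s))))
       (cdr (regexp-split #rx"\n>" pr-text))))

;; Hypothetical sample input, not taken from a real PR.
(define sample
  (string-append "From: someone\n"
                 ">Number: 105\n"
                 ">Synopsis: regexp is slow\n"
                 ">Description:\nline one\nline two\n"
                 ">Release: unknown"))

;; Each parsed field is a two-element list: field name and field text.
(define fields (parse-gnats sample))

(for ([f (in-list fields)])
  (printf "~s\n" f))
```

Each step here is a single split or a single anchored match on one field, so there is no chain of `.*` groups left to backtrack through.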
On 02/29/2012 01:14 PM, Sam Tobin-Hochstadt wrote:
> It looks like [^\n]* is still slow. Is there something I can write here that isn't quite so slow?
Not a solution, but you might find parts of this article interesting:
http://swtch.com/~rsc/regexp/regexp1.html
Ryan
Originally submitted on: Wed Feb 29 13:40:01 -0500 2012
This program seems to loop forever.
Release:
5.2.1.7--2012-02-29(3267738/g)
Environment:
unix "Linux loki 3.0.0-15-generic #26-Ubuntu SMP Fri Jan 20 17:23:00 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux" (x86_64-linux/3m) (get-display-depth) = 32 Human Language: english (current-memory-use) 427125968 Links: (links) = ("gcstat" "raco-git" "assignments" "var" "disassemble" "book" "class"); (links #:user? #f) = (); (links #:root? #t) = (); (links #:user? #f #:root? #t) = ()
Collections: ("/home/samth/sw/plt/add-on/5.2.1.7/collects" ("info-domain")) ("/home/samth/sw/plt/collects" ("datalog" "scriblib" "html" "racklog" "drracket" "sqr.png" "dynext" "honu" "2htdp" "graphics" "make" "eopl" "test-box-recovery" "hierlist" "racket" "rackunit" "openssl" "r5rs" "algol60" "mzcom" "redex" "texpict" "swindle" "defaults" "trace" "combinator-parser" "mysterx" "info-domain" "meta" "teachpack" "setup" "repo-time-stamp" "games" "r6rs" "trig.png" "icons" "typed-scheme" "test-engine" "lazy" "stepper" "at-exp" "planet" "tests" "rnrs" "db" "web-server" "framework" "net" "scribblings" "string-constants" "help" "browser" "tex2page" "mzlib" "parser-tools" "errortrace" "data" "sirmail" "plot" "launcher" "handin-client" "syntax" "profile" "s-exp" ".gitignore" "slideshow" "plai" "htdp" "typed-racket" "scheme" "syntax-color" "guibuilder" "drscheme" "raco" "srfi" "reader" "preprocessor" "compiler" "config" "mred" "handin-server" "schemeunit" "typed" "macro-debugger" "deinprogramm" "afm" "ffi" "gui-debugger" "readline" "scribble" "unstable" "picturing-programs" "file" "sgl" "images" "wxme" "xrepl" "lang" "xml" "mrlib" "version" "frtime" "mzscheme" "embedded-gui" "slatex"))
Computer Language: (("Determine language from source") (#(#t print mixed-fraction-e #f #t test-coverage) (default) #() "#lang racket\n" #f #t))
This bug was converted from Gnats bug 12609.