Open samth opened 12 years ago
On Wed, 29 Feb 2012 14:44:23 -0500, eli@barzilay.org wrote:
An hour ago, samth@ccs.neu.edu wrote:
This program seems to loop forever. [...]
- It's not. If you'd try to minimize it, you'd see that it's just very slow.
(let loop ([i 0]
[h ">Release: unknown\n\n"]
[r ">Closed-Date:\s+(._)\n>Release:"])
(define px (pregexp r))
(printf "~a. " i)
(time (regexp-match px h))
(loop (add1 i)
(string-append ">Number: 105\n" h)
(string-append ">Number:\s+(._)\n" r)))
((lambda (x) (x x)) (lambda (x) (x x))) Eli Barzilay:
http://barzilay.org/ Maze is Life!
On Wed, 29 Feb 2012 14:53:26 -0500, samth@ccs.neu.edu wrote:
On Wed, Feb 29, 2012 at 2:44 PM, Eli Barzilay eli@barzilay.org wrote:
An hour ago, samth@ccs.neu.edu wrote:
This program seems to loop forever. [...]
- It's not. If you'd try to minimize it, you'd see that it's just very slow.
(let loop ([i 0] [h ">Release: unknown\n\n"] [r ">Closed-Date:\s+(.)\n>Release:"]) (define px (pregexp r)) (printf "~a. " i) (time (regexp-match px h)) (loop (add1 i) (string-append ">Number: 105\n" h) (string-append ">Number:\s+(.)\n" r)))
Ah, I see. Thanks. But, ...
- This is some pathological case where it works hard to eventually fail (note the lack of `Closed-Date' in the string, which is in your example).
Why is it so slow on this? This is a fairly short string, it
sam th samth@ccs.neu.edu
On Wed, 29 Feb 2012 15:07:24 -0500, eli@barzilay.org wrote:
10 minutes ago, Sam Tobin-Hochstadt wrote:
On Wed, Feb 29, 2012 at 2:44 PM, Eli Barzilay eli@barzilay.org wrote:
- This is some pathological case where it works hard to eventually fail (note the lack of `Closed-Date' in the string, which is in your example).
Why is it so slow on this? This is a fairly short string, it shouldn't take this long.
(I think that it's the inefficient way in which .* fails, and each level multiplies that work.)
((lambda (x) (x x)) (lambda (x) (x x))) Eli Barzilay:
http://barzilay.org/ Maze is Life!
On Wed, 29 Feb 2012 15:14:18 -0500, samth@ccs.neu.edu wrote:
On Wed, Feb 29, 2012 at 3:07 PM, Eli Barzilay eli@barzilay.org wrote:
10 minutes ago, Sam Tobin-Hochstadt wrote:
On Wed, Feb 29, 2012 at 2:44 PM, Eli Barzilay eli@barzilay.org wrote:
- This is some pathological case where it works hard to eventually fail (note the lack of `Closed-Date' in the string, which is in your example).
Why is it so slow on this? This is a fairly short string, it shouldn't take this long.
(I think that it's the inefficient way in which .* fails, and each level multiplies that work.)
It looks like [^\n]* is still slow. Is there something I can write
sam th samth@ccs.neu.edu
On Wed, 29 Feb 2012 17:18:06 -0500, eli@barzilay.org wrote:
Two hours ago, Sam Tobin-Hochstadt wrote:
On Wed, Feb 29, 2012 at 3:07 PM, Eli Barzilay eli@barzilay.org wrote:
10 minutes ago, Sam Tobin-Hochstadt wrote:
On Wed, Feb 29, 2012 at 2:44 PM, Eli Barzilay eli@barzilay.org wrote:
- This is some pathological case where it works hard to eventually fail (note the lack of `Closed-Date' in the string, which is in your example).
Why is it so slow on this? This is a fairly short string, it shouldn't take this long.
(I think that it's the inefficient way in which .* fails, and each level multiplies that work.)
It looks like [^\n]* is still slow. Is there something I can write here that isn't quite so slow?
In some cases you can get reasonable performance using non-greedy operators.
I still don't see how a monolithic regexp is better than simply splitting the text into fields first (trivial in this format), then parse each field (easy too: each field is either some text on a single line, or multiline fields that follow a header line with just the field name):
(define (parse-gnats pr-text) (map (λ (s) (cdr (or (regexp-match (if (regexp-match? #rx"\n" s)
#rx"^([^:]+): _([^\n]_)$")
s)
(error "Unexpected input:" s))))
(cdr (regexp-split #rx"\n>" pr-text))))
(If I add the extra keywords to `regexp-match*', then this could even be shorter.)
((lambda (x) (x x)) (lambda (x) (x x))) Eli Barzilay:
http://barzilay.org/ Maze is Life!
On Wed, 29 Feb 2012 16:28:41 -0700, ryan@cs.utah.edu wrote:
On 02/29/2012 01:14 PM, Sam Tobin-Hochstadt wrote:
On Wed, Feb 29, 2012 at 3:07 PM, Eli Barzilayeli@barzilay.org wrote:
10 minutes ago, Sam Tobin-Hochstadt wrote:
On Wed, Feb 29, 2012 at 2:44 PM, Eli Barzilayeli@barzilay.org wrote:
- This is some pathological case where it works hard to eventually fail (note the lack of `Closed-Date' in the string, which is in your example).
Why is it so slow on this? This is a fairly short string, it shouldn't take this long.
(I think that it's the inefficient way in which .* fails, and each level multiplies that work.)
It looks like [^\n]* is still slow. Is there something I can write here that isn't quite so slow?
Not a solution, but you might find parts of this article interesting:
http://swtch.com/~rsc/regexp/regexp1.html
Ryan
Originally submitted on: Wed Feb 29 13:40:01 -0500 2012
This program seems to loop forever.
lang racket
(define h ">Number: 105\n>Category: mcmicmac\n>Synopsis: language levels\n>Confidential: no\n>Severity: serious\n>Priority: medium\n>Responsible: shriram\n>State: closed\n>Class: sw-bug\n>Submitter-Id: unknown\n>Arrival-Date: Mon May 19 14:35:02 -0400 1997\n>Last-Modified: Wed Apr 28 11:02:13 -0400 2004\n>Originator: Mark W. Krentel\n>Release: unknown\n\n") (define r
px">Number:\s+(.)\n>Category:\s+(.)\n>Synopsis:\s+(.)\n>Confidential:\s+(.)\n>Severity:\s+(.)\n>Priority:\s+(.)\n>Responsible:\s+(.)\n>State:\s+(.)\n>Class:\s+(.)\n>Submitter-Id:\s+(.)\n>Arrival-Date:\s+(.)\n>Closed-Date:\s+(.)\n>Last-Modified:\s+(.)\n>Originator:\s+(.)\n>Release:")
(regexp-match r h)
Release:
5.2.1.7--2012-02-29(3267738/g)
Environment:
unix "Linux loki 3.0.0-15-generic #26-Ubuntu SMP Fri Jan 20 17:23:00 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux" (x86_64-linux/3m) (get-display-depth) = 32 Human Language: english (current-memory-use) 427125968 Links: (links) = ("gcstat" "raco-git" "assignments" "var" "disassemble" "book" "class"); (links #:user? #f) = (); (links #:root? #t) = (); (links #:user? #f #:root? #t) = ()
Collections: ("/home/samth/sw/plt/add-on/5.2.1.7/collects" ("info-domain")) ("/home/samth/sw/plt/collects" ("datalog" "scriblib" "html" "racklog" "drracket" "sqr.png" "dynext" "honu" "2htdp" "graphics" "make" "eopl" "test-box-recovery" "hierlist" "racket" "rackunit" "openssl" "r5rs" "algol60" "mzcom" "redex" "texpict" "swindle" "defaults" "trace" "combinator-parser" "mysterx" "info-domain" "meta" "teachpack" "setup" "repo-time-stamp" "games" "r6rs" "trig.png" "icons" "typed-scheme" "test-engine" "lazy" "stepper" "at-exp" "planet" "tests" "rnrs" "db" "web-server" "framework" "net" "scribblings" "string-constants" "help" "browser" "tex2page" "mzlib" "parser-tools" "errortrace" "data" "sirmail" "plot" "launcher" "handin-client" "syntax" "profile" "s-exp" ".gitignore" "slideshow" "plai" "htdp" "typed-racket" "scheme" "syntax-color" "guibuilder" "drscheme" "raco" "srfi" "reader" "preprocessor" "compiler" "config" "mred" "handin-server" "schemeunit" "typed" "macro-debugger" "deinprogramm" "afm" "ffi" "gui-debugger" "readline" "scribble" "unstable" "picturing-programs" "file" "sgl" "im! ages" "wxme" "xrepl" "lang" "xml" "mrlib" "version" "frtime" "mzscheme" "embedded-gui" "slatex"))
Computer Language: (("Determine language from source") (#(#t print mixed-fraction-e #f #t test-coverage) (default) #() "#lang racket\n" #f #t))
This bug was converted from Gnats bug 12609.