Closed wnederhof closed 8 years ago
Hmmm. need more details. It has nothing to do with backtracking (Rascal uses a form of GLL parsing which does not backtrack, but rather predicts all alternatives "in parallel" and shares common derivation paths).
The difference between the console and an import is interesting. Both scenarios use the same parsing technique. It may be as simple as running out of memory in one scenario and not in the other; the wait time could be generated by the GC running like crazy?
One thing I found is that during debug mode, the function liftRec
in the ASTBuilder
is called recursively uncountable times by itself. If you like, I can send you the file I was trying to load per e-mail?
yes please do send it! liftRec
is a recursive function over concrete syntax patterns, so it's normal to be called a lot in files with a lot of patterns. In any case it's not the parser that's bothering you then, since this is done some time after parsing.
In general, debug mode is slower. So for speed always use release mode (we have some commits in /unstable/ that do improve performance of debug mode)
Next to sending the file, have you tried increasing the memory (-Xmx
flag)? Java will fight very long before admitting it needs more memory. One of the signs is that your java process is suddenly consuming all your cores. (Rascal is quite single threaded for now)
I set the -Xmx
flag in eclipse.ini
to 2G and also in the Run settings for the Eclipse Application (as I run Rascal from source). Usually I do not start Rascal from Debug mode. top
shows me that Java is using around 100-105% of my CPU (I assume this means it uses only one full core and a bit). Memory is around 40%, but is at 30% when idle. My system (Linux Mint 17.2) does not freeze up when importing the module, which it sometimes does when a program uses all of the available memory. Perhaps irrelevant, but I have a Lenovo X220 i7-2640M 2.8GHZ (dual core) with 4GB memory and 4GB of swap memory on my SSD. Hope this helps! :-)
What is the grammar?
layout Whitespace = [\t-\n\r\ ]*;
lexical Id = [a-zA-Z][0-9a-zA-Z_]*;
lexical Wsp = [\ \n\r];
lexical Comment = ("//" [^\n\r]*) | ("/*" [^*/] "*/") ;
lexical WS_ = (Wsp | Comment)*;
lexical WS__ = (Wsp | Comment)+;
lexical Id = [a-zA-Z][0-9a-zA-Z_]*;
lexical ShortId = [a-zA-Z];
lexical LongId = Id;
syntax AndExp = and: AndExp WS_ "&" WS_ NotExp ;
syntax AndExp = notexp: NotExp ;
syntax Digit = eight: "8" ;
syntax Digit = five: "5" ;
syntax Digit = four: "4" ;
syntax Digit = nine: "9" ;
syntax Digit = \one: "1" ;
syntax Digit = seven: "7" ;
syntax Digit = six: "6" ;
syntax Digit = three: "3" ;
syntax Digit = two: "2" ;
syntax Digit = zero: "0" ;
syntax Exp = letexp: WS_ LetExp WS_ ;
syntax FacNoPar = app1: FacNoPar WS__ Prim ;
syntax FacNoPar = app2: FacParen WS_ Prim ;
syntax FacNoPar = car: "car" WS__ Prim ;
syntax FacNoPar = cdr: "cdr" WS__ Prim ;
syntax FacNoPar = isbool: "boolean?" WS__ Prim ;
syntax FacNoPar = isint: "integer?" WS__ Prim ;
syntax FacNoPar = isneg: "negative?" WS__ Prim ;
syntax FacNoPar = isnull: "null?" WS__ Prim ;
syntax FacNoPar = ispair: "pair?" WS__ Prim ;
syntax FacNoPar = ispos: "positive?" WS__ Prim ;
syntax FacNoPar = isproc: "procedure?" WS__ Prim ;
syntax FacNoPar = iszero: "zero?" WS__ Prim ;
syntax FacNoPar = pred: "--" WS_ Prim ;
syntax FacNoPar = prim: Prim ;
syntax FacNoPar = succ: "++" WS_ Prim ;
syntax FacParen = app: Factor WS_ "(" WS_ LetExp WS_ ")" ;
syntax FacParen = car: "car" WS_ "(" WS_ LetExp WS_ ")" ;
syntax FacParen = cdr: "cdr" WS_ "(" WS_ LetExp WS_ ")" ;
syntax FacParen = cons: "cons" WS_ "(" WS_ LetExp WS_ "," WS_ LetExp WS_ ")" ;
syntax FacParen = isbool: "boolean?" WS_ "(" WS_ LetExp WS_ ")" ;
syntax FacParen = isint: "integer?" WS_ "(" WS_ LetExp WS_ ")" ;
syntax FacParen = isneg: "negative?" WS_ "(" WS_ LetExp WS_ ")" ;
syntax FacParen = isnull: "null?" WS_ "(" WS_ LetExp WS_ ")" ;
syntax FacParen = ispair: "pair?" WS_ "(" WS_ LetExp WS_ ")" ;
syntax FacParen = ispos: "positive?" WS_ "(" WS_ LetExp WS_ ")" ;
syntax FacParen = isproc: "procedure?" WS_ "(" WS_ LetExp WS_ ")" ;
syntax FacParen = iszero: "zero?" WS_ "(" WS_ LetExp WS_ ")" ;
syntax FacParen = paren: "(" WS_ LetExp WS_ ")" ;
syntax FacParen = pred: "--" WS_ "(" WS_ LetExp WS_ ")" ;
syntax FacParen = succ: "++" WS_ "(" WS_ LetExp WS_ ")" ;
syntax Factor = facparen: FacParen ;
syntax Factor = noparen: FacNoPar ;
syntax IntConst = digit: Digit ;
syntax IntConst = more: IntConst Digit ;
syntax LetExp = \if: OrExp WS_ "?" WS_ LetExp WS_ ":" WS_ LetExp ;
syntax LetExp = letfun: "let" WS__ Id WS_ "(" WS_ Id WS_ ")" WS_ "=" WS_ LetExp WS_ "in" WS_ LetExp ;
syntax LetExp = letrec: "letrec" WS__ Id WS_ "(" WS_ Id WS_ ")" WS_ "=" WS_ LetExp WS_ "in" WS_ LetExp ;
syntax LetExp = letvar: "let" WS__ Id WS_ "=" WS_ LetExp WS_ "in" WS_ LetExp ;
syntax LetExp = orexp: OrExp ;
syntax NotExp = not: "!" WS_ RelExp ;
syntax NotExp = relexp: RelExp ;
syntax OrExp = andexp: AndExp ;
syntax OrExp = or: OrExp WS_ "|" WS_ AndExp ;
syntax OrExp = xor: OrExp WS_ "^" WS_ AndExp ;
syntax Prim = \false: "#f" ;
syntax Prim = intConst: IntConst ;
syntax Prim = null: "#null" ;
syntax Prim = \true: "#t" ;
syntax Prim = var: Id ;
syntax RelExp = eq: SimpleExp WS_ "=" WS_ SimpleExp ;
syntax RelExp = gt: SimpleExp WS_ "\>" WS_ SimpleExp ;
syntax RelExp = gte: SimpleExp WS_ "\>=" WS_ SimpleExp ;
syntax RelExp = lt: SimpleExp WS_ "\<" WS_ SimpleExp ;
syntax RelExp = lte: SimpleExp WS_ "\<=" WS_ SimpleExp ;
syntax RelExp = neq: SimpleExp WS_ "!=" WS_ SimpleExp ;
syntax RelExp = simple: SimpleExp ;
syntax SimpleExp = add: SimpleExp WS_ "+" WS_ Term ;
syntax SimpleExp = sub: SimpleExp WS_ "-" WS_ Term ;
syntax SimpleExp = term: Term ;
syntax SimpleExp = uminus: "-" WS_ Term ;
syntax SimpleExp = uplus: "+" WS_ Term ;
syntax Term = factor: Factor ;
syntax Term = mul: Term WS_ "*" WS_ Factor ;
syntax Body = exp: "." Exp!letexp e;
syntax Body = lam: Name n Body b;
syntax Exp = app: Exp!letexp e Short s;
syntax Exp = short: Short s;
syntax Name = longid: "\<" LongId lId "\>" ;
syntax Name = shortid: ShortId sId;
syntax Short = lam: "\\" Name n Body b;
syntax Short = paren: "(" Exp!letexp e ")" ;
syntax Short = var: Name n;
One thing you may notice is that I use WS_ and WS__ and a separate whitespace layout. This is because I built a generator that translates Banana Algebra into Rascal and I have not removed them yet. However, I experienced the same problem with a totally different grammar with no such tokens.
Tip: remove all whitespace stuff, and extend lang::std::Layout
and extend lang::std::Comment
.
You should add follow restriction on lexicals:
lexical Id = [a-zA-Z][0-9a-zA-Z_]* !>> [a-z-A-Z0-9];
lexical ShortId = [a-zA-Z];
lexical LongId = Id;
Probably it's slow because all your layout is ambiguous.
Oh, and WS_
is not needed.
Thank you for your help. I have to admit that I did not spend too much time on the layout yet.
I removed all WS_, WS__ and layouts, restarted Rascal and set the header of the file (right before the actual syntax definitions) to:
extend lang::std::Layout;
extend lang::std::Comment;
lexical Id = [a-zA-Z][0-9a-zA-Z_]* !>> [a-zA-Z0-9];
lexical ShortId = [a-zA-Z];
lexical LongId = Id;
The only ambiguity I can see still is that (Exp)a
can either be a Short.var or a Prim.var. Unfortunately, it is still not loading.
syntax LetExp = letfun: "let" WS__ Id WS_ "(" WS_ Id WS_ ")" WS_ "=" WS_ LetExp WS_ "in" WS_ LetExp ;
syntax LetExp = letrec: "letrec" WS__ Id WS_ "(" WS_ Id WS_ ")" WS_ "=" WS_ LetExp WS_ "in" WS_ LetExp ;
syntax LetExp = letvar: "let" WS__ Id WS_ "=" WS_ LetExp WS_ "in" WS_ LetExp ;
syntax
means that layout is added between all elements in your production. So this might nog be what you want. There is an undocumented flag to provide custom layout for your production, I think you want that.
The tag is @manual
so it would be:
syntax LetExp
= @manual letfun: "let" WS__ Id WS_ "(" WS_ Id WS_ ")" WS_ "=" WS_ LetExp WS_ "in" WS_ LetExp
| @manual letrec: "letrec" WS__ Id WS_ "(" WS_ Id WS_ ")" WS_ "=" WS_ LetExp WS_ "in" WS_ LetExp
| @manual LetExp = letvar: "let" WS__ Id WS_ "=" WS_ LetExp WS_ "in" WS_ LetExp
;
and:
lexical WS_ = (Wsp | Comment)* !>> [\ \n\r];
lexical WS__ = (Wsp | Comment)+ !>> [\ \n\r];
can this issue be closed?
Yes, the performance is grammar dependent of course. We can't fix anything here.
Parsing big files using Rascal's
syntax
-definitions does not work properly as it will be constantly loading (I tried to parse a file of around 4MB). Furthermore, I tried to load a (generated) 240KB Rascal file through animport
-declaration, which also hung. After loading Rascal in debug mode, I found out that Rascal was still parsing after 45 minutes. I think this may somehow be caused by Rascal's backtracking facilities, as "Execute Selected Text in Console" works almost instantly, in which one statement is executed at a time (probably the console uses some kind of pre-parsing technique).