Closed flofreud closed 11 years ago
I have a few questions on the interface design proposed by flofreud & Co.
@flofreud correct me if i'm wrong:
a few updates, thanks to @flofreud
I think the regex, if present, should be a comment, not part of the enumeration. I'm not sure the lexer can be regex-based for the language if we define strings the same way Java does. The token definition is part of the terminal definition in the grammar, where some input is still missing (see https://github.com/swp-uebersetzerbau-ss13/common/wiki/Grammar)
On Tue, Apr 23, 2013 at 5:24 PM, Tkrauss notifications@github.com wrote:
@flofreud https://github.com/flofreud correct me if i'm wrong:
- The Token class has a generic parameter T.
- Subclasses define T when extending Token (RealToken with Double, NumToken with Integer, etc.).
- I don't get your point: there is no regexp in the lexer interface, there is just a regexp representation for each token, because a token is defined by a regexp.
— Reply to this email directly or view it on GitHubhttps://github.com/swp-uebersetzerbau-ss13/common/issues/3#issuecomment-16865185 .
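For illustration, the generic Token idea might be sketched like this (the getTypedValue() name and the parsing code are assumptions, not the actual proposal):

```java
// Hypothetical sketch of a generic Token where subclasses fix T.
interface Token<T> {
    String getValue();  // the lexeme as read from the source
    T getTypedValue();  // the lexeme converted to T
}

// NumToken fixes T to Long, RealToken fixes T to Double:
class NumToken implements Token<Long> {
    private final String lexeme;
    NumToken(String lexeme) { this.lexeme = lexeme; }
    public String getValue() { return lexeme; }
    public Long getTypedValue() { return Long.parseLong(lexeme); }
}

class RealToken implements Token<Double> {
    private final String lexeme;
    RealToken(String lexeme) { this.lexeme = lexeme; }
    public String getValue() { return lexeme; }
    public Double getTypedValue() { return Double.parseDouble(lexeme); }
}
```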
Thank you for the explanation. I understand the idea of the template parameter. This makes "Token" a class template, which has to be specialised for a concrete Token.
Have you considered taking into account the solution we proposed? I think the idea of using a class template could be merged with it.
I think the idea is that for every "boring" token like "+", "if" etc. the StringToken is used. We can distinguish them by reading the TokenType. getExactValue() would return "+"; that's not of interest, but it's compatible with the idea that every token represents a subsequence of the source.
Is the proposal in the repo? Since we both seem to use dia, I could make a version merging both designs
This is my proposal: common/doc/lexer/* It uses Java enum magic to merge the idea of having a Token hierarchy (needed to make the lexer able to recognize numerical constants and ids) with the possibility of having a switch statement over tokens. To decide which downcast is possible, isNumToken(), is...Token(), ... can be used. Class derivation is used only where it is needed.
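A minimal sketch of what that enum could look like (the flag and method names are assumed from the thread, not taken from the actual proposal):

```java
// Each TokenType knows whether a downcast to NumToken is possible,
// so the parser can switch over token types and still find the safe cast.
enum TokenType {
    NUM(true),
    ID(false),
    PLUS(false);

    private final boolean numToken;

    TokenType(boolean numToken) { this.numToken = numToken; }

    boolean isNumToken() { return numToken; }
}
```

A parser could then check `token.getType().isNumToken()` before casting, instead of using instanceof.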
I cleaned the proposal a bit:
I removed the IdToken and StringToken interfaces because the idea was to provide a way to get certain tokens parsed into the correct type at lexer level. If the parser wants to, it can always get the string representation of all tokens via the getValue() method.
What do the regular expressions for the terminals have to do with the lexer INTERFACE?
I removed them because the definition should be stated in the grammar and is not part of the interface (implementation detail).
Do we really need a method like "getAsNumToken()"? I've never seen such a design until now, where a regular cast is done by a convert-method of the super class...
No, we don't need this, but I didn't want to discuss this over and over with the deadline 2 days away.
On Wed, Apr 24, 2013 at 9:59 AM, Tkrauss notifications@github.com wrote:
Do we really need a method like "getAsNumToken()"? I've never seen such a design until now, where a regular cast is done by a convert-method of the super class...
Hence I thought we'd stay with the last update of @akrillo89, which seems to be the best imho. Anyway, I agree with the changed one if it helps fix the design :)
@Tkrauss: Me too (without StringToken (unnecessary)). It contains all important facts.
+1 for last update of @akrillo89
I removed StringToken because I think you are right. In addition I removed the regexp too (it's part of the grammar).
There are several design problems in akrillo89's version that I have already pointed out, and they are still there in the "cleaned up" version of the merged design I proposed, for which Token.getTypedValue() just makes no sense. Take the token < + >: what is getTypedValue() supposed to return for it? The same as Token.getName()? (which would return "+" as far as I understood). This would result in this situation:

< + >.getValue() == "+"
< + >.getTypedValue() == "+"

but also

< num, 7 >.getValue() == "num"
< num, 7 >.getTypedValue() == 7

Also the method Token.getValue() is not necessary, because it just duplicates TokenType.getName(). With the design I proposed it is like this:

< + >.getType().getName() == "+"

and for tokens with an associated value (like num):

< num, 7 >.getType().getName() == "num"
< num, 7 >.getValue() == 7
getValue() returns the lexeme as read:

< num, 7 >.getValue() == "7"
< num, 7 >.getTypedValue() == 7

For all types without a specialized interface, both methods return the string representation:

< string, 'foo' >.getValue() == "foo"
< string, 'foo' >.getTypedValue() == "foo"

For tokens like ARITHMOP this would be:

< ARITHMOP, '+' >.getValue() == "+"
< ARITHMOP, '+' >.getTypedValue() == "+"
@EsGeh can you explain why you prefer TokenType.getName() over Token.getValue()?
Those would be two different things: TokenType.getName() on the STRING TokenType would be 'string', while getValue() would return the string that was read.
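To make that distinction concrete, a small sketch (class and method names are assumed from the thread):

```java
// TokenType.getName() names the token kind; Token.getValue() is the lexeme.
enum TokenType {
    STRING;

    String getName() { return "string"; }
}

class StringToken {
    private final String lexeme;
    StringToken(String lexeme) { this.lexeme = lexeme; }
    TokenType getType() { return TokenType.STRING; }
    String getValue() { return lexeme; } // the string that was read
}
```

Here `new StringToken("foo").getType().getName()` yields "string", while `getValue()` yields "foo".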
I think there was a misunderstanding about the meaning of the methods, because we have no javadoc, only these "cool" diagrams to discuss the design with.
How can I declare a list of Tokens? Token, if defined as a template class, cannot play the role of a base for all tokens; as far as I know, List< Token > is not possible. Next try: List< Token< ? > >. Hmm, I am not sure if this would be legal in Java. But it is too general anyway: the list could contain a Token< File >, which makes no sense at all.
One can use a generic class when IMPLEMENTING the interfaces, but for the interfaces themselves I think it is better to avoid generics. You use interfaces to specify the behavior of classes. If you have generic interfaces, they tend to have too few restrictions, because you cannot know from the interface which type the type parameter has to be.
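For what it's worth, `List<Token<?>>` is legal Java and compiles fine, but exactly as argued above it is very general: statically, an element's typed value is only known to be Object. A sketch (all names are assumptions):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: a wildcard list of generic tokens is legal, but uninformative.
interface Token<T> { T getTypedValue(); }

class NumToken implements Token<Long> {
    public Long getTypedValue() { return 7L; }
}

class StringToken implements Token<String> {
    public String getTypedValue() { return "foo"; }
}

class WildcardDemo {
    public static void main(String[] args) {
        // Legal, but too general: it could hold a Token<File> just as well.
        List<Token<?>> tokens = new ArrayList<>();
        tokens.add(new NumToken());
        tokens.add(new StringToken());
        // Statically, an element's typed value is only known to be Object:
        Object v = tokens.get(0).getTypedValue();
        System.out.println(v); // 7
    }
}
```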
I think it would be nicer to have different enum values for "+", "-", "*", and "/". While parsing, one may want to find out whether a token is a < + >. It would be nice to be able to do: if (token.getType() == PLUS) ... The enum type could give you all information about which kind of token it is. There still should be a way to find out which downcast is available; therefore I proposed the methods of the enumeration, TokenType.isNumToken(), ....
I think the difference is
if(token.getType() == PLUS)
or
if(token.getType() == ARITHMOP && token.getValue().equals("+"))
There is not such a big benefit, and it would be very confusing if we created a TokenType for every subtype.
@flofreud, the version in the repo looks quite good to me. I think it still misses a way to decide which downcast is available for a specific Token.
I think UML is quite good for discussing interface designs.
Well, I don't get your point, so I have to ask: do you know the instanceof operator? It tells you exactly the information you are trying to plug into the enum class, namely whether a cast is possible.
I think Java's instanceof is well suited for this. The instanceof NumToken check is only useful for TokenType NUM. I see your point about making this information explicit, but I can't imagine where the parser group would need it, because the interface convention is to ask for the type only when needed. To provide the information in TokenType we would have to define three booleans in the constructor for every token kind, without any real need for it.
case ID:
    ...
    token.getValue();
    ...
    break;
case NUM:
    ...
    if (token instanceof NumToken) {
        NumToken nt = (NumToken) token;
        nt.getLongValue();
    }
    ...
    break;
You could also assume the lexer implements correctly and cast directly.
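The "trust the lexer" variant from the last sentence could be sketched like this (class and method names are assumed):

```java
// Sketch: cast directly on TokenType.NUM, assuming the lexer is correct
// and every NUM token really is a NumToken.
enum TokenType { NUM, ID }

interface Token { TokenType getType(); }

class NumToken implements Token {
    public TokenType getType() { return TokenType.NUM; }
    long getLongValue() { return 7L; }
}

class DirectCastDemo {
    static long handle(Token token) {
        switch (token.getType()) {
            case NUM:
                // no instanceof check: NUM implies NumToken by convention
                return ((NumToken) token).getLongValue();
            default:
                return -1L; // placeholder for the other token kinds
        }
    }
}
```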
Comments and Discussion for Lexer. See Wiki page: Lexer.