Build token table

skeet70 commented 11 years ago

Judging from descriptions in class and on the milestone, the reserved word table is part of the Scanner, or is checked immediately after the Scanner and before the Parser. This needs to be implemented so that the format of our printout is accurate.

iduhetonas commented 11 years ago

I've created a prototype for how our token table should work. It runs on these two principles:

Reserved words, error codes, and every other kind of code are concrete data types with the correct Pascal codes written in. From here on I'll refer to all of these as "codes", because that is all each one is.
Token is a non-concrete data type that accepts a reserved word, an error code, and so on.

This lets us pass all of our codes around as the non-concrete Token type, while still letting us pull a code out of a Token and determine what kind of code it is in the same step.
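To make the two principles concrete, here is a minimal sketch of how the types could be laid out. The constructor names ReservedWords, ErrorCodes, MP_AND and MP_ERROR appear later in this thread; the type names ReservedWord and ErrorCode, the extra constructors MP_BEGIN and MP_END, and the describe function are assumptions for illustration, so the real TokenTable may differ.

```haskell
-- Concrete "code" types: one type per family of codes.
-- MP_BEGIN and MP_END are assumed names, added only to pad the example.
data ReservedWord = MP_AND | MP_BEGIN | MP_END
  deriving (Show, Eq)

data ErrorCode = MP_ERROR
  deriving (Show, Eq)

-- Token wraps any code. The wrapping constructor records which
-- family the code belongs to.
data Token
  = ReservedWords ReservedWord
  | ErrorCodes ErrorCode
  deriving (Show, Eq)

-- Hypothetical helper: a single pattern match recovers both the
-- code and the family it came from.
describe :: Token -> String
describe (ReservedWords w) = "reserved word " ++ show w
describe (ErrorCodes e)    = "error code " ++ show e
```

Every code travels through the program as a Token, and one pattern match tells you which family it belongs to.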

I've written an example on how to use our new token table:

https://gist.github.com/4661593

I'll keep developing this table, as I'd like to add helper functions that pull out the string or the type to make code easier to read. But you can start peppering your code with Tokens now.

Edit: If you want to start using it right away, I've pushed the branch containing the prototype to the repository. You can find it under "TokenTable_23"

skeet70 commented 11 years ago

I've pushed to develop what I think is going on, but I don't really know. If I'm doing it wrong, let me know what needs to change, or how this is supposed to work. I can't see TokenTable_23 anywhere on here, so I'm just guessing at what the function types are supposed to be.

Once a working Token type or table or something has been pushed/implemented, I'll keep working. Right now I can't compile, and I don't want to make a bunch of changes and add stuff just to go back and refactor it all.

iduhetonas commented 11 years ago

I believe I've finished the TokenTable, including the helper functions for obtaining different parts of a token (such as the type of the token, or the token itself). Please see this Gist for examples of how to obtain different parts of a token.

https://gist.github.com/4661593

Also, I've modified the digitFSA function to reflect handling of the new token data. Please see that for clarification on how to modify your own FSAs as well.
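For anyone who can't see the branch yet, here is a rough sketch of what an FSA that returns the new token data might look like. Everything about it is an assumption: the signature, the Literals constructor, and the MP_INTEGER_LIT name are all hypothetical, and the real digitFSA in the repository may be shaped quite differently.

```haskell
import Data.Char (isDigit)

data ErrorCode = MP_ERROR deriving (Show, Eq)

-- Assumed literal family; this name is not taken from the repository.
data Literal = MP_INTEGER_LIT deriving (Show, Eq)

data Token
  = ErrorCodes ErrorCode
  | Literals Literal
  deriving (Show, Eq)

-- Hypothetical signature: consume leading digits and return
-- (token, lexeme, remaining input).
digitFSA :: String -> (Token, String, String)
digitFSA input =
  case span isDigit input of
    ("", rest)     -> (ErrorCodes MP_ERROR, "", rest)      -- no digit at the front
    (digits, rest) -> (Literals MP_INTEGER_LIT, digits, rest)
```

The FSA wraps its result in the matching subclass constructor, so the caller always gets back a Token whether the scan succeeded or not.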

Here are the advantages, or why I've written the TokenTable in the way that I have.

Improved type checking - If we write a function that we intend to return an error code (MP_ERROR) and accidentally confuse it with a different token (MP_AND), the compiler will not allow this, because we need to precede each token with its subclass. In this example, we would say:

let token = ErrorCodes MP_ERROR
let otherToken = ReservedWords MP_AND

Therefore, we have fewer errors to deal with.
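As a small sketch of that check in action (the type names ReservedWord and ErrorCode and the scanFailure and isError functions are assumptions, not taken from TokenTable.hs):

```haskell
data ReservedWord = MP_AND deriving (Show, Eq)
data ErrorCode = MP_ERROR deriving (Show, Eq)

data Token
  = ReservedWords ReservedWord
  | ErrorCodes ErrorCode
  deriving (Show, Eq)

-- Hypothetical helper that reports a scan failure.
scanFailure :: Token
scanFailure = ErrorCodes MP_ERROR

-- Uncommenting the next two lines makes the module fail to compile:
-- MP_AND is a ReservedWord, so it cannot be wrapped in ErrorCodes.
-- badFailure :: Token
-- badFailure = ErrorCodes MP_AND

-- Hypothetical predicate: only the ErrorCodes family counts as an error.
isError :: Token -> Bool
isError (ErrorCodes _) = True
isError _              = False
```

The mix-up is caught at compile time, so it never makes it into a running scanner.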

Meta-data allows for fewer "Guard Lumps" - Rather than creating a top-level function that matches every token individually, we can split the handling of each subclass of tokens into its own component. This will keep us from devolving into an anti-pattern where there's one massive "blob" function doing all of the token-checking.
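One way this could look, as a sketch only: the render functions and their output strings are hypothetical, as are the ReservedWord and ErrorCode type names and the MP_BEGIN constructor.

```haskell
data ReservedWord = MP_AND | MP_BEGIN deriving (Show, Eq)
data ErrorCode = MP_ERROR deriving (Show, Eq)

data Token
  = ReservedWords ReservedWord
  | ErrorCodes ErrorCode
  deriving (Show, Eq)

-- The top-level function dispatches on the subclass only...
render :: Token -> String
render (ReservedWords w) = renderReserved w
render (ErrorCodes e)    = renderError e

-- ...and each subclass gets its own small handler.
renderReserved :: ReservedWord -> String
renderReserved MP_AND   = "and"
renderReserved MP_BEGIN = "begin"

renderError :: ErrorCode -> String
renderError MP_ERROR = "scan error"
```

Adding a new reserved word then only touches renderReserved, and the top-level function stays two lines long.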

For those of us that would prefer to not use the meta-data functionality, we can easily use the helper functions defined in TokenTable.hs to just give us the token name. As such, if you want the token itself, you only need to call "unwrapToken newToken" as opposed to just "newToken".

Which brings me to my next point.

Helper Functions

unwrapToken: Give this function a token and it will return the name:

let newToken = ErrorCodes MP_ERROR
unwrapToken newToken
"MP_ERROR"

getTokenType: Give this function a token and it will return the subclass:

let newToken = ErrorCodes MP_ERROR
getTokenType newToken
"ErrorCodes"

I'm pretty flexible on the names of these data types, as "ErrorCodes" might seem ambiguous being associated with "Codes". However, other than that, I'm convinced that this is the way to go.
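For reference, the two helpers above could be written roughly like this. This is a sketch that reproduces the outputs shown above, not a copy of TokenTable.hs; the type names ReservedWord and ErrorCode, and the use of show to produce the name, are assumptions.

```haskell
data ReservedWord = MP_AND deriving (Show, Eq)
data ErrorCode = MP_ERROR deriving (Show, Eq)

data Token
  = ReservedWords ReservedWord
  | ErrorCodes ErrorCode
  deriving (Show, Eq)

-- Return the name of the wrapped code, e.g. "MP_ERROR".
unwrapToken :: Token -> String
unwrapToken (ReservedWords w) = show w
unwrapToken (ErrorCodes e)    = show e

-- Return the name of the subclass, e.g. "ErrorCodes".
getTokenType :: Token -> String
getTokenType (ReservedWords _) = "ReservedWords"
getTokenType (ErrorCodes _)    = "ErrorCodes"
```

If the data types do get renamed, getTokenType is the only place where the subclass names are spelled out as strings.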

iduhetonas commented 11 years ago

TokenTable module is tested and complete. Let me know if you have any questions, concerns, or if you find any bugs.

I'll also add this TokenTable file to the Wiki.

skeet70 / pascal-compiler

Build token table #23