feat: lexer

tjgurwara99 commented 2 years ago

Let me know what you think of this structure. If the command needs to access state, we can change the parser to include the state. Of course, this is preliminary right now, because over time it would get difficult to manage, but I think this is the right direction. Note that the lexer is incomplete. It's not being used right now, so it shouldn't break anything, but I included it here because if you have any ideas for it, it would be good to know before I properly implement it.

tjgurwara99 commented 2 years ago

Also, there are some issues with the lex*Quotes methods. I'm not sure whether they are supposed to be considered IDENTs or something else when it comes to shells. I personally think double quotes are definitely IDENTs, and backticks are commands to be run with their output put in their place, but I don't have any clue about single quotes. Let me know if you have any ideas.

tjgurwara99 commented 2 years ago

The current syntax the lexer takes is not what I was envisioning, but it is fine for now. I actually do not want another compiler for bash :D

Ah I see. Can you list all the tokens that need to be supported? That way I can just create those. Although I would also need to know when to stop tokenising the value. For example, consider double quotes: I'm not sure what to do there haha, because in a traditional shell a double-quoted string is considered a WORD (in our case IDENT), but single quotes differ between shells, and backticks are much more complex because they are also a type of redirection. So the lexing of the value needs to be done correctly. Just let me know and I will make the changes 😄
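To make the "when to stop" question concrete, here is a rough sketch of one possible answer: scan until the matching closing quote. The token names `RAW` and `COMMAND`, the `Token` struct, and the `lexQuoted` helper are all made up for illustration and are not taken from the project. The sketch assumes POSIX-like rules, where a backslash escapes the next character everywhere except inside single quotes.

```go
package main

import (
	"errors"
	"fmt"
)

// TokenType is a hypothetical token category; these names are
// illustrative and are not the project's actual token list.
type TokenType int

const (
	IDENT   TokenType = iota // double-quoted text, treated as a word
	RAW                      // single-quoted text, taken literally (assumption)
	COMMAND                  // backtick-quoted text, to be run later (assumption)
)

// Token pairs a type with the text found between the quotes.
type Token struct {
	Type  TokenType
	Value string
}

// lexQuoted expects src[start] to be an opening quote. It scans to the
// matching closing quote and returns the token plus the index just past
// that closing quote. Escape sequences are kept verbatim in Value.
func lexQuoted(src string, start int) (Token, int, error) {
	if start < 0 || start >= len(src) {
		return Token{}, start, errors.New("start out of range")
	}
	quote := src[start]
	var typ TokenType
	switch quote {
	case '"':
		typ = IDENT
	case '\'':
		typ = RAW
	case '`':
		typ = COMMAND
	default:
		return Token{}, start, fmt.Errorf("not a quote: %q", quote)
	}
	for i := start + 1; i < len(src); i++ {
		// A backslash skips the next byte, except inside single quotes.
		if src[i] == '\\' && quote != '\'' {
			i++
			continue
		}
		if src[i] == quote {
			return Token{Type: typ, Value: src[start+1 : i]}, i + 1, nil
		}
	}
	return Token{}, start, errors.New("unterminated quote")
}

func main() {
	tok, next, err := lexQuoted(`echo "hello world" done`, 5)
	fmt.Println(tok.Type == IDENT, tok.Value, next, err)
}
```

Giving each quote style its own token type keeps the lexer dumb: it only finds the end of the quoted span, and whatever comes later decides what a backtick span actually means.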

raklaptudirm commented 2 years ago

Working on it: https://gist.github.com/raklaptudirm/9aa25462cbb434906a340d047184a23e

raklaptudirm commented 2 years ago

@tjgurwara99 Tell me what you think about the grammar.

tjgurwara99 commented 2 years ago

Yeah, just read through it and I mostly like this now. It's quite clean and honestly looks cool. The only thing I have a concern about is the FILE_NAME axiom. I don't think having a separate token for filenames is useful. Most of the commands (except the builtin ones) are binary files in and of themselves, so I think it would become runtime heavy to check whether a given identifier is a FILE_NAME or an IDENT. Does that make sense?

tjgurwara99 commented 2 years ago

Also, is negation a keyword that you're thinking of, or an operator? I can't quite understand that part clearly.

raklaptudirm commented 2 years ago

Filenames will indeed be IDENTs, I just used a different axiom to show exactly how the operator should be used.

https://gist.github.com/raklaptudirm/9aa25462cbb434906a340d047184a23e#file-tokens-go In this file I have listed the required tokens, so take a look.

raklaptudirm commented 2 years ago

Negation is just a production name for the not operator !.

The explanation of the grammar notation used can be found at:

https://craftinginterpreters.com/representing-code.html#context-free-grammars https://craftinginterpreters.com/representing-code.html#rules-for-grammars https://craftinginterpreters.com/representing-code.html#enhancing-our-notation

The three links are consecutive paragraphs, so no need to click all 3 separately.

tjgurwara99 commented 2 years ago

This lexer is now capable of converting every bash command into tokens. Before I add more: I saw that you have added more tokens to the supported tokens list, so could you explain what the lexer needs to produce? For example, each token has a token type and a value, so what should the value be for each token type?
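As a starting point for that question, here is a minimal sketch of one convention: operators carry their literal text as the value, and an IDENT carries the word itself. The `NOT` and `PIPE` names, the `Token` struct, and the `lex` function are assumptions for illustration and do not come from the supported tokens list.

```go
package main

import "fmt"

// TokenType names are hypothetical; the real list lives in tokens.go.
type TokenType string

const (
	IDENT TokenType = "IDENT"
	NOT   TokenType = "NOT"  // the ! operator
	PIPE  TokenType = "PIPE" // the | operator
)

// Token is the unit the lexer hands to the parser.
type Token struct {
	Type  TokenType
	Value string // literal source text of the token
}

// isDelim reports whether c ends a word in this simplified sketch.
func isDelim(c byte) bool {
	return c == ' ' || c == '\t' || c == '|'
}

// lex splits src into tokens. A '!' is only an operator when it starts
// a token; inside a word it stays part of the IDENT.
func lex(src string) []Token {
	var tokens []Token
	for i := 0; i < len(src); {
		c := src[i]
		switch {
		case c == ' ' || c == '\t':
			i++ // skip whitespace between tokens
		case c == '!':
			tokens = append(tokens, Token{Type: NOT, Value: "!"})
			i++
		case c == '|':
			tokens = append(tokens, Token{Type: PIPE, Value: "|"})
			i++
		default:
			start := i
			for i < len(src) && !isDelim(src[i]) {
				i++
			}
			tokens = append(tokens, Token{Type: IDENT, Value: src[start:i]})
		}
	}
	return tokens
}

func main() {
	for _, t := range lex("! ls -l | wc") {
		fmt.Printf("%s %q\n", t.Type, t.Value)
	}
}
```

Keeping the source text in `Value` even for operators is redundant, but it makes error messages and debugging output easier to produce.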

tjgurwara99 commented 2 years ago

Also, since I've never really done anything related to parsing to bytecode, I'm afraid you will have to write the parser yourself. I can help review the work, but I can't really give you input on bytecode interpretation 😓 I will hardly be any help there. So you can continue with this after the lexer is complete 😄 - it's almost done in fact (all the tokens are working and tested).

raklaptudirm commented 2 years ago

Looks like this project can be a learning experience for you! If you are interested in learning more about bytecode compilation and execution, I would highly suggest this.

Bytecode is a really simple yet powerful concept in the area of creating programming languages, so I would suggest knowing your way around it.

tjgurwara99 commented 2 years ago

True, it might be a learning experience, and I'll give Crafting Interpreters a read. But I support the notion of maintainable code, so I would still not be writing the parser. I know that because of my lack of understanding of bytecode, I will not be able to write it well, and in the long run that means we will lose maintainability. So if you write the parser, I will be able to understand how the bytecode works, but again, I don't want to jeopardise this project's maintainability. All of the stuff done so far is so that any new syntax you add can be written easily, hence IMO highly maintainable.

raklaptudirm commented 2 years ago

Once you have a grammar for your language, writing a parser is trivial :)
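A rough sketch of why: in a recursive-descent parser, each grammar rule becomes one function. The three-rule grammar in the comments below is invented for this example and is not the grammar from the gist, and the `Node` and `parser` types are hypothetical. Tokens are plain whitespace-separated strings to keep it short.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a tiny syntax tree. Op is "!", "|", or "cmd".
type Node struct {
	Op   string
	Args []string // words of a command, set only when Op == "cmd"
	Kids []*Node  // operands of "!" (one) or "|" (two)
}

type parser struct {
	toks []string
	pos  int
}

// peek returns the current token, or "" at end of input.
func (p *parser) peek() string {
	if p.pos < len(p.toks) {
		return p.toks[p.pos]
	}
	return ""
}

// pipeline → negation ( "|" negation )*
func (p *parser) pipeline() (*Node, error) {
	left, err := p.negation()
	if err != nil {
		return nil, err
	}
	for p.peek() == "|" {
		p.pos++
		right, err := p.negation()
		if err != nil {
			return nil, err
		}
		left = &Node{Op: "|", Kids: []*Node{left, right}}
	}
	return left, nil
}

// negation → "!" negation | command
func (p *parser) negation() (*Node, error) {
	if p.peek() == "!" {
		p.pos++
		operand, err := p.negation()
		if err != nil {
			return nil, err
		}
		return &Node{Op: "!", Kids: []*Node{operand}}, nil
	}
	return p.command()
}

// command → IDENT+
func (p *parser) command() (*Node, error) {
	start := p.pos
	for t := p.peek(); t != "" && t != "|" && t != "!"; t = p.peek() {
		p.pos++
	}
	if p.pos == start {
		return nil, fmt.Errorf("expected a command at token %d", start)
	}
	return &Node{Op: "cmd", Args: p.toks[start:p.pos]}, nil
}

// parse tokenises src on whitespace and parses one pipeline.
func parse(src string) (*Node, error) {
	p := &parser{toks: strings.Fields(src)}
	n, err := p.pipeline()
	if err != nil {
		return nil, err
	}
	if p.pos != len(p.toks) {
		return nil, fmt.Errorf("unexpected token %q", p.peek())
	}
	return n, nil
}

func main() {
	n, err := parse("! ls -l | wc")
	fmt.Println(n.Op, n.Kids[0].Op, n.Kids[0].Kids[0].Args, n.Kids[1].Args, err)
}
```

In this made-up grammar `!` binds to a single command rather than to a whole pipeline; changing that only means moving the `"!"` check from one rule function to another.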

raklaptudirm commented 2 years ago

Since we started collaborating on this project, I sort of stopped interacting in these threads.

I actually hit a road block with my design choices with mash. As you pointed out, a single syntax for both the repl and the script is the way to go, since the shell is just a repl for the script language.

Unifying the syntaxes was problematic for some of the design choices that I made, but on the other hand, I wholeheartedly agreed that a single syntax is indeed better. Finding myself in this crossroads, I decided to rethink how to approach this project.

Some stuff came up on the way then, with my exams and other things. I was still planning it in my head, but was not happy with it still. But I think I have decided upon what to do here.

The lexer will be extremely simple, so a lot of the work that we did here will not be useful. In truth, none of the work, including my commits to date, would be useful. I want to strip the project raw and start anew. This is not a new thing for me, as I have done this on other projects in the past. Trying to save old code will only hinder the progress of the project. I am also working on the parser grammar, and am not currently thinking about the execution, which I am sure we will be able to figure out.

If you want to continue our collaboration on this project, which I hugely appreciate, I suggest we continue our discussion in #2. I will be posting the details about the lexer and parser there, if they still interest you. I am closing this pull request and thank you for all the time you have devoted to this project. Cheers :D

tjgurwara99 commented 2 years ago

Cool, let's discuss on the issue's page. I'll close this PR for now and we can start from scratch - I prefer to think things through from the ground up, so it would be cool to do that 😄

rakarchive / mash

feat: lexer #3