jdagz28 / Minishell

0 stars 0 forks source link

Minishell

In this project, we aim to create a custom shell, akin to the Bourne Again Shell (bash). The key capabilities required for this Minishell include:


Lessons Learned

Reflection for future projects


Required capabilities

General loop of a shell

The shell's fundamental logic consists of three steps:

  1. Read:
    • Utilize the readline() function to read commands from the standard input.
  2. Parse:
    • Separate (tokenization) the input command string into a program and its arguments.
  3. Execute:
    • Run the parsed command, executing the specified program with the provided arguments.

Tokenization / Lexical Analysis

Tokenization is the first step in parsing a command string, involves dividing the command string into tokens and categorizing their appropriate type. The key aspects of the tokenization process are:

We started the project of doing the bonus so we initially processed tokens required for bonus: &&, ||, ( ), and ;. We also have initially included those tokens in our grammar and syntax checks. However, this project became more challenging for us as we progress deeper to the project and we ultimately decided to drop the bonus and focus on fulfilling the mandatory part of the subject.

Syntax

After the tokenization process, we check for acceptable token combinations or syntax based on our grammar. We based our grammar from bash but limited on fulfilling the mandatory part of the subject.

Token Types

The tokens are categorized into distinct types, each represented by the t_tk_kind enumeration:

typedef enum e_tk_kind
{
    TK_WORD,         // Represents a general word in the command.
    TK_PIPE,         // Indicates the presence of a pipe (|) in the command.
    TK_OPERATOR,     // Denotes "\n"
    TK_REDIRECT,     // Indicates a redirection operator (<, >, <<, >>).
    TK_EOF           // Signifies the end of the command string with a NULL word.
} t_tk_kind;

Our Grammar

An overview of the expected sequence of tokens in a valid syntax. The grammar is important as it would be the basis on how you parse the command string. This grammar is in Backus-Naur Form.

Basics:

pipeline                ::= command
                         |   pipeline "|" command

command                 ::= word
                         |   redirection
                         |   command word
                         |   quoted_command

redirection             ::= redirectionop filename

redirectionop           ::= "<"  |  ">"  |  "<<" | ">>"

quoted_command          ::= single_quoted_sequence
                         |   double_quoted_sequence

single_quoted_sequence  ::= "'" word "'"
double_quoted_sequence  ::= '"' word '"'

parenthesized_command   ::= "(" list ")

Abstract Syntax Tree (AST)

The Abstract Syntax Tree (AST) serves as hierarchical and structured representation of the parsed code. This enables adding logic and comprehension of the tokens extracted from the command string input. This facilitates the seamless execution of commands.

In simple words, AST is making sense out of the tokens that we have

Since we are only focusing on the mandatory part , we did not deal on command list, logical operators and semicolon. The root node of our tree is always a pipe. If there is no pipe in the command string then there is no tree but only a single node.

It is important to refer back to the grammar as it is the basis of what consists a valid non-terminal symbol. In short, a valid syntax.

The general process in building the AST.

Example 1 For the command << EOF cat | grep "hello". It will be tokenized and parsed as below.

LexParser_Test> << EOF cat | grep "hello"
Token 0
Token:  <<
Type:   TK_REDIRECT

Token 1
Token:  EOF
Type:   TK_WORD

Token 2
Token:  cat
Type:   TK_WORD

Token 3
Token:  |
Type:   TK_PIPE

Token 4
Token:  grep
Type:   TK_WORD

Token 5
Token:  "hello"
Type:   TK_WORD

Token 6
Type:   TK_EOF

LexParser_Test> 

ast_1

Example 2 For command cat file.txt | grep "keyword" | sort -r | awk '{print $2}' > output.txt . It has multiple pipes and a redirection

LexParser_Test> cat infile.txt | grep "hello" | sort -r | awk '{print $2}' > output.txt
Token 0
Token:  cat
Type:   TK_WORD

Token 1
Token:  infile.txt
Type:   TK_WORD

Token 2
Token:  |
Type:   TK_PIPE

Token 3
Token:  grep
Type:   TK_WORD

Token 4
Token:  "hello"
Type:   TK_WORD

Token 5
Token:  |
Type:   TK_PIPE

Token 6
Token:  sort
Type:   TK_WORD

Token 7
Token:  -r
Type:   TK_WORD

Token 8
Token:  |
Type:   TK_PIPE

Token 9
Token:  awk
Type:   TK_WORD

Token 10
Token:  '{print $2}'
Type:   TK_WORD

Token 11
Token:  >
Type:   TK_REDIRECT

Token 12
Token:  output.txt
Type:   TK_WORD

Token 13
Type:   TK_EOF

LexParser_Test> 

ast_2

Here are our functions to produce the .dot files for creating the graph and with graphviz you can convert it an image on the command line

void    print_ast_dot(t_node *node, FILE *output);

void    create_dotfile(t_node *ast, int cmd_index)
{
    FILE    *dotFile;
    char    filename[50];

    snprintf(filename, sizeof(filename), "./dotfiles/ast_%d.dot", cmd_index);
    dotFile = fopen(filename, "w");
    if (dotFile == NULL)
    {
        printf("Error: cannot create dot file\n");
        return ;
    }
    fprintf(dotFile, "digraph AST {\n");
    print_ast_dot(ast, dotFile);
    fprintf(dotFile, "}\n");
    fclose(dotFile);
}

void print_ast_dot(t_node *node, FILE *output)
{
    if (node == NULL)
        return ;
    fprintf(output, "  node%p [label=\"", (void *)node);
    if (node->type == SIMPLE_CMD)
    {
        fprintf(output, "Simple Command: ");
        for (char **arg = node->content.simple_cmd.argv; *arg != NULL; ++arg)
            fprintf(output, "%s ", *arg);
    }
    else
        fprintf(output, "Node Type: %s", node_type_strings[node->type]);
    fprintf(output, "\"];\n");
    if (node->type != SIMPLE_CMD)
    {
        if (node->content.child.left)
        {
            print_ast_dot(node->content.child.left, output);
            fprintf(output, "  node%p -> node%p [label=\"Left\"];\n", (void *)node, (void *)(node->content.child.left));
        }
        if (node->content.child.right)
        {
            print_ast_dot(node->content.child.right, output);
            fprintf(output, "  node%p -> node%p [label=\"Right\"];\n", (void *)node, (void *)(node->content.child.right));
        }
    }
}

Then install graphviz if you need to.

sudo apt-get install graphviz 
dot -Tpng {filename.dot} -o {output.png} //to convert to .png

Expansion

After building the Abstract Syntax Tree, the next phase is to check the arguments within simple command to identify any arguments needed for expansion. Expansion handles the following:

Execution

Redirections

Simple Command Node

Pipe Node