In this project, we aim to create a custom shell, akin to the Bourne Again Shell (bash). The key capabilities required for this Minishell include:
ubuntu-latest
.norminette
errors (we will have this one later on as we have coded not accordingly to norm)‘ ’
) and double quotes (“ ”
).<
, >
, <<
, >>
).|
) for command chaining.$?
.echo
with the n
option.cd
accepting only relative or absolute paths.pwd
(no options) - to display the current working directory.export
(no options) - to set environment variables.unset
(no options) - to remove environment variables.env
(no options) - to display the current environment.exit
(no options) - to terminate the shell.The shell's fundamental logic consists of three steps:
readline()
function to read commands from the standard input.Tokenization is the first step in parsing a command string, involves dividing the command string into tokens and categorizing their appropriate type. The key aspects of the tokenization process are:
TK_EOF
to easily iterate through the token list.We started the project of doing the bonus so we initially processed tokens required for bonus:
&&
,||
,( )
, and;
. We also have initially included those tokens in our grammar and syntax checks. However, this project became more challenging for us as we progress deeper to the project and we ultimately decided to drop the bonus and focus on fulfilling the mandatory part of the subject.
After the tokenization process, we check for acceptable token combinations or syntax based on our grammar. We based our grammar from bash but limited on fulfilling the mandatory part of the subject.
The tokens are categorized into distinct types, each represented by the t_tk_kind
enumeration:
typedef enum e_tk_kind
{
TK_WORD, // Represents a general word in the command.
TK_PIPE, // Indicates the presence of a pipe (|) in the command.
TK_OPERATOR, // Denotes "\n"
TK_REDIRECT, // Indicates a redirection operator (<, >, <<, >>).
TK_EOF // Signifies the end of the command string with a NULL word.
} t_tk_kind;
An overview of the expected sequence of tokens in a valid syntax. The grammar is important as it would be the basis on how you parse the command string. This grammar is in Backus-Naur Form.
Basics:
pipeline ::= command
| pipeline "|" command
command ::= word
| redirection
| command word
| quoted_command
redirection ::= redirectionop filename
redirectionop ::= "<" | ">" | "<<" | ">>"
quoted_command ::= single_quoted_sequence
| double_quoted_sequence
single_quoted_sequence ::= "'" word "'"
double_quoted_sequence ::= '"' word '"'
parenthesized_command ::= "(" list ")
The Abstract Syntax Tree (AST) serves as hierarchical and structured representation of the parsed code. This enables adding logic and comprehension of the tokens extracted from the command string input. This facilitates the seamless execution of commands.
In simple words, AST is making sense out of the tokens that we have
Since we are only focusing on the mandatory part , we did not deal on command list, logical operators and semicolon. The root node of our tree is always a pipe. If there is no pipe in the command string then there is no tree but only a single node.
It is important to refer back to the grammar as it is the basis of what consists a valid non-terminal symbol. In short, a valid syntax.
|
) to a simple command, and vice versa” “
) or (’ ‘
)The general process in building the AST.
SIMPLE_CMD
node.TK_PIPE
is encountered, a PIPE
node is created.SIMPLE_CMD
node ais the designated as the left node of the newly created PIPE
node.SIMPLE_CMD
node as right node of the PIPE
nodePIPE
node is encountered, the entire preceding tree is transposed to the left node of the the new PIPE
nodeExample 1
For the command << EOF cat | grep "hello"
. It will be tokenized and parsed as below.
LexParser_Test> << EOF cat | grep "hello"
Token 0
Token: <<
Type: TK_REDIRECT
Token 1
Token: EOF
Type: TK_WORD
Token 2
Token: cat
Type: TK_WORD
Token 3
Token: |
Type: TK_PIPE
Token 4
Token: grep
Type: TK_WORD
Token 5
Token: "hello"
Type: TK_WORD
Token 6
Type: TK_EOF
LexParser_Test>
Example 2
For command cat file.txt | grep "keyword" | sort -r | awk '{print $2}' > output.txt
. It has multiple pipes and a redirection
LexParser_Test> cat infile.txt | grep "hello" | sort -r | awk '{print $2}' > output.txt
Token 0
Token: cat
Type: TK_WORD
Token 1
Token: infile.txt
Type: TK_WORD
Token 2
Token: |
Type: TK_PIPE
Token 3
Token: grep
Type: TK_WORD
Token 4
Token: "hello"
Type: TK_WORD
Token 5
Token: |
Type: TK_PIPE
Token 6
Token: sort
Type: TK_WORD
Token 7
Token: -r
Type: TK_WORD
Token 8
Token: |
Type: TK_PIPE
Token 9
Token: awk
Type: TK_WORD
Token 10
Token: '{print $2}'
Type: TK_WORD
Token 11
Token: >
Type: TK_REDIRECT
Token 12
Token: output.txt
Type: TK_WORD
Token 13
Type: TK_EOF
LexParser_Test>
Here are our functions to produce the .dot
files for creating the graph and with graphviz
you can convert it an image on the command line
void print_ast_dot(t_node *node, FILE *output);
void create_dotfile(t_node *ast, int cmd_index)
{
FILE *dotFile;
char filename[50];
snprintf(filename, sizeof(filename), "./dotfiles/ast_%d.dot", cmd_index);
dotFile = fopen(filename, "w");
if (dotFile == NULL)
{
printf("Error: cannot create dot file\n");
return ;
}
fprintf(dotFile, "digraph AST {\n");
print_ast_dot(ast, dotFile);
fprintf(dotFile, "}\n");
fclose(dotFile);
}
void print_ast_dot(t_node *node, FILE *output)
{
if (node == NULL)
return ;
fprintf(output, " node%p [label=\"", (void *)node);
if (node->type == SIMPLE_CMD)
{
fprintf(output, "Simple Command: ");
for (char **arg = node->content.simple_cmd.argv; *arg != NULL; ++arg)
fprintf(output, "%s ", *arg);
}
else
fprintf(output, "Node Type: %s", node_type_strings[node->type]);
fprintf(output, "\"];\n");
if (node->type != SIMPLE_CMD)
{
if (node->content.child.left)
{
print_ast_dot(node->content.child.left, output);
fprintf(output, " node%p -> node%p [label=\"Left\"];\n", (void *)node, (void *)(node->content.child.left));
}
if (node->content.child.right)
{
print_ast_dot(node->content.child.right, output);
fprintf(output, " node%p -> node%p [label=\"Right\"];\n", (void *)node, (void *)(node->content.child.right));
}
}
}
Then install graphviz
if you need to.
sudo apt-get install graphviz
dot -Tpng {filename.dot} -o {output.png} //to convert to .png
After building the Abstract Syntax Tree, the next phase is to check the arguments within simple command to identify any arguments needed for expansion. Expansion handles the following:
$VARIABLE
, with their valuesecho $?
cat << "EOF"
. The delimeter will only be EOFheredoc
, as it solely substitutes $VARIABLE
with the value and retains any exiting quotes