multitudes / 42-minishell

This project is about creating a simple shell
MIT License

REFACTOR / FIX [token types] #189

Closed ProjektPhoenix closed 2 months ago

ProjektPhoenix commented 2 months ago
  1. It would be nice to have all redirection tokens in sequence in the token list, then a check for the general presence of a redirection token can be more easily done.

  2. Some tokens seem to be duplicated or incorrect; let us check the following:
     - DLESS (26) vs DLESS_DELIM (27)
     - DGREAT (25) vs REDIRECT_OUT_APP (32)
     - REDIRECT_OUT_APP, // '&>>' vs. REDIRECT_BOTH_APP, // ">>&"

  3. It would be nice to also account for escaped tokens. There are also examples in the evaluation cases. These could be identified as specific tokens or handled in expansion.

> (echo > just outputs '>')
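Point 1 could be sketched like this. This is a hypothetical reordering: only DLESS, DGREAT and DLESS_DELIM appear in the project's actual enum, the other names and the ordering are illustrative. Keeping every redirection token contiguous turns "is there a redirection token?" into a single range check.

```c
#include <assert.h>

/* Hypothetical token enum: only DLESS, DGREAT and DLESS_DELIM come from
 * the project; the other names and the ordering are illustrative. */
typedef enum e_token_type
{
	WORD,
	PIPE,
	/* --- redirection block: keep these contiguous --- */
	REDIRECT_IN,       /* '<'   */
	REDIRECT_OUT,      /* '>'   */
	DLESS,             /* '<<'  */
	DGREAT,            /* '>>'  */
	REDIRECT_BOTH,     /* '&>'  */
	REDIRECT_BOTH_APP, /* '&>>' */
	/* --- end of redirection block --- */
	DLESS_DELIM,
	NULL_TOKEN
}	t_token_type;

/* One range check instead of one equality check per redirection type. */
static int	is_redirection(t_token_type type)
{
	return (type >= REDIRECT_IN && type <= REDIRECT_BOTH_APP);
}
```

The check only stays correct as long as new token types are never inserted into the middle of the block, so a comment marking the block boundaries is worth keeping.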

multitudes commented 2 months ago

Yes, it is good to do this at the end. Maybe we could add a token type EXPANDED, so we do not try to expand again, or expand things which are not going to be expanded? When I do globbing with `ls *` I see:

DEBUG src/analyser/analyser.c:expand_tokenlist:394: Token expanded - token type: 0
DEBUG src/scanner/token_functions.c:new_toknode:58: token created type 0 -.DS_Store-
DEBUG src/scanner/token_functions.c:new_toknode:58: token created type 0 -LICENSE-
DEBUG src/scanner/token_functions.c:new_toknode:58: token created type 0 -Makefile-
DEBUG src/scanner/token_functions.c:new_toknode:58: token created type 0 -ls.out-
DEBUG src/scanner/token_functions.c:new_toknode:58: token created type 0 -a-
DEBUG src/scanner/token_functions.c:new_toknode:58: token created type 0 -README.md-
DEBUG src/scanner/token_functions.c:new_toknode:58: token created type 0 -minishell-
DEBUG src/scanner/token_functions.c:new_toknode:58: token created type 0 -.gitignore-
DEBUG src/scanner/token_functions.c:new_toknode:58: token created type 0 -non_interactive_commands.minishell-
DEBUG src/analyser/analyser.c:expand_tokenlist:394: Token expanded - token type: 0
DEBUG src/analyser/analyser.c:expand_tokenlist:394: Token expanded - token type: 0
DEBUG src/analyser/analyser.c:expand_tokenlist:394: Token expanded - token type: 0
DEBUG src/analyser/analyser.c:expand_tokenlist:394: Token expanded - token type: 0
DEBUG src/analyser/analyser.c:expand_tokenlist:394: Token expanded - token type: 0
DEBUG src/analyser/analyser.c:expand_tokenlist:394: Token expanded - token type: 0
DEBUG src/analyser/analyser.c:expand_tokenlist:394: Token expanded - token type: 0
DEBUG src/analyser/analyser.c:expand_tokenlist:394: Token expanded - token type: 0
DEBUG src/analyser/analyser.c:expand_tokenlist:394: Token expanded - token type: 0

Not sure what all this means, but we don't need to expand a token created from a list of file names, so the expansion there (if any) could be skipped.

multitudes commented 2 months ago

Before the evaluation I would like to make a branch with a reduced list of tokens... only the tokens we need. But we should do this at the very end.

I quote the manual below... looking at <<-, it strips leading tabs in the heredoc (we do not implement this).

DLESS (26) vs DLESS_DELIM (27)

Apparently they are both needed? The DLESS name is used in the shell manual:

[image: excerpt from the shell manual]

DLESS_DELIM is of course not in the manual, but I cannot suddenly start calling it "heredoc delimiter". It is the delimiter, as in this scanning function (untested!):


/* Scan "<<" plus its here-delimiter: emits a DLESS token followed by
** a DLESS_DELIM token holding the delimiter word. */
bool	add_here_and_delim(t_mini_data *data, int *i)
{
    int     start;
    char    *tmp;

    add_token(data, i, "<<", DLESS);
    while (data->input[*i] && is_space(data->input[*i]))
        advance(i);
    if (data->input[*i] == '\0' || is_delimiter(data->input[*i]))
        return (scanner_error(data, "error: missing here-delim"));
    start = *i;
    while (data->input[*i] && !is_delimiter(data->input[*i]))
        advance(i);
    tmp = ft_substr(data->input, start, *i - start);
    *i = start;
    add_token(data, i, tmp, DLESS_DELIM);
    free(tmp);
    return (true);
}

At the beginning I did not know what we would need for the heredoc, so I check for the DLESS token, and the DLESS_DELIM is the delimiter which comes after it. And I don't check for special characters; however, we could change that.

In the manual: 3.6.6 Here Documents

This type of redirection instructs the shell to read input from the current source until a line containing only word (with no trailing blanks) is seen. All of the lines read up to that point are then used as the standard input (or file descriptor n if n is specified) for a command.

The format of here-documents is:

[n]<<[-]word
        here-document
delimiter

No parameter and variable expansion, command substitution, arithmetic expansion, or filename expansion is performed on word. If any part of word is quoted, the delimiter is the result of quote removal on word, and the lines in the here-document are not expanded. If word is unquoted, all lines of the here-document are subjected to parameter expansion, command substitution, and arithmetic expansion, the character sequence \newline is ignored, and ‘\’ must be used to quote the characters ‘\’, ‘$’, and ‘`’.

If the redirection operator is ‘<<-’, then all leading tab characters are stripped from input lines and the line containing delimiter. This allows here-documents within shell scripts to be indented in a natural fashion.
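Supporting `<<-` would mostly mean stripping leading tabs from each heredoc line and from the line holding the delimiter before comparison. A minimal sketch, assuming the helper name is hypothetical:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper for '<<-': return a pointer past any leading
 * tab characters. Per the manual only tabs are stripped, so leading
 * spaces are left untouched. */
static const char	*strip_leading_tabs(const char *line)
{
	while (*line == '\t')
		line++;
	return (line);
}
```

The same helper would also run on the delimiter line itself, so an indented `EOF` still terminates the heredoc.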

multitudes commented 2 months ago

There is also already a tokenizer test for the heredoc tokens:

const char* test_scanner_append() {

    // I want to check the output of the call to the function in scanner.c:
    // tokenizer(char *input), returning a t_list of lexemes.
    // I will create a string and check the output of the function.
    std::string str = "cat << EOF | grep 'foo' \n This is a line. \n This is another line. \n This line contains foo. \n EOF";
    t_list *lexemes = tokenizer((char *)str.c_str());
    t_list *current = lexemes;
    const char *result;
    int i = 0;

    result = process_token(&current, &i, "cat", WORD);
    result = process_token(&current, &i, "<<", DLESS);
    result = process_token(&current, &i, "EOF", DLESS_DELIM);
    result = process_token(&current, &i, "|", PIPE);
    result = process_token(&current, &i, "grep", WORD);
    result = process_token(&current, &i, "'foo'", S_QUOTED_STRING);
    result = process_token(&current, &i, "This", WORD);
    result = process_token(&current, &i, "is", WORD);
    result = process_token(&current, &i, "a", WORD);
    result = process_token(&current, &i, "line.", WORD);
    result = process_token(&current, &i, "This", WORD);
    result = process_token(&current, &i, "is", WORD);
    result = process_token(&current, &i, "another", WORD);
    result = process_token(&current, &i, "line.", WORD);
    result = process_token(&current, &i, "This", WORD);
    result = process_token(&current, &i, "line", WORD);
    result = process_token(&current, &i, "contains", WORD);
    result = process_token(&current, &i, "foo.", WORD);
    result = process_token(&current, &i, "EOF", WORD);

    // this is how I check for the end of the list
    result = process_token(&current, &i, NULL, NULL_TOKEN);

    return result;
}
multitudes commented 2 months ago

DGREAT (25) vs REDIRECT_OUT_APP (32)

In `command &>> outputfile.txt`, the &>> redirection operator is used to append both standard output (stdout) and standard error (stderr) from a command to a file. This operator combines the functionalities of >> (which appends stdout to a file) and 2>&1 (which redirects stderr to stdout), making it a concise way to append all output from a command to a single file.

Originally, since we have &> as REDIRECT_BOTH, it makes sense for &>> to be REDIRECT_BOTH_APP. In Bash, >>& does not exist as a built-in redirection operator; Bash uses &>> for appending both standard output (stdout) and standard error (stderr) to a file. However, >>& is used in other shells like csh and tcsh, and therefore I included it at the very beginning, because I had not researched all of them enough yet!

So yes, I will rename &>> to REDIRECT_BOTH_APP and delete the non-Bash one...
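The equivalence can be checked in plain POSIX shell: `>> file 2>&1` is the portable spelling of what Bash's &>> abbreviates, so a minishell could desugar &>> into exactly this.

```shell
# Append stdout and stderr of a command group to one file,
# i.e. the POSIX equivalent of Bash's `{ ...; } &>> "$tmp"`.
tmp=$(mktemp)
{ echo "to stdout"; echo "to stderr" 1>&2; } >> "$tmp" 2>&1
{ echo "appended later"; } >> "$tmp" 2>&1
wc -l <"$tmp"
rm -f "$tmp"
```

All three lines land in the same file because stderr is redirected to wherever stdout already points.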

multitudes commented 2 months ago

> (echo > just outputs '>')

?? Do you mean `echo >`? In Bash:

bash-3.2$ echo >
bash: syntax error near unexpected token `newline'

So I assume you mean

echo '>'

See below for the EXPANDED / ESCAPED tokens.

multitudes commented 2 months ago

escaped tokens

It would be nice to also account for escaped tokens. There are also examples in the evaluation cases. These could be identified as specific tokens or handled in expansion.

OK, as above: some tokens are EXPANDED and don't need to be expanded again, and others are ESCAPED. Yes, the ESCAPED case seems to be very important. Should we create a token name for both?

I would suggest RESOLVED_TOKEN. If, during expansion and quote stripping etc., you set the token to this new type, then you can check for it before expanding again.
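The RESOLVED_TOKEN idea as a sketch (the type, struct and function are hypothetical, only the name is suggested above): the expansion pass bails out early on resolved tokens and marks everything it finishes, which also makes the pass idempotent.

```c
#include <assert.h>

/* Hypothetical sketch of the proposed RESOLVED_TOKEN guard. */
typedef enum e_token_type { WORD, RESOLVED_TOKEN } t_token_type;

typedef struct s_token
{
	t_token_type	type;
	char			*value;
}	t_token;

static void	expand_token(t_token *tok)
{
	if (tok->type == RESOLVED_TOKEN)
		return ;	/* already expanded, escaped or quote-stripped */
	/* ... parameter expansion, quote stripping, globbing ... */
	tok->type = RESOLVED_TOKEN;
}
```

Calling expand_token() twice on the same token is then harmless: the second call returns immediately.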

multitudes commented 2 months ago

Naming

The shell manual doesn't name < explicitly, so I named it REDIRECT_IN. It is a bit annoying, because << is DLESS. Since we quote the shell manual a lot, we should keep DLESS and then rename < and > to LESS and GREAT? @ProjektPhoenix

multitudes commented 2 months ago

I removed the following:

- STAR, because it is used only for globbing
- GREATER_AND_GREATER, // ">&>" (don't know where I saw that ;)
- DIGIT
- CHAR, HASH