Closed ProjektPhoenix closed 2 months ago
Yes it is good to do this at the end.
Maybe we could add a token type EXPANDED so we do not try to expand again or expand things which are not gonna be expanded?
When I do globbing with ls *
I see:
DEBUG src/analyser/analyser.c:expand_tokenlist:394: Token expanded - token type: 0
DEBUG src/scanner/token_functions.c:new_toknode:58: token created type 0 -.DS_Store-
DEBUG src/scanner/token_functions.c:new_toknode:58: token created type 0 -LICENSE-
DEBUG src/scanner/token_functions.c:new_toknode:58: token created type 0 -Makefile-
DEBUG src/scanner/token_functions.c:new_toknode:58: token created type 0 -ls.out-
DEBUG src/scanner/token_functions.c:new_toknode:58: token created type 0 -a-
DEBUG src/scanner/token_functions.c:new_toknode:58: token created type 0 -README.md-
DEBUG src/scanner/token_functions.c:new_toknode:58: token created type 0 -minishell-
DEBUG src/scanner/token_functions.c:new_toknode:58: token created type 0 -.gitignore-
DEBUG src/scanner/token_functions.c:new_toknode:58: token created type 0 -non_interactive_commands.minishell-
DEBUG src/analyser/analyser.c:expand_tokenlist:394: Token expanded - token type: 0
DEBUG src/analyser/analyser.c:expand_tokenlist:394: Token expanded - token type: 0
DEBUG src/analyser/analyser.c:expand_tokenlist:394: Token expanded - token type: 0
DEBUG src/analyser/analyser.c:expand_tokenlist:394: Token expanded - token type: 0
DEBUG src/analyser/analyser.c:expand_tokenlist:394: Token expanded - token type: 0
DEBUG src/analyser/analyser.c:expand_tokenlist:394: Token expanded - token type: 0
DEBUG src/analyser/analyser.c:expand_tokenlist:394: Token expanded - token type: 0
DEBUG src/analyser/analyser.c:expand_tokenlist:394: Token expanded - token type: 0
DEBUG src/analyser/analyser.c:expand_tokenlist:394: Token expanded - token type: 0
Not sure what all this means, but we don't need to expand a token created from a list of file names, so the expansion there (if any) could be skipped.
I would like before the evaluation to make a branch with a reduced list of tokens... Only the tokens we need. But we should do this at the very end.
I quote the manual below... So, looking at <<-:
it allows leading tabs in the heredoc (we do not implement this).
Apparently they are needed? The DLESS name is used in the shell manual.
DLESS_DELIM is of course not in the manual, but I cannot suddenly call it "heredoc delimiter". It is the delimiter as scanned in this function (untested!):
bool add_here_and_delim(t_mini_data *data, int *i)
{
	int		start;
	char	*tmp;

	add_token(data, i, "<<", DLESS);
	while (data->input[*i] && is_space(data->input[*i]))
		advance(i);
	if (data->input[*i] == '\0' || is_delimiter(data->input[*i]))
		return (scanner_error(data, "error: missing here-delim"));
	start = *i;
	/* note: test the character data->input[*i], not the pointer
	 * data->input + *i, which is always non-NULL */
	while (data->input[*i] && !is_delimiter(data->input[*i]))
		advance(i);
	tmp = ft_substr(data->input, start, *i - start);
	*i = start;
	add_token(data, i, tmp, DLESS_DELIM);
	free(tmp);
	return (true);
}
At the beginning I did not know what we would need for the heredoc, so I check for the DLESS token, and the DLESS_DELIM is the delimiter which comes after it. I don't check for special characters, but we could change that.
In the manual: 3.6.6 Here Documents
This type of redirection instructs the shell to read input from the current source until a line containing only word (with no trailing blanks) is seen. All of the lines read up to that point are then used as the standard input (or file descriptor n if n is specified) for a command.
The format of here-documents is:
[n]<<[-]word
here-document
delimiter
No parameter and variable expansion, command substitution, arithmetic expansion, or filename expansion is performed on word. If any part of word is quoted, the delimiter is the result of quote removal on word, and the lines in the here-document are not expanded. If word is unquoted, all lines of the here-document are subjected to parameter expansion, command substitution, and arithmetic expansion, the character sequence \newline is ignored, and ‘\’ must be used to quote the characters ‘\’, ‘$’, and ‘`’.
If the redirection operator is ‘<<-’, then all leading tab characters are stripped from input lines and the line containing delimiter. This allows here-documents within shell scripts to be indented in a natural fashion.
There is also already a tokenizer test for the heredoc tokens:
const char* test_scanner_append() {
	// I want to check the output of the call to the function in the scanner.c file:
	// tokenizer(char *input), returning a t_list of lexemes.
	// I will create a string and check the output of the function.
	std::string str = "cat << EOF | grep 'foo' \n This is a line. \n This is another line. \n This line contains foo. \n EOF";
	const char* input = str.c_str();
	t_list *lexemes = tokenizer(input);
	t_list *current = lexemes;
	const char *result;
	int i = 0;
	result = process_token(&current, &i, "cat", WORD);
	result = process_token(&current, &i, "<<", DLESS);
	result = process_token(&current, &i, "EOF", DLESS_DELIM);
	result = process_token(&current, &i, "|", PIPE);
	result = process_token(&current, &i, "grep", WORD);
	result = process_token(&current, &i, "'foo'", S_QUOTED_STRING);
	result = process_token(&current, &i, "This", WORD);
	result = process_token(&current, &i, "is", WORD);
	result = process_token(&current, &i, "a", WORD);
	result = process_token(&current, &i, "line.", WORD);
	result = process_token(&current, &i, "This", WORD);
	result = process_token(&current, &i, "is", WORD);
	result = process_token(&current, &i, "another", WORD);
	result = process_token(&current, &i, "line.", WORD);
	result = process_token(&current, &i, "This", WORD);
	result = process_token(&current, &i, "line", WORD);
	result = process_token(&current, &i, "contains", WORD);
	result = process_token(&current, &i, "foo.", WORD);
	result = process_token(&current, &i, "EOF", WORD);
	// this is how I check for the end of the list
	result = process_token(&current, &i, NULL, NULL_TOKEN);
	return result;
}
command &>> outputfile.txt
The &>> redirection operator is used to append both standard output (stdout) and standard error (stderr) from a command to a file. This operator combines the functionalities of >> (which appends stdout to a file) and 2>&1 (which redirects stderr to stdout), making it a concise way to append all output from a command to a single file.
Originally, since we have &> as REDIRECT_BOTH, it makes sense for &>> to be REDIRECT_BOTH_APP.
In Bash, >>& does not exist as a built-in redirection operator; Bash uses &>> for appending both standard output (stdout) and standard error (stderr) to a file. However, >>& is used in other shells like csh and tcsh, and therefore I included it at the very beginning, because I had not researched all of them enough yet!
So yes, I will rename the token for &>> to REDIRECT_BOTH_APP and delete the non-Bash one...
??
Do you mean echo > ?
In bash:
bash-3.2$ echo >
bash: syntax error near unexpected token `newline'
So I assume you mean echo '>'
see below for the EXPANDED / ESCAPED tokens
It would be nice to also account for escaped tokens. There are also examples in the evaluation cases. These could be identified as specific tokens or handled in expansion.
OK, as above: some tokens are EXPANDED and don't need to be expanded again, and others are ESCAPED. Yes, the ESCAPED case seems to be very important. Should we create a token name for both?
I would suggest RESOLVED_TOKEN. If during expansion, quote stripping, etc. you set the token to this new type, then every later pass can check for it before expanding again.
The shell manual doesn't name the < explicitly, so I named it REDIRECT_IN. It is a bit annoying because << is DLESS. Since we quote the shell manual a lot, we should keep DLESS and then rename < and > to LESS and GREAT?
@ProjektPhoenix
I removed the STAR token because it was used only for GLOBBING; removed GREATER_AND_GREATER, // ">&>" (don't know where I saw that ;)); removed DIGIT; removed CHAR and HASH.
It would be nice to have all redirection tokens in sequence in the token list, then a check for the general presence of a redirection token can be more easily done.
Some tokens seem to be duplicated or incorrect, let us check the following:
DLESS (26) vs DLESS_DELIM (27)
DGREAT (25) vs REDIRECT_OUT_APP (32)
REDIRECT_OUT_APP, // '&>>' vs REDIRECT_BOTH_APP, // ">>&"
> (echo > just outputs '>')