QCHR1581 commented 3 months ago

Implemented classes and structs:

class ConfigFileParser:

Implements the parsing of the config file (including checks)
struct ConfigFile:

Stores the configuration for each server in a vector of ConfigSever structs. Also contains vars for reading the current line and storing the current vector index. Furthermore it stores the valid server directives.
struct ConfigServer:

Stores the configuration for each location in a vector of Location structs. Also contains vars for storing server name, root path, listening IPs and ports, maxbody size, error pages and the current vector index. Furthermore it stores the valid location directives.

Functionality of the class ConfigFileParser

The class does following basic checks to make sure the config file is valid:

Can be opened
Is not empty
Does not contain open brackets
Starts with 'http {'
Contains minimum one server configuration
Contains valid directives
No missing semicolons

The class reads successfully following directives:

listen
root

The class does the following specific tests for the successfully read directives:

listen:
- IP is valid (contains valid chars, octect not bigger then 255 and not smaller than 1)
- Port is valid (contains valid chars, value not bigger than 65535 and not smaller than 1)
It is also possible to only provide an IP or only a port
root:
- Contains a root path
- Contains only one root path
If there is a slash at the end of the root path, the slash gets removed and the path gets stored.

Tests

Implemented the following tests for an invalid file:

File could not be opened (because the path is wrong, such a file does not exist etc.)
File is empty
File has open brackets (including missing brackets and too many brackets)
File is missing the http block
File is missing server block
File contains invalid directive (including server and location)
File contains missing semicolon
Listen directive contains invalid ip address
Listen directive contains invalid port
Root directive has no root path
Root directive has multiple root paths

Implemented the following tests for a valid file:

A standard valid file
Listen directive contains only ip
Listen directive contains only port
Listen contains ip and port

gwolf-011235 commented 3 months ago

Would stick with line-by-line reading. It is a reasonable approach which allows for more structured code.

Chris and I talked the last time we met and we identified three use cases, where line-by-line reading may be suboptimal since you only have context of the current line.

Server scope directive after location directive

Server {
  Location / {
    index index.html;
  }
  Location /images/ {
    root /home;
    index index.html;
  }
  root /var;
}

In this case the first block should receive the server block directive, and the second should retain its custom root. With line by line reading you would set it to default in the first and custom in second. Later you find it the server directive so you would revisit the location blocks. But then you need to know if it has default value and if so overwrite. You also need to know if the default wasn't set explicitly, in which case you also dont overwrite. If you read in the whole file you can easily make two passthroughs: iterate over the server block, identify all server scope directives. Iterate another time, this time check for location blocks.

Newline as whitespace

server {
  server_name example.com
                       sample.com
                       illustration.com;
}

Nginx doesn't care about newline except after a comment. This means a directive can span several lines. We don't have to implement this

Open brackets

For the open brackets check we need to read the file twice. We don't have to care about this.

We could ignore 2 and 3 but I really think 1 is easier if you have the context of the whole file.

u413-284-si commented 3 months ago

Would stick with line-by-line reading. It is a reasonable approach which allows for more structured code.

Chris and I talked the last time we met and we identified three use cases, where line-by-line reading may be suboptimal since you only have context of the current line.
1. Server scope directive after location directive
Server {
  Location / {
    index index.html;
  }
  Location /images/ {
    root /home;
    index index.html;
  }
  root /var;
}
In this case the first block should receive the server block directive, and the second should retain its custom root. With line by line reading you would set it to default in the first and custom in second. Later you find it the server directive so you would revisit the location blocks. But then you need to know if it has default value and if so overwrite. You also need to know if the default wasn't set explicitly, in which case you also dont overwrite. If you read in the whole file you can easily make two passthroughs: iterate over the server block, identify all server scope directives. Iterate another time, this time check for location blocks.
2. Newline as whitespace
server {
  server_name example.com
                       sample.com
                       illustration.com;
}
Nginx doesn't care about newline except after a comment. This means a directive can span several lines. We don't have to implement this
3. Open brackets
For the open brackets check we need to read the file twice. We don't have to care about this.

We could ignore 2 and 3 but I really think 1 is easier if you have the context of the whole file.

The decision has then been made to rewrite the parsing in a way that the file is checked in a manner that first the server and its directives are determined and then the locations within the server, and then we move on to the next server? Server 1 -> all directives -> all locations with directives Server 2 -> all directives -> all locations with directives ...

regarding 2: How do we know we don't have to care about this?

gwolf-011235 commented 3 months ago

The decision has then been made to rewrite the parsing in a way that the file is checked in a manner that first the server and its directives are determined and then the locations within the server, and then we move on to the next server?

Yes. I think this is something somebody could easily test and break. You'd just need to write the server directive under a location block.

But you can still do this with line-by-line reading. I think bulk reading makes it easier; Chris wanted to think about it.

regarding 2: How do we know we don't have to care about this?

As with all 42 evaluations: we don't. If a nginx veteran evaluates us and says it has to be exactly the same, this could fail us. If we can convince that this is expected behavior, then yay.

Since the subject is not explicit (it only talks about nginx as comparison for header and answer behaviors), we could argue how our config file works is for us to decide. So we could also say if you put a server directive under a location block that it will be ignored. And if you put a newline it means new directive.

I think implicitly we wanted to closely follow the style of nginx parsing. But we can also decide otherwise?

QCHR1581 commented 3 months ago

Would stick with line-by-line reading. It is a reasonable approach which allows for more structured code.

Chris and I talked the last time we met and we identified three use cases, where line-by-line reading may be suboptimal since you only have context of the current line.

Server scope directive after location directive
Server {
  Location / {
    index index.html;
  }
  Location /images/ {
    root /home;
    index index.html;
  }
  root /var;
}
In this case the first block should receive the server block directive, and the second should retain its custom root. With line by line reading you would set it to default in the first and custom in second. Later you find it the server directive so you would revisit the location blocks. But then you need to know if it has default value and if so overwrite. You also need to know if the default wasn't set explicitly, in which case you also dont overwrite. If you read in the whole file you can easily make two passthroughs: iterate over the server block, identify all server scope directives. Iterate another time, this time check for location blocks.

I think that reading several times through the content of the config file is possible in both cases:

store the whole content in a variable (eg. string or vector of strings)
reading the file itself (and changing the cursor the beginning of the file after reading the last line)

So it is more a question regarding efficiency. In this case probably the approach with reading the file once and then storing it into a var is better.

Newline as whitespace
server {
  server_name example.com
                       sample.com
                       illustration.com;
}
Nginx doesn't care about newline except after a comment. This means a directive can span several lines. We don't have to implement this

I also tested other cases like this:

 server
        {
                server_name example.com
                root /var/html
                example3.com;
                location /secret 
                {
                        autoindex on;
                }
        }

This produces the following message:

nginx: [warn] server name "/var/html" has suspicious symbols in /etc/nginx/nginx.conf:17
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful

Which tells us that this is still valid, even though a valid directive is used as a server name and that it also checks if it contains "supicious symbols".

In my opinion handling this behaviour of nginx would be overkill. Therefore I would not implement this.

gwolf-011235 commented 3 months ago

I think it doesn't even parse it as directive, just as value?

Key = server_name

Value = [ example.com, root, /var/html, example3.com]

And suspicious in that case would be '/' in a server name (which may cause problems?)

QCHR1581 commented 3 months ago

I think it doesn't even parse it as directive, just as value?

Key = server_name

Value = [ example.com, root, /var/html, example3.com]

And suspicious in that case would be '/' in a server name (which may cause problems?)

Yes, I also assume that. I think that this behaviour just makes it more trickier to check if a line has a semicolon or not.

In this case it's valid that there is a directive with a value in a line that does not contain a semicolon, because both (directive and value) get interpreted as values for the server_name directive.

I thought about storing the whole content of the file in a string and then reading the string with getline. So there is no need to read the file again and again but using the string for this purpose and without the need to completely rewrite everything.

But for this behaviour it would be probably better to work with chars. Then we limit a line by looking for a semicolon. This is also possible but requires a complete new way to parse everything.

It is also possible that I just don't see a better solution even though there is a better way to do this.

u413-284-si / webserv

9 config parsing check if config file is valid #24

Implemented classes and structs:

class ConfigFileParser:

struct ConfigFile:

struct ConfigServer:

Functionality of the class ConfigFileParser

The class does following basic checks to make sure the config file is valid:

The class reads successfully following directives:

The class does the following specific tests for the successfully read directives:

Tests

Implemented the following tests for an invalid file:

Implemented the following tests for a valid file: