u413-284-si / webserv

This project is about setting up a http web server, providing a static website.
MIT License
0 stars 0 forks source link

9 config parsing check if config file is valid #24

Closed QCHR1581 closed 2 months ago

QCHR1581 commented 3 months ago

Implemented classes and structs:

Functionality of the class ConfigFileParser

The class does following basic checks to make sure the config file is valid:

The class reads successfully following directives:

The class does the following specific tests for the successfully read directives:

Tests

Implemented the following tests for an invalid file:

  1. File could not be opened (because the path is wrong, such a file does not exist etc.)
  2. File is empty
  3. File has open brackets (including missing brackets and too many brackets)
  4. File is missing the http block
  5. File is missing server block
  6. File contains invalid directive (including server and location)
  7. File contains missing semicolon
  8. Listen directive contains invalid ip address
  9. Listen directive contains invalid port
  10. Root directive has no root path
  11. Root directive has multiple root paths

Implemented the following tests for a valid file:

  1. A standard valid file
  2. Listen directive contains only ip
  3. Listen directive contains only port
  4. Listen contains ip and port
gwolf-011235 commented 3 months ago

Would stick with line-by-line reading. It is a reasonable approach which allows for more structured code.

Chris and I talked the last time we met and we identified three use cases, where line-by-line reading may be suboptimal since you only have context of the current line.

  1. Server scope directive after location directive
Server {
  Location / {
    index index.html;
  }
  Location /images/ {
    root /home;
    index index.html;
  }
  root /var;
}

In this case the first block should receive the server block directive, and the second should retain its custom root. With line by line reading you would set it to default in the first and custom in second. Later you find it the server directive so you would revisit the location blocks. But then you need to know if it has default value and if so overwrite. You also need to know if the default wasn't set explicitly, in which case you also dont overwrite. If you read in the whole file you can easily make two passthroughs: iterate over the server block, identify all server scope directives. Iterate another time, this time check for location blocks.

  1. Newline as whitespace
server {
  server_name example.com
                       sample.com
                       illustration.com;
}

Nginx doesn't care about newline except after a comment. This means a directive can span several lines. We don't have to implement this

  1. Open brackets

For the open brackets check we need to read the file twice. We don't have to care about this.

We could ignore 2 and 3 but I really think 1 is easier if you have the context of the whole file.

u413-284-si commented 3 months ago

Would stick with line-by-line reading. It is a reasonable approach which allows for more structured code.

Chris and I talked the last time we met and we identified three use cases, where line-by-line reading may be suboptimal since you only have context of the current line.

1. Server scope directive after location directive
Server {
  Location / {
    index index.html;
  }
  Location /images/ {
    root /home;
    index index.html;
  }
  root /var;
}

In this case the first block should receive the server block directive, and the second should retain its custom root. With line by line reading you would set it to default in the first and custom in second. Later you find it the server directive so you would revisit the location blocks. But then you need to know if it has default value and if so overwrite. You also need to know if the default wasn't set explicitly, in which case you also dont overwrite. If you read in the whole file you can easily make two passthroughs: iterate over the server block, identify all server scope directives. Iterate another time, this time check for location blocks.

2. Newline as whitespace
server {
  server_name example.com
                       sample.com
                       illustration.com;
}

Nginx doesn't care about newline except after a comment. This means a directive can span several lines. We don't have to implement this

3. Open brackets

For the open brackets check we need to read the file twice. We don't have to care about this.

We could ignore 2 and 3 but I really think 1 is easier if you have the context of the whole file.

The decision has then been made to rewrite the parsing in a way that the file is checked in a manner that first the server and its directives are determined and then the locations within the server, and then we move on to the next server? Server 1 -> all directives -> all locations with directives Server 2 -> all directives -> all locations with directives ...

regarding 2: How do we know we don't have to care about this?

gwolf-011235 commented 3 months ago

The decision has then been made to rewrite the parsing in a way that the file is checked in a manner that first the server and its directives are determined and then the locations within the server, and then we move on to the next server?

Yes. I think this is something somebody could easily test and break. You'd just need to write the server directive under a location block.

But you can still do this with line-by-line reading. I think bulk reading makes it easier; Chris wanted to think about it.

regarding 2: How do we know we don't have to care about this?

As with all 42 evaluations: we don't. If a nginx veteran evaluates us and says it has to be exactly the same, this could fail us. If we can convince that this is expected behavior, then yay.

Since the subject is not explicit (it only talks about nginx as comparison for header and answer behaviors), we could argue how our config file works is for us to decide. So we could also say if you put a server directive under a location block that it will be ignored. And if you put a newline it means new directive.

I think implicitly we wanted to closely follow the style of nginx parsing. But we can also decide otherwise?

QCHR1581 commented 3 months ago

Would stick with line-by-line reading. It is a reasonable approach which allows for more structured code.

Chris and I talked the last time we met and we identified three use cases, where line-by-line reading may be suboptimal since you only have context of the current line.

  1. Server scope directive after location directive
Server {
  Location / {
    index index.html;
  }
  Location /images/ {
    root /home;
    index index.html;
  }
  root /var;
}

In this case the first block should receive the server block directive, and the second should retain its custom root. With line by line reading you would set it to default in the first and custom in second. Later you find it the server directive so you would revisit the location blocks. But then you need to know if it has default value and if so overwrite. You also need to know if the default wasn't set explicitly, in which case you also dont overwrite. If you read in the whole file you can easily make two passthroughs: iterate over the server block, identify all server scope directives. Iterate another time, this time check for location blocks.

I think that reading several times through the content of the config file is possible in both cases:

So it is more a question regarding efficiency. In this case probably the approach with reading the file once and then storing it into a var is better.

  1. Newline as whitespace
server {
  server_name example.com
                       sample.com
                       illustration.com;
}

Nginx doesn't care about newline except after a comment. This means a directive can span several lines. We don't have to implement this

I also tested other cases like this:

 server
        {
                server_name example.com
                root /var/html
                example3.com;
                location /secret 
                {
                        autoindex on;
                }
        }

This produces the following message:

nginx: [warn] server name "/var/html" has suspicious symbols in /etc/nginx/nginx.conf:17
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful

Which tells us that this is still valid, even though a valid directive is used as a server name and that it also checks if it contains "supicious symbols".

In my opinion handling this behaviour of nginx would be overkill. Therefore I would not implement this.

gwolf-011235 commented 3 months ago

I think it doesn't even parse it as directive, just as value?

Key = server_name

Value = [ example.com, root, /var/html, example3.com]

And suspicious in that case would be '/' in a server name (which may cause problems?)

QCHR1581 commented 3 months ago

I think it doesn't even parse it as directive, just as value?

Key = server_name

Value = [ example.com, root, /var/html, example3.com]

And suspicious in that case would be '/' in a server name (which may cause problems?)

Yes, I also assume that. I think that this behaviour just makes it more trickier to check if a line has a semicolon or not.

In this case it's valid that there is a directive with a value in a line that does not contain a semicolon, because both (directive and value) get interpreted as values for the server_name directive.

I thought about storing the whole content of the file in a string and then reading the string with getline. So there is no need to read the file again and again but using the string for this purpose and without the need to completely rewrite everything.

But for this behaviour it would be probably better to work with chars. Then we limit a line by looking for a semicolon. This is also possible but requires a complete new way to parse everything.

It is also possible that I just don't see a better solution even though there is a better way to do this.