Closed QCHR1581 closed 2 months ago
Would stick with line-by-line reading. It is a reasonable approach which allows for more structured code.
Chris and I talked the last time we met and we identified three use cases, where line-by-line reading may be suboptimal since you only have context of the current line.
Server {
Location / {
index index.html;
}
Location /images/ {
root /home;
index index.html;
}
root /var;
}
In this case the first block should receive the server block directive, and the second should retain its custom root. With line by line reading you would set it to default in the first and custom in second. Later you find it the server directive so you would revisit the location blocks. But then you need to know if it has default value and if so overwrite. You also need to know if the default wasn't set explicitly, in which case you also dont overwrite. If you read in the whole file you can easily make two passthroughs: iterate over the server block, identify all server scope directives. Iterate another time, this time check for location blocks.
server {
server_name example.com
sample.com
illustration.com;
}
Nginx doesn't care about newline except after a comment. This means a directive can span several lines. We don't have to implement this
For the open brackets check we need to read the file twice. We don't have to care about this.
We could ignore 2 and 3 but I really think 1 is easier if you have the context of the whole file.
Would stick with line-by-line reading. It is a reasonable approach which allows for more structured code.
Chris and I talked the last time we met and we identified three use cases, where line-by-line reading may be suboptimal since you only have context of the current line.
1. Server scope directive after location directive
Server { Location / { index index.html; } Location /images/ { root /home; index index.html; } root /var; }
In this case the first block should receive the server block directive, and the second should retain its custom root. With line by line reading you would set it to default in the first and custom in second. Later you find it the server directive so you would revisit the location blocks. But then you need to know if it has default value and if so overwrite. You also need to know if the default wasn't set explicitly, in which case you also dont overwrite. If you read in the whole file you can easily make two passthroughs: iterate over the server block, identify all server scope directives. Iterate another time, this time check for location blocks.
2. Newline as whitespace
server { server_name example.com sample.com illustration.com; }
Nginx doesn't care about newline except after a comment. This means a directive can span several lines. We don't have to implement this
3. Open brackets
For the open brackets check we need to read the file twice. We don't have to care about this.
We could ignore 2 and 3 but I really think 1 is easier if you have the context of the whole file.
The decision has then been made to rewrite the parsing in a way that the file is checked in a manner that first the server and its directives are determined and then the locations within the server, and then we move on to the next server? Server 1 -> all directives -> all locations with directives Server 2 -> all directives -> all locations with directives ...
regarding 2: How do we know we don't have to care about this?
The decision has then been made to rewrite the parsing in a way that the file is checked in a manner that first the server and its directives are determined and then the locations within the server, and then we move on to the next server?
Yes. I think this is something somebody could easily test and break. You'd just need to write the server directive under a location block.
But you can still do this with line-by-line reading. I think bulk reading makes it easier; Chris wanted to think about it.
regarding 2: How do we know we don't have to care about this?
As with all 42 evaluations: we don't. If a nginx veteran evaluates us and says it has to be exactly the same, this could fail us. If we can convince that this is expected behavior, then yay.
Since the subject is not explicit (it only talks about nginx as comparison for header and answer behaviors), we could argue how our config file works is for us to decide. So we could also say if you put a server directive under a location block that it will be ignored. And if you put a newline it means new directive.
I think implicitly we wanted to closely follow the style of nginx parsing. But we can also decide otherwise?
Would stick with line-by-line reading. It is a reasonable approach which allows for more structured code.
Chris and I talked the last time we met and we identified three use cases, where line-by-line reading may be suboptimal since you only have context of the current line.
- Server scope directive after location directive
Server { Location / { index index.html; } Location /images/ { root /home; index index.html; } root /var; }
In this case the first block should receive the server block directive, and the second should retain its custom root. With line by line reading you would set it to default in the first and custom in second. Later you find it the server directive so you would revisit the location blocks. But then you need to know if it has default value and if so overwrite. You also need to know if the default wasn't set explicitly, in which case you also dont overwrite. If you read in the whole file you can easily make two passthroughs: iterate over the server block, identify all server scope directives. Iterate another time, this time check for location blocks.
I think that reading several times through the content of the config file is possible in both cases:
So it is more a question regarding efficiency. In this case probably the approach with reading the file once and then storing it into a var is better.
- Newline as whitespace
server { server_name example.com sample.com illustration.com; }
Nginx doesn't care about newline except after a comment. This means a directive can span several lines. We don't have to implement this
I also tested other cases like this:
server
{
server_name example.com
root /var/html
example3.com;
location /secret
{
autoindex on;
}
}
This produces the following message:
nginx: [warn] server name "/var/html" has suspicious symbols in /etc/nginx/nginx.conf:17
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful
Which tells us that this is still valid, even though a valid directive is used as a server name and that it also checks if it contains "supicious symbols".
In my opinion handling this behaviour of nginx would be overkill. Therefore I would not implement this.
I think it doesn't even parse it as directive, just as value?
Key = server_name
Value = [ example.com, root, /var/html, example3.com]
And suspicious in that case would be '/' in a server name (which may cause problems?)
I think it doesn't even parse it as directive, just as value?
Key = server_name
Value = [ example.com, root, /var/html, example3.com]
And suspicious in that case would be '/' in a server name (which may cause problems?)
Yes, I also assume that. I think that this behaviour just makes it more trickier to check if a line has a semicolon or not.
In this case it's valid that there is a directive with a value in a line that does not contain a semicolon, because both (directive and value) get interpreted as values for the server_name directive.
I thought about storing the whole content of the file in a string and then reading the string with getline. So there is no need to read the file again and again but using the string for this purpose and without the need to completely rewrite everything.
But for this behaviour it would be probably better to work with chars. Then we limit a line by looking for a semicolon. This is also possible but requires a complete new way to parse everything.
It is also possible that I just don't see a better solution even though there is a better way to do this.
Implemented classes and structs:
class ConfigFileParser:
Implements the parsing of the config file (including checks)
struct ConfigFile:
Stores the configuration for each server in a vector of ConfigSever structs. Also contains vars for reading the current line and storing the current vector index. Furthermore it stores the valid server directives.
struct ConfigServer:
Stores the configuration for each location in a vector of Location structs. Also contains vars for storing server name, root path, listening IPs and ports, maxbody size, error pages and the current vector index. Furthermore it stores the valid location directives.
Functionality of the class ConfigFileParser
The class does following basic checks to make sure the config file is valid:
The class reads successfully following directives:
The class does the following specific tests for the successfully read directives:
listen:
It is also possible to only provide an IP or only a port
root:
If there is a slash at the end of the root path, the slash gets removed and the path gets stored.
Tests
Implemented the following tests for an invalid file:
Implemented the following tests for a valid file: