stavros-k opened 1 year ago
Cc @MichaIng
Not related to our recent change, so I removed the 26-feedback tag.
The same request was made here: https://github.com/nextcloud/docker/issues/1218. Since it makes more sense to discuss it on the server repo, I'm closing the issue at the Docker repo and copying the relevant part of my comment here:
All the other header values you added are obsolete if `noindex` is set. So they are not "doing more" but should be removed, both to save the additional bytes transferred and to avoid possibly confusing other search engines which do not understand them.

So while the extras most likely do not hurt, they are all redundant/unnecessary and at best ballast. And while I do not think so, there is also a chance that those values, since they are not supported by all search engines, may confuse other search engines and in the worst case lead to the essential `noindex` being ignored. So IMO it is correct to explicitly accept only `noindex, nofollow` as the failsafe value.
Modify the X-Robots-Tag in `nextcloud\config\nginx\site-confs`:

```nginx
add_header X-Robots-Tag "noindex, nofollow" always;
```

It should be possible to eliminate this warning.
Shouldn't `none` be equivalent to `noindex, nofollow` and not show the warning?
It is not equivalent: #36689
> So IMO it is correct to explicitly accept only `noindex, nofollow` as the failsafe value
I disagree. While this might be true for the values noted in the OP, I am using the additional value `noai`, which is not covered by `noindex` or `nofollow`. The check should only verify that the wanted values are set, not that they are set exclusively.
Is there some "official" documentation about this header value? I could only find it being respected by the img2dataset tool, which used to cause massive trouble for image-related websites like DeviantArt, not to forget the personal and legal concerns: https://github.com/rom1504/img2dataset/pull/249/files

However, the docs of the relevant option, which by default is set to 4 header values, suggest that any one of them is interpreted as a denial: https://github.com/rom1504/img2dataset/blob/6adc82b/README.md?plain=1#L169
So for this tool at least, `noindex` implies `noai` (as long as users of the tool do not override/empty the option's default).

If you know a tool/service/case where it is different and `noai` is additionally required, let me know, and I will adjust the check when I find time.
This is still not fixed
It is also still not closed, and the confirmation that any additional value for this header makes sense is still outstanding (see my last reply). As long as `noindex, nofollow` covers everything else, there is no point in allowing the value to be extended.
Yes, but e.g. `nofollow, noindex` should also be valid, as it has the same meaning. Currently it triggers the warning.

The test should look for the presence of `noindex` and `nofollow` instead of looking for the exact string `noindex, nofollow`. Simple as.
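As a rough illustration of what such a presence-based check could look like (a minimal sketch of the logic only, not Nextcloud's actual code; the function name and tokenization are assumptions):

```typescript
// Sketch: accept any X-Robots-Tag value that contains both required
// directives, regardless of order, spacing, or extra values like "noai".
function robotsTagIsCompliant(headerValue: string): boolean {
  const directives = headerValue
    .toLowerCase()
    .split(',')
    .map((d) => d.trim());
  return directives.includes('noindex') && directives.includes('nofollow');
}

console.log(robotsTagIsCompliant('noindex, nofollow'));       // true
console.log(robotsTagIsCompliant('nofollow,noindex'));        // true
console.log(robotsTagIsCompliant('noindex, nofollow, noai')); // true
console.log(robotsTagIsCompliant('none'));                    // false (see #36689)
```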
Of course we could make those tests overly complex and hence error-prone. But is it really such a problem to just exchange the two arguments in your webserver config?
I don't consider a test that tests correctly overly complex.
I consider a test that doesn't test correctly a bug.
If a project doesn't aim to be bug-free, it can never become bug-free.
I am just a spare-time contributor as well, and I have no motivation to rewrite the test unless it is really required to solve people's needs. All header tests expect a hardcoded string; this was always the case, and it should not be a problem unless this hardcoded string does not fulfill all possible user needs. As `noindex, nofollow` and `nofollow, noindex` are equivalent, the second does not add any use case and hence does not solve any problem, so I am not going to invest time in this.

If you have a different opinion, and changing the header in your webserver config causes you more trouble than rewriting the test yourself, you are of course free to open a PR 😉.
I have X-Robots-Tag set to `noindex, nofollow`, yet I still get error messages about it not being set to `noindex,nofollow`. It might be best to remove the error message altogether, as I have never known it to work correctly.

The entire approach is flawed; I wonder how low-quality code like this can stay in such a big project.
@kale1d0code Then you did not apply it correctly. Please check with:

```sh
curl -IL https://localhost/
```

or, if your Nextcloud is in a subdirectory:

```sh
curl -IL https://localhost/nextcloud/
```
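If the header is applied correctly, every response in the redirect chain, including the final 200, should contain a line like the following (illustrative excerpt, not actual output from this thread; exact casing and ordering vary by server):

```
HTTP/1.1 200 OK
...
x-robots-tag: noindex, nofollow
```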
@darkBuddha
You are free to open a PR to enhance the code if you think you can do it better and it is worth your time. I am a volunteer with my own projects, and others I am contributing to. All I did was change the hardcoded expected value `none` into the hardcoded expected value `noindex, nofollow`, since the first is not supported by all search engines. This was a quick change allowing you to have your Nextcloud instance removed from all search engines without triggering the admin panel warning, so it was a real enhancement and worth my time. Based on all the information I have read here and elsewhere, changing this into a complicated conditional to support all variants of this header value (with or without spaces, additional values, swapped values, etc.) does not add anything to hiding Nextcloud instance content from search engines. All it does is allow admins to copy and paste other values into their webservers, values which are no better than `noindex, nofollow` and probably worse (unless we research conflicting values further, code the check very carefully, etc.), and this for a significantly larger amount of invested time, either someone's free time or paid developer time, which is surely better invested in real enhancements, at least IMO.

If this topic remains just a "but I want to be able to copy & paste other/worse values into my config, while I am not able or willing to implement this myself", I vote for locking it. If someone comes up with a "real" argument, then we can reopen it.
I ran

```sh
curl -IL http://localhost:8080/
```

and found the `x-robots-tag` header in every 302 and 303 response, but the last 200 response did not have the header. This is confusing, because Nginx should add the header to every single response due to my config:
```nginx
server {
    listen 8080;
    listen [::]:8080;
    server_name ${NGINX_HOST};
    port_in_redirect off;
    fastcgi_param SERVER_PORT 443;
    proxy_set_header Host $http_host;
    fastcgi_param HTTPS on;

    # HTTP response headers borrowed from Nextcloud `.htaccess`
    add_header Referrer-Policy "no-referrer" always;
    add_header X-Content-Type-Options "nosniff" always;
    add_header X-Download-Options "noopen" always;
    add_header X-Frame-Options "SAMEORIGIN" always;
    add_header X-Permitted-Cross-Domain-Policies "none" always;
    add_header X-Robots-Tag "noindex, nofollow" always;
    add_header X-XSS-Protection "1; mode=block" always;
    add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
    fastcgi_hide_header X-Powered-By;

    root /var/www/html;
    ...
```
Another confusing thing: the header shows a space in the curl output but doesn't in Chrome DevTools.
Note that in Nginx, `add_header` directives of parent config blocks have no effect if an applied child config block contains any `add_header` itself. So if there is another `location` block or similar applied for the last request, this might explain it.
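A minimal illustration of this inheritance pitfall (hypothetical config, not taken from the thread):

```nginx
server {
    # Inherited by child blocks only if they define no add_header themselves.
    add_header X-Robots-Tag "noindex, nofollow" always;

    location / {
        # No add_header here: responses DO get X-Robots-Tag from the server block.
    }

    location /assets/ {
        # Any add_header here discards ALL inherited headers, so responses
        # from this block will NOT carry X-Robots-Tag unless it is repeated.
        add_header Cache-Control "public, max-age=604800";
    }
}
```

In that case the fix is to repeat the shared header lines (or move them into an `include` file) in every block that sets its own headers.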
Bug description
I have set the X-Robots-Tag header using Traefik (which I can see in the response headers), yet Nextcloud still shows the warning.
Steps to reproduce
Set the X-Robots-Tag header to the recommended directives plus some extra, then open the admin Overview.
Expected behavior
The warning should only show when the set headers are weaker than the recommended ones.
Installation method
Community Docker image
Operating system
Other
PHP engine version
PHP 8.1
Web server
Nginx
Database engine version
PostgreSQL
Is this bug present after an update or on a fresh install?
Updated to a major version (ex. 22.2.3 to 23.0.1)
Are you using the Nextcloud Server Encryption module?
Encryption is Disabled
What user-backends are you using?
Configuration report
List of activated Apps
Nextcloud Signing status
Nextcloud Logs
No response
Additional info
No response