nextcloud / server

☁️ Nextcloud server, a safe home for all your data
https://nextcloud.com
GNU Affero General Public License v3.0
27.39k stars 4.07k forks source link

[Bug]: Setup check for `X-Robot-Tag` expects exact match, resulting in a warning #37409

Open stavros-k opened 1 year ago

stavros-k commented 1 year ago

⚠️ This issue respects the following points: ⚠️

Bug description

I have set the following tags using treafik (which I can see in the response headers)

x-robots-tag: none,noarchive,nosnippet,notranslate,noimageindex,nofollow,noindex

Yet nextcloud still shows

The "X-Robots-Tag" HTTP header is not set to "noindex, nofollow". This is a potential security or privacy risk, as it is recommended to adjust this setting accordingly.

Steps to reproduce

  1. Set required X-Robot-Tag plus some extra.
  2. Go to Overview
  3. See the warning

Expected behavior

Warning to only show when headers are lesser than the recommended

Installation method

Community Docker image

Operating system

Other

PHP engine version

PHP 8.1

Web server

Nginx

Database engine version

PostgreSQL

Is this bug present after an update or on a fresh install?

Updated to a major version (ex. 22.2.3 to 23.0.1)

Are you using the Nextcloud Server Encryption module?

Encryption is Disabled

What user-backends are you using?

Configuration report

root@a0e913b615b7:/# occ config:list system
{
    "system": {
        "memcache.local": "\\OC\\Memcache\\APCu",
        "memcache.distributed": "\\OC\\Memcache\\Redis",
        "memcache.locking": "\\OC\\Memcache\\Redis",
        "redis": {
            "host": "***REMOVED SENSITIVE VALUE***",
            "port": 6379,
            "password": "***REMOVED SENSITIVE VALUE***"
        },
        "datadirectory": "***REMOVED SENSITIVE VALUE***",
        "instanceid": "***REMOVED SENSITIVE VALUE***",
        "passwordsalt": "***REMOVED SENSITIVE VALUE***",
        "secret": "***REMOVED SENSITIVE VALUE***",
        "trusted_domains": [
            "10.10.10.254",
            "cloud.DOMAIN.TLD"
        ],
        "trusted_proxies": "***REMOVED SENSITIVE VALUE***",
        "overwrite.cli.url": "https:\/\/cloud.DOMAIN.TLD\/",
        "overwriteprotocol": "https",
        "overwritehost": "cloud.DOMAIN.TLD",
        "dbtype": "pgsql",
        "version": "26.0.0.11",
        "dbname": "***REMOVED SENSITIVE VALUE***",
        "dbhost": "***REMOVED SENSITIVE VALUE***",
        "dbport": "",
        "dbtableprefix": "oc_",
        "mysql.utf8mb4": true,
        "dbuser": "***REMOVED SENSITIVE VALUE***",
        "dbpassword": "***REMOVED SENSITIVE VALUE***",
        "installed": true,
        "updater.release.channel": "stable",
        "maintenance": false,
        "theme": "",
        "loglevel": 2,
        "logfile": "\/config\/log\/nextcloud\/nextcloud.log",
        "logdateformat": "D d\/m\/Y H:i:s",
        "log.condition": {
            "apps": [
                "admin_audit"
            ]
        },
        "data-fingerprint": "6619a1ae23dec248f717298ba30bd687",
        "mail_smtpmode": "smtp",
        "mail_smtpauthtype": "LOGIN",
        "mail_sendmailmode": "pipe",
        "mail_smtpauth": 1,
        "mail_smtpsecure": "tls",
        "mail_from_address": "***REMOVED SENSITIVE VALUE***",
        "mail_domain": "***REMOVED SENSITIVE VALUE***",
        "mail_smtphost": "***REMOVED SENSITIVE VALUE***",
        "mail_smtpport": "587",
        "mail_smtpname": "***REMOVED SENSITIVE VALUE***",
        "mail_smtppassword": "***REMOVED SENSITIVE VALUE***",
        "trashbin_retention_obligation": "30,60",
        "versions_retention_obligation": "30,60",
        "enable_previews": true,
        "preview_max_x": "2048",
        "preview_max_y": "2048",
        "preview_max_memory": "1024",
        "jpeg_quality": "60",
        "default_phone_region": "GR",
        "activity_expire_days": "90",
        "app_install_overwrite": [
            "previewgenerator",
            "breezedark",
            "bookmarks",
            "bruteforcesettings",
            "imageconverter",
            "files_antivirus",
            "news",
            "timemanager"
        ],
        "enabledPreviewProviders": [
            "OC\\Preview\\Image",
            "OC\\Preview\\MP3",
            "OC\\Preview\\TXT",
            "OC\\Preview\\PDF",
            "OC\\Preview\\Movie",
            "OC\\Preview\\Photoshop",
            "OC\\Preview\\TIFF",
            "OC\\Preview\\SVG",
            "OC\\Preview\\OpenDocument",
            "OC\\Preview\\HEIC"
        ]
    }
}

List of activated Apps

Enabled:
  - activity: 2.18.0
  - admin_audit: 1.16.0
  - bookmarks: 13.0.1
  - breezedark: 25.0.1
  - bruteforcesettings: 2.6.0
  - calendar: 4.3.1
  - circles: 26.0.0
  - cloud_federation_api: 1.9.0
  - comments: 1.16.0
  - contacts: 5.2.0
  - contactsinteraction: 1.7.0
  - dashboard: 7.6.0
  - dav: 1.25.0
  - deck: 1.9.0
  - federatedfilesharing: 1.16.0
  - federation: 1.16.0
  - files: 1.21.1
  - files_antivirus: 4.0.3
  - files_automatedtagging: 1.16.1
  - files_pdfviewer: 2.7.0
  - files_rightclick: 1.5.0
  - files_sharing: 1.18.0
  - files_trashbin: 1.16.0
  - files_versions: 1.19.1
  - firstrunwizard: 2.15.0
  - imageconverter: 1.3.4
  - logreader: 2.11.0
  - lookup_server_connector: 1.14.0
  - mail: 3.0.2
  - news: 21.1.0
  - nextcloud_announcements: 1.15.0
  - notes: 4.7.2
  - notifications: 2.14.0
  - oauth2: 1.14.0
  - password_policy: 1.16.0
  - photos: 2.2.0
  - previewgenerator: 5.2.1
  - privacy: 1.10.0
  - provisioning_api: 1.16.0
  - recognize: 3.7.0
  - recommendations: 1.5.0
  - related_resources: 1.1.0-alpha1
  - richdocuments: 8.0.0
  - serverinfo: 1.16.0
  - settings: 1.8.0
  - sharebymail: 1.16.0
  - support: 1.9.0
  - survey_client: 1.14.0
  - suspicious_login: 4.4.0
  - systemtags: 1.16.0
  - tasks: 0.14.5
  - text: 3.7.2
  - theming: 2.1.1
  - twofactor_backupcodes: 1.15.0
  - twofactor_totp: 8.0.0-alpha.0
  - updatenotification: 1.16.0
  - user_status: 1.6.0
  - viewer: 1.10.0
  - weather_status: 1.6.0
  - workflowengine: 2.8.0
Disabled:
  - encryption: 2.14.0 (installed 2.8.1)
  - files_external: 1.18.0 (installed 1.17.0)
  - onlyoffice: 7.8.0 (installed 7.8.0)
  - user_ldap: 1.16.0

Nextcloud Signing status

No errors have been found.

Nextcloud Logs

No response

Additional info

No response

szaimen commented 1 year ago

Cc @MichaIng

MichaIng commented 1 year ago

Not related to our recent change, so I removed 26-feedback tag.

Same request was done here: https://github.com/nextcloud/docker/issues/1218 Since it makes more sense to discuss it here on the server repo, I'm closing the issue at the Docker repo and copying the relevant part of my comment here:

All the other header values you added are obsolete if noindex is set. So they are not "doing more" but should be removed to save the additional bytes transferred and to avoid possible confusion of other search engines which do not understand them.

So while the extras do most likely not hurt, they are all redundant/unnecessary and at best ballast. And while I do not think so, there is also a chance that those, since not supported by all search engines, may even confuse other search engines and in worst case lead to the essential noindex being ignored. So IMO it is correct to explicitly accept only noindex, nofollow, as failsafe value.

smileyxy commented 1 year ago

modify the X-Robots-Tag in nextcloud\config\nginx\site-confs "add_header X-Robots-Tag "noindex, nofollow" always;" It should be possible to eliminate this warning.

hrenard commented 1 year ago

Shouldn't none be equivalent to noindex, nofollow and not show the warning ?

MichaIng commented 1 year ago

It is not equivalent: #36689

schneidr commented 1 year ago

So IMO it is correct to explicitly accept only noindex, nofollow, as failsafe value

I disagree. While this might be true for the values noted in the OP I am using the additional value noai, which is not covered by noindex or nofollow. It should only be checked if the wanted values are set, not if they are set exclusively.

MichaIng commented 1 year ago

Is there some "official" documentation about this header value? I could only find it being respected by the img2dataset tool, which used to massively cause trouble for image-related websites like DeviantArt, not to forget the personal and legal concerns: https://github.com/rom1504/img2dataset/pull/249/files

However, checking the docs of the option which is not set by default to 4 header values, suggests that any of them is interpreted as denial: https://github.com/rom1504/img2dataset/blob/6adc82b/README.md?plain=1#L169 So for this tool at least, noindex implies noai (as long as users of this tool do not override/empty the options default).

If you know a tool/service/case where it is different, and noai is additionally required, let me know, and I will adjust the check when I find time.

darkBuddha commented 1 year ago

This is still not fixed

MichaIng commented 1 year ago

It is also still no closed and the confirmation that any additional value for this header makes any sense is outstanding (see my last reply). As long as noindex, nofollow covers everything else, there is no point to allow extending the value.

darkBuddha commented 1 year ago

Yes, but e.g. nofollow, noindex should also be valid as it has the same meaning. Currently it triggers the warning.

The test should look for the presence of noindex and nofollow instead of looking fornoindex, nofollow.

Simple as.

MichaIng commented 1 year ago

Of course we could make those tests overly complex and hence error-prone. But is it really such a problem to just exchange the two arguments in your webserver config?

darkBuddha commented 1 year ago

I don't consider a test that tests correctly overly complex

I consider a test that doesn't test correctly a bug

If a project doesn't aim to be bug free, it can never become bug free

MichaIng commented 1 year ago

I am just a spare time contributor as well, and I have no motivation to rewrite the test unless it is really required to solve peoples need. All headers tests expect a hardcoded string, this was always the case, and it should not be a problem unless this hardcoded string is not fulfilling all possible user needs. As noindex, nofollow and nofollow, noindex are equivalent, the second does not add any use case and hence does not solve any problem, so I am not gonna invest time for this.

If you have a different opinion, and changing the header in your webserver config causes you more trouble then rewriting the test yourself, you are of course free to open a PR 😉.

kale1d0code commented 3 months ago

I have X-Robots-Tag set to noindex, nofollow yet I still get error messages about it not being set to noindex,nofollow

might be best to remove the error message altogether as I have never known it to work correctly.

darkBuddha commented 3 months ago

The entire approach is flawed, wondering how low quality code like this can stay in such a big project.

MichaIng commented 3 months ago

@kale1d0code Then you did not apply it correctly. Please check this:

curl -IL https://localhost/

or of your Nextcloud is in a sub directory:

curl -IL https://localhost/nextcloud/

@darkBuddha You are free to open a PR to enhance the code, if you think you can do it better and it is worth your time. I am a volunteer with own projects, and others I am contributing to. All I did was changing the hardcoded expected value none into the hardcoded expected value noindex, nofollow, since the first is not supported by all search engines. This was a quick change allowing you to have your Nextcloud instance removed from all search engines without triggering the admin panel warning, so this was a real enhancement and worth my time. Based on all information I read here and elsewhere, changing this into a complicated conditional to support all variances of this header value, with(out) spaces, additional values, swapped values etc, does not add anything to hiding Nextcloud instance content from search engines. All it does is, allowing admins to copy and paste other values into their webservers, which are not better than noindex, nofollow, but probably worse (if we do not further research about conflicting values, code the check very carefully etc), and this for a significantly larger amount of invested time, either someones free time, or paid developer time, which is surely better invested in real enhancements, at least IMO.

If this topic remains just a "but I want to be able to copy&paste other/worse values into my config, while I am not able or willing to implement this myself", I vote for locking it. If someone comes up with a "real" argument, then we can reopen it.

kale1d0code commented 3 months ago

I ran

curl -IL http://localhost:8080/

and found the x-robots-tag in every 302 and 303 request but the last 200 request did not have the header. This is confusing because nginx should always add the header to every single request due to my config

server {
    listen 8080;
    listen [::]:8080;
    server_name ${NGINX_HOST};

    port_in_redirect off;
    fastcgi_param  SERVER_PORT 443;
    proxy_set_header Host $http_host;
    fastcgi_param HTTPS on;

    # HTTP response headers borrowed from Nextcloud `.htaccess`
    add_header Referrer-Policy                      "no-referrer"              always;
    add_header X-Content-Type-Options               "nosniff"                  always;
    add_header X-Download-Options                   "noopen"                   always;
    add_header X-Frame-Options                      "SAMEORIGIN"               always;
    add_header X-Permitted-Cross-Domain-Policies    "none"                     always;
    add_header X-Robots-Tag                         "noindex, nofollow"        always;
    add_header X-XSS-Protection                     "1; mode=block"            always;
    add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
    fastcgi_hide_header X-Powered-By;

    root /var/www/html;
    ...

another confusing thing is the header shows a space in the curl output but doesn't in chrome devtools.

MichaIng commented 3 months ago

Note that in Nginx, add_header directives or parent config blocks have no effect if applied child config blocks contain any add_header by themselves. So if there is any other location block or similar applied for the last request, this might explain it.