huridocs / uwazi

Uwazi is a web-based, open-source solution for building and sharing document collections
http://www.uwazi.io
MIT License

Trying to access a root URL that doesn't exist tosses a 301 instead of a 404 #6846

Open txau opened 4 months ago

txau commented 4 months ago

When requesting a URL of the form https://www.girlsrightsplatform.org/SOMETHING, Uwazi returns a 301 (Moved Permanently) instead of a 404.

RafaPolit commented 4 months ago

We just fixed, in theory, the case where 200s were wrongly returned and should have been 404s. Now they are apparently returning 301s? Maybe it's because it's the first param, so it first tries to match the language, then the actual page, and something gets "lost" in the middle?

Could this be triggering the default language problem described in #6843?

cc @mfacar

mfacar commented 4 months ago

You're right @RafaPolit: the path in react-router matches /:lang, so we end up showing the computed index element for the root route. I couldn't get any status code other than 200, even for the reported URL, where I'm getting:

Request URL: https://www.girlsrightsplatform.org/SOMETHING
Request Method: GET
Status Code: 200 OK

An alternative could be to validate that the param is a valid language and, if it is not, redirect to a 404 route.

It does not seem to be related to #6843.

RafaPolit commented 4 months ago

There should be two different validations: it needs to be EITHER a valid language, OR a valid URL endpoint.

For example, domain/en/entity should, indeed, validate that en is a valid language. But domain/entity/adfs89798 should NOT attempt to validate entity as a language; it should validate it as an endpoint in our application, because we use such URLs to redirect to your cookie locale or the default language.

We want to keep those links that don't include a language, but yeah, the above should not try to fetch the language "something" and then return the root path.
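The two validations could be sketched roughly as below. This is only an illustration, not code from Uwazi: `classifyPath`, `configuredLanguages` and `knownEndpoints` are hypothetical names, and the language and endpoint lists are made-up example values.

```typescript
// Hypothetical sketch: none of these identifiers come from the Uwazi codebase.
type PathKind = 'root' | 'language' | 'endpoint' | 'not-found';

// Assumed example values; a real instance would read these from its
// settings and from its route table.
const configuredLanguages: ReadonlySet<string> = new Set(['en', 'es', 'fr']);
const knownEndpoints: ReadonlySet<string> = new Set(['entity', 'library', 'page', 'login']);

// Classify a pathname by its first segment only:
// - a configured language (domain/en/entity)          -> 'language'
// - a known application endpoint (domain/entity/abc)  -> 'endpoint'
//   (the caller would then redirect to the cookie locale or default language)
// - anything else (domain/SOMETHING)                  -> 'not-found' (404)
function classifyPath(
  pathname: string,
  languages: ReadonlySet<string> = configuredLanguages,
  endpoints: ReadonlySet<string> = knownEndpoints
): PathKind {
  const segments = pathname.split('/').filter(part => part.length > 0);
  const first = segments[0];
  if (first === undefined) return 'root';
  if (languages.has(first)) return 'language';
  if (endpoints.has(first)) return 'endpoint';
  return 'not-found';
}

console.log(classifyPath('/en/entity'));        // language
console.log(classifyPath('/entity/adfs89798')); // endpoint
console.log(classifyPath('/SOMETHING'));        // not-found
```

The language check runs first here, so a segment that is both a language code and an endpoint name would be treated as a language; that ordering is an assumption of the sketch.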

txau commented 4 months ago

Here is an example. As you can see, it is registered as a 301 for the URL japiquay /wp-login.php

This could also be due to the load balancer redirecting when instances have custom domains (cc @vostorga )

In that case, the 301 would be correct, @mfacar is right, and the routes are working correctly. If so, let's close the issue.

{
  "request": "GET /wp-login.php HTTP/2.0",
  "gl2_remote_port": 43552,
  "body_bytes_sent": 162,
  "connection_requests": 1,
  "source": "production-load-balancer",
  "request_method": "GET",
  "gl2_source_input": "627e2885c59648618bd0e788",
  "http_user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36",
  "remote_addr_city_name": "N/A",
  "request_time": 0,
  "host": "japiqay.uwazi.io",
  "gl2_source_node": "d5014cba-367f-4e7c-af13-ee706c0985f1",
  "connection": "8644607",
  "pipe": ".",
  "gl2_accounted_message_size": 1434,
  "response_status": 301,
  "level": 6,
  "streams": [ "000000000000000000000001", "62875e11c59648618bdae3fa" ],
  "gl2_message_id": "01HZAPB9R01MYCW17X7JSRT2N3",
  "http_version": "HTTP/2.0",
  "message": "GET /wp-login.php HTTP/2.0",
  "nginx_access": true,
  "source_ip_reserved_ip": true,
  "request_length": 239,
  "facility_num": 23,
  "_id": "e6bf20e0-2051-11ef-9cd1-00160c474501",
  "facility": "local7",
  "nginx_json": "{ \"nginx_timestamp\": \"2024-06-01T20:02:40+00:00\", \"connection\": \"8644607\", \"connection_requests\": 1, \"pipe\": \".\", \"body_bytes_sent\": 162, \"request_length\": 239, \"request_time\": 0.000, \"response_status\": 301, \"request\": \"GET /wp-login.php HTTP/2.0\", \"request_method\": \"GET\", \"host\": \"japiqay.uwazi.io\", \"upstream_cache_status\": \"\", \"upstream_addr\": \"\", \"http_x_forwarded_for\": \"\", \"http_referrer\": \"\", \"http_user_agent\": \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36\", \"http_version\": \"HTTP/2.0\", \"remote_user\": \"\", \"http_x_forwarded_proto\": \"\", \"upstream_response_time\": \"\", \"nginx_access\": true }"
}
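One way to check the load-balancer hypothesis against an entry like this is to look at `upstream_addr` inside `nginx_json`, which is empty here. The sketch below assumes that an empty `upstream_addr` means nginx produced the response itself without forwarding the request to the application; `answeredByProxy` and both interfaces are hypothetical names, and the sample is a trimmed copy of the entry above.

```typescript
// Hypothetical helper; the field names match the log entry, the rest is illustrative.
interface GraylogEntry {
  source: string;
  response_status: number;
  nginx_json: string; // nginx's own JSON access-log line, stored as a string
}

interface NginxFields {
  response_status: number;
  upstream_addr: string;
}

// Assumption: an empty upstream_addr means nginx answered on its own
// and never passed the request to an upstream (the Uwazi application).
function answeredByProxy(entry: GraylogEntry): boolean {
  const fields = JSON.parse(entry.nginx_json) as NginxFields;
  return fields.upstream_addr === '';
}

// Trimmed copy of the entry above, keeping only the fields used here.
const entry: GraylogEntry = {
  source: 'production-load-balancer',
  response_status: 301,
  nginx_json:
    '{ "response_status": 301, "request": "GET /wp-login.php HTTP/2.0", ' +
    '"host": "japiqay.uwazi.io", "upstream_addr": "" }',
};

console.log(answeredByProxy(entry)); // true
```

If that assumption holds for this setup, the 301 in this entry was issued by the load balancer and not by the Uwazi routes.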